Explaining in Style: Training a GAN to explain a classifier in StyleSpace

Oran Lang*, Yossi Gandelsman*, Michal Yarom*, Yoav Wald*, Gal Elidan, Avinatan Hassidim, William T. Freeman,
Phillip Isola, Amir Globerson, Michal Irani, Inbar Mosseri

  Google Research

* Equal contributors.

Classifier-specific interpretable attributes emerge in the StylEx StyleSpace. Our system, StylEx, explains the decisions of a classifier by discovering and visualizing multiple attributes that affect its prediction. (Left) StylEx achieves this by training a StyleGAN specifically to explain the classifier (e.g., a "cat vs. dog" classifier), thus driving latent attributes in the GAN's StyleSpace to capture classifier-specific attributes. (Right) We automatically discover top visual attributes in the StyleSpace coordinates, which best explain the classifier's decision. For each discovered attribute, StylEx can then provide an explanation by generating a counterfactual example, i.e., visualizing how manipulation of this attribute (style coordinate) affects the classifier output probability. The generated counterfactual examples are marked in the figure by the color of their attribute. The degree to which manipulating this attribute affects the classifier probability is shown in the top-left of each image. The top attributes found by our method indeed correspond to coherent semantic properties that affect perception of cats vs. dogs (e.g. open or closed mouth, eye shape, and pointed or dropped ears).
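The discovery step described above can be sketched as a simple search over StyleSpace coordinates: perturb each coordinate, re-generate the image, and rank coordinates by how much the classifier's output probability moves. The toy below is only an illustration of that idea, with a linear "generator" and a logistic "classifier" standing in for StyleGAN and the real model; all names, dimensions, and the perturbation size are hypothetical, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

STYLE_DIM, IMG_DIM = 8, 16
G_mat = rng.normal(size=(IMG_DIM, STYLE_DIM))   # toy "generator": image = G_mat @ style
w_cls = rng.normal(size=IMG_DIM)                # toy "classifier" weights

def generate(style):
    return G_mat @ style

def classify(image):
    # Probability of the positive class (logistic stand-in for the classifier).
    return 1.0 / (1.0 + np.exp(-w_cls @ image))

def top_attributes(styles, delta=1.0, k=3):
    """Rank StyleSpace coordinates by mean |change| in classifier probability
    when that single coordinate is perturbed (a toy version of attribute search)."""
    effects = np.zeros(STYLE_DIM)
    for s in styles:
        base = classify(generate(s))
        for i in range(STYLE_DIM):
            s2 = s.copy()
            s2[i] += delta
            effects[i] += abs(classify(generate(s2)) - base)
    effects /= len(styles)
    return np.argsort(effects)[::-1][:k], effects

styles = [rng.normal(size=STYLE_DIM) for _ in range(32)]
top, effects = top_attributes(styles)
print("top coordinates:", top)
```

Each index in `top` plays the role of one discovered attribute: pushing that style coordinate and re-generating yields the counterfactual visualizations shown in the figure.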



Image classification models can depend on multiple different semantic attributes of the image. An explanation of a classifier's decision therefore needs to both discover and visualize these attributes. Here we present StylEx, a method that does this by training a generative model to specifically explain the multiple attributes that underlie classifier decisions. A natural source for such attributes is the StyleSpace of StyleGAN, which is known to encode semantically meaningful dimensions of the image. However, because standard GAN training does not depend on the classifier, the StyleSpace may fail to represent attributes that are important for the classifier's decision, and many of its dimensions may capture irrelevant attributes. To overcome this, we propose a training procedure for StyleGAN that incorporates the classifier model, in order to learn a classifier-specific StyleSpace. Explanatory attributes are then selected from this space. These can be used to visualize the effect of changing multiple attributes per image, thus providing image-specific explanations. We apply StylEx to multiple domains, including animals, leaves, faces, and retinal images, and show how an image can be modified in different ways to change its classifier output. Our results show that the discovered attributes align well with semantic ones, generate meaningful image-specific explanations, and are human-interpretable, as measured in user studies.
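The abstract's idea of incorporating the classifier into GAN training can be sketched as a loss with two parts: a reconstruction term, plus a term that makes the classifier's prediction on the reconstructed image match its prediction on the input, so that classifier-relevant attributes must survive the encode/decode round trip. The sketch below is a minimal illustration under assumed shapes and weighting; `stylex_loss`, the linear stand-ins, and `lambda_cls` are hypothetical names, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-12):
    # KL divergence between two probability vectors (eps for numerical safety).
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def stylex_loss(x, x_rec, logits_x, logits_rec, lambda_cls=0.1):
    """Toy classifier-guided objective: reconstruction + classifier consistency.

    The KL term penalizes reconstructions that change what the classifier
    "sees", pushing the generator's latent space to encode classifier-specific
    attributes (illustrative weighting, not the paper's).
    """
    l_rec = float(np.mean((x - x_rec) ** 2))
    l_cls = kl(softmax(logits_x), softmax(logits_rec))
    return l_rec + lambda_cls * l_cls

rng = np.random.default_rng(1)
x = rng.normal(size=16)
x_rec = x + 0.01 * rng.normal(size=16)  # near-perfect reconstruction
# Same reconstruction quality, but the second case flips the classifier output:
loss_good = stylex_loss(x, x_rec, np.array([2.0, -1.0]), np.array([1.9, -0.9]))
loss_bad = stylex_loss(x, x_rec, np.array([2.0, -1.0]), np.array([-2.0, 1.0]))
print(loss_good, loss_bad)
```

Comparing the two calls shows the intended behavior: a reconstruction that preserves the classifier's prediction incurs a much smaller loss than one that flips it, even when pixel-level error is identical.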

Using StylEx for image-specific explanation. StylEx can be used to explain a classifier's decision on a specific image. Here we show the top automatically detected attributes for explaining perceived-age classification on a specific image. Each knob corresponds to one of the top discovered attributes in the image; moving a knob causes StylEx to change only that attribute in that image. A user of StylEx can apply these manipulations to infer the semantic meaning of each attribute (demonstrated on the right). The probabilities in the top-left corner are the probability that the person is perceived as old.

(a) Top-2 automatically detected attributes for a DME (retinal disease) classifier: Attribute #1, Attribute #2 ("Cotton Wool"). (b) Top-2 automatically detected attributes for a sick/healthy leaf classifier: Attribute #1 ("Base leaf color"), Attribute #2 ("Rotten Apex").

StylEx is applicable to a wide variety of classifiers and complex real-world domains, including animals, leaves, faces, and retinal images.



"Explaining in Style: Training a GAN to explain a classifier in StyleSpace",
Oran Lang*, Yossi Gandelsman*, Michal Yarom*, Yoav Wald*, Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani and Inbar Mosseri.


Supplementary Material:



Last updated: Apr 2021