Deep Domain-Adversarial Image Generation for Domain Generalization (DDAIG henceforth) trains a conditional generator that perturbs source-domain images so as to fool the domain classifier while preserving the cues the label classifier needs. The model resembles the classic GAN setup, except that the generator is conditioned on a source image, and the discriminator side consists of two classifiers — one for labels, one for domains — with the domain classifier performing multi-class classification, not binary real/fake prediction. DDAIG prevails over almost all competing models based on meta-learning or domain alignment. Experiments are performed on image and digit classification, as well as a person re-ID task.
Intuition
If we break the problem of domain generalization down to its foundation, what we actually want is a strong classifier for ALL distributions / domains. Such a classifier doesn’t even need the notion of a domain (i.e. it should not learn any domain-specific information). That is exactly what DDAIG, the model this paper proposes, does: train the model to forget about domains while keeping the information about the labels. With this intuition in mind, adversarial training seems like a straightforward tool, but any other design that makes the model “forget the domains, learn the labels” goes down the same path.
Method
As shown at the very beginning of this post, DDAIG consists of one Domain Transformation Net (called DoTNet in the paper), which adds a perturbation onto source-domain images, plus two classifiers for label and domain classification, respectively. All three components are neural nets.
The training flow is as follows. All loss functions J in this paper are cross-entropy losses.
DoTNet (Domain Transformation Network)
DoTNet generates a perturbation for the input image instead of synthesizing images from scratch (as in GANs). The DoTNet in this paper is based on a fully convolutional network (FCN), but any feature extractor can theoretically make sense. The perturbed image is generated as x̃ = x + λ·T(x), where T(x) is DoTNet’s output and λ is a positive hyperparameter typically between 0.1 and 0.7.
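The perturbation step can be sketched in a few lines (a minimal sketch; the function and argument names are mine, and an identity function stands in for the actual DoTNet):

```python
import numpy as np

def perturb(x, dotnet, lam=0.3):
    """Return the perturbed image x + lam * T(x), where `dotnet` stands
    in for DoTNet's forward pass and `lam` is the positive weighting
    hyperparameter from the paper."""
    return x + lam * dotnet(x)

# Toy check with an identity "DoTNet": the image is simply rescaled.
x = np.ones((3, 4, 4))                        # fake 3-channel 4x4 image
x_tilde = perturb(x, lambda im: im, lam=0.5)
```

Because λ scales a learned, image-shaped residual, the perturbed image stays close to the original when λ is small.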
Note that the perturbation here is learnt by the neural net instead of being computed analytically as in some adversarial-attack methods. The loss function of DoTNet is the sum of the label classifier’s loss and the negative of the domain classifier’s loss (remember we want to increase the domain classifier’s loss).
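A minimal numeric sketch of that objective (helper names are mine, and single probability vectors stand in for batched network outputs):

```python
import numpy as np

def cross_entropy(probs, target):
    """Cross-entropy J for one example's predicted probabilities."""
    return -np.log(probs[target])

def dotnet_loss(label_probs, y, domain_probs, d):
    """DoTNet objective on a perturbed image: keep the label classifier
    accurate while *increasing* the domain classifier's loss, hence
    the minus sign on the second term."""
    return cross_entropy(label_probs, y) - cross_entropy(domain_probs, d)

# A perturbation that keeps the label readable but leaves the domain
# classifier confused scores lower (better) than the opposite case.
good = dotnet_loss(np.array([0.9, 0.1]), 0, np.array([0.34, 0.33, 0.33]), 0)
bad = dotnet_loss(np.array([0.5, 0.5]), 0, np.array([0.98, 0.01, 0.01]), 0)
```

The sign flip on the domain term is the whole adversarial trick: gradient descent on this loss pushes DoTNet toward domain-confusing, label-preserving perturbations.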
Label Classifier and Domain Classifier
The “classifier” name here is somewhat misleading: both the label and domain classifiers actually comprise a feature extractor and a classification head. Again, the choice of architecture does not matter in theory; as a guideline, the official repo uses an ImageNet-pretrained ResNet-18 for the PACS experiments. The domain classifier performs multi-class classification (instead of just predicting real or fake) so that the model does not simply map images from one source domain onto another source domain. The loss functions for both classification networks are shown below.
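Both networks can be sketched as a feature extractor followed by a softmax head over K classes; the snippet below (all names hypothetical, with random features in place of a real extractor) shows why the domain head is multi-class rather than binary:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def head(features, W, b):
    """Linear classification head on top of extractor features. For the
    domain classifier, the number of rows of W equals the number of
    source domains (multi-class), not 2 (real/fake)."""
    return softmax(W @ features + b)

rng = np.random.default_rng(0)
feats = rng.standard_normal(8)         # stand-in for extractor features
num_domains = 3                        # e.g. three source domains
W = rng.standard_normal((num_domains, 8))
domain_probs = head(feats, W, np.zeros(num_domains))
```

With K source domains the head outputs a distribution over all K, so fooling it means making the perturbed image look equally plausible under every source domain, not merely “fake”.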
Now let’s look at the overall training procedure. As in classic GAN training, within each iteration DDAIG first updates the DoTNet and then the two classifiers. One trick: for the first few iterations it’s better to warm up the label classifier so that the later competition is meaningful.
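The schedule can be sketched as follows (the update callables and the exact warm-up length are placeholders, not the paper’s hyperparameters):

```python
def train(num_iters, warmup_iters, update_dotnet, update_classifiers):
    """Alternating updates with a warm-up phase: for the first
    `warmup_iters` iterations only the classifier side is trained
    (warming up the label classifier); after that, each iteration
    updates DoTNet first and then the classifiers, as in GAN training."""
    schedule = []
    for it in range(num_iters):
        if it >= warmup_iters:          # skip generator updates early on
            update_dotnet()
            schedule.append("dotnet")
        update_classifiers()
        schedule.append("classifiers")
    return schedule

schedule = train(3, 1, lambda: None, lambda: None)
```

Skipping the generator updates at the start gives the label classifier a head start, so DoTNet’s adversarial gradient is informative once the competition begins.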
Result and Discussion
This paper evaluates the model on the Digit-DG benchmark (MNIST, MNIST-M, SVHN, and SYN), the PACS benchmark, and the Office-Home dataset for the homogeneous DG setting. A person re-ID task (DukeMTMC-reID and Market1501) is used to test the heterogeneous setting.
All three homogeneous DG settings follow the “leave one domain out” fashion for the domain split. Below we discuss the results on PACS and Office-Home.
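A tiny helper makes the “leave one domain out” protocol concrete (domain names follow PACS; the function name is mine):

```python
def leave_one_domain_out(domains):
    """Enumerate the DG splits: each domain serves once as the unseen
    target while all remaining domains are the training sources."""
    return [(target, [d for d in domains if d != target])
            for target in domains]

# PACS has four domains, so we get four splits, one per held-out target.
splits = leave_one_domain_out(["Photo", "Art", "Cartoon", "Sketch"])
```

Reported benchmark numbers are typically the average accuracy over all such splits.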
Let’s first look at the PACS results; PACS is the most popular benchmark for DG experiments. DDAIG improved on the then state-of-the-art by a solid 2.4%, and on naive multi-domain training by 3.6%. To me, this result indicates that the simple intuition behind DDAIG is a promising direction for further DG research. Office-Home, on the other hand, is a dataset with smaller domain shift but many more classes (65 classes in each of the 4 domains). Accordingly, in the Office-Home experiment the vanilla model is much more competitive than it is on PACS. It’s worth noticing that in a smaller domain-shift setting, the advantage of DDAIG over the vanilla model is no longer as obvious.
Thought: Data Augmentation that helps models generalize.
Before DDAIG, most papers focused on meta-learning and domain-alignment methods. There were a number of data-augmentation methods, such as augmenting images via adversarial attacks, but none exhibited results as promising as DDAIG’s. After this paper, meta-learning methods tended to appear less, whereas many augmentation-based methods surged with increasing performance. So here’s the question: is the next step for DG research on data or on representation? Considering the outburst of self-supervised representation-learning papers last year (2020), I’m excited to see how SSL may facilitate improvements in DG.
Happy coding!