
Few-Shot Unsupervised Image-to-Image Translation


Metadata

  • Authors: Ming-Yu Liu, Xun Huang, +4 authors, Jan Kautz
  • Organization: NVIDIA, Cornell University, Aalto University
  • Paper: https://arxiv.org/pdf/1905.01723.pdf
  • Code: https://github.com/NVlabs/FUNIT
  • Video: https://youtu.be/kgPAqsC8PLM
  • Demo: https://nvlabs.github.io/FUNIT/petswap.html
  • Demo video: https://youtu.be/JTu-U0C4xEU


Abstract

  • Current unsupervised/unpaired image-to-image translation (UIT) methods (see ref) typically require many images in both the source and target classes, which greatly limits their use.
  • This paper proposes a novel framework that needs only a few examples of the target class (few-shot) and works on target classes unseen during training.
  • The proposed framework can also be applied to few-shot image classification, where it outperforms a SoTA method based on feature hallucination.

Method Overview

  • Motivation: Humans can imagine unseen target classes (e.g., seeing a standing tiger for the first time and imagining it lying down) by drawing on past visual experience (e.g., having seen other animals standing and lying down before).
    • Past visual experience: Learn on images of many different classes.
    • Imagine unseen classes: Translate images from source class to target class with few examples of target class.
  • Data: Source class images: many source classes, each containing many images (e.g., different animal species).
  • Training: Use the source class images to train a multi-class UIT model (during training, the target class is also drawn from the source classes; see the sampling sketch after this list).
  • Inference: A few images of the (possibly unseen) target class are only accessible at inference time.
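
A minimal sketch of how one training iteration could sample data under this setup; `images_by_class` and the function name are hypothetical helpers for illustration, not part of the released FUNIT code:

```python
# Hypothetical helper: `images_by_class` maps each source class id to a list of images.
import random

def sample_training_episode(images_by_class, K=1):
    """Pick a content image from one source class and K class (style) images
    from a different source class for a single training iteration."""
    content_class, target_class = random.sample(list(images_by_class), 2)
    x = random.choice(images_by_class[content_class])      # content image
    ys = random.sample(images_by_class[target_class], K)   # K class images
    return x, ys, content_class, target_class
```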

Model

  • x̄ = G(x, {y_1, ..., y_K}): A conditional few-shot image generator (translator) that takes a content image x and 1-way (class) K-shot class images {y_1, ..., y_K} as input and generates the output image x̄ (see the generator sketch after this list).
    • z_x = E_x(x): A content encoder maps content image x to content latent code z_x.
    • z_y = E_y({y_1, ..., y_K}): A class (style) encoder maps {y_1, ..., y_K} to latent vectors individually and averages them into a class latent code z_y.
    • x̄ = F_x(z_x, z_y): A decoder consisting of several adaptive instance normalization (AdaIN) residual blocks followed by several upsampling conv layers.
    • By feeding z_y to the decoder via the AdaIN layers, we let the class images control the global look (style), while maintaining the local structure (content).
    • The generalization capability depends on the number of source classes during training (more is better).
  • D: A multi-task adversarial discriminator.
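
A minimal PyTorch sketch of the generator structure described above; layer counts and sizes, and the single AdaIN-modulated block standing in for the AdaIN residual blocks, are illustrative assumptions rather than the released FUNIT architecture:

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance norm: normalize features, then scale/shift them with
    parameters predicted from the class latent code z_y."""
    def __init__(self, num_features, z_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        self.affine = nn.Linear(z_dim, 2 * num_features)

    def forward(self, h, z_y):
        gamma, beta = self.affine(z_y).chunk(2, dim=1)
        return self.norm(h) * (1 + gamma[..., None, None]) + beta[..., None, None]

class FewShotTranslator(nn.Module):
    def __init__(self, z_dim=64, ch=64):
        super().__init__()
        self.content_encoder = nn.Sequential(        # E_x: x -> spatial content code z_x
            nn.Conv2d(3, ch, 7, 1, 3), nn.ReLU(),
            nn.Conv2d(ch, ch, 4, 2, 1), nn.ReLU())
        self.class_encoder = nn.Sequential(          # E_y: y_k -> class latent vector
            nn.Conv2d(3, ch, 7, 1, 3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(ch, z_dim))
        self.adain = AdaIN(ch, z_dim)                # z_y modulates the decoder via AdaIN
        self.decoder = nn.Sequential(                # F_x: upsample back to an image
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(ch, 3, 7, 1, 3), nn.Tanh())

    def forward(self, x, ys):                        # ys: list of K class images
        z_x = self.content_encoder(x)
        z_y = torch.stack([self.class_encoder(y) for y in ys]).mean(dim=0)
        return self.decoder(self.adain(z_x, z_y))    # x_bar = F_x(z_x, z_y)
```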

Training

  • |S|: Number of source classes.
  • For D, each task determines whether an input image is a real or fake image of one particular source class. As there are |S| source classes, D has |S| binary outputs.
  • Given a real image x of source class c_x, penalize D if its c_x-th output is fake; however, D is not penalized for outputting fake on the other (|S|-1) source-class outputs.
  • Given a translated (fake) image of source class c_x, penalize D if its c_x-th output is real; conversely, penalize G if D's c_x-th output is fake (see the loss sketch after this list).
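
A minimal sketch of the multi-task adversarial loss described above, assuming D returns one logit per source class (shape [B, |S|]); the hinge formulation is an illustrative assumption, not necessarily the paper's exact GAN loss:

```python
import torch
import torch.nn.functional as F

def pick_class_logit(d_out, c):
    """d_out: [B, |S|] logits, c: [B] class indices (LongTensor).
    Only the c-th output is supervised; the other |S|-1 outputs are ignored."""
    return d_out.gather(1, c[:, None]).squeeze(1)

def d_loss_real(d_out, c_x):   # penalize D if it calls a real class-c_x image fake
    return F.relu(1.0 - pick_class_logit(d_out, c_x)).mean()

def d_loss_fake(d_out, c_x):   # penalize D if it calls a translated image real
    return F.relu(1.0 + pick_class_logit(d_out, c_x)).mean()

def g_loss(d_out, c_x):        # penalize G if D calls its translated image fake
    return -pick_class_logit(d_out, c_x).mean()
```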

Losses

  • Overall loss: a weighted sum of the GAN, reconstruction, and feature matching losses (see the sketch after this list).

  • GAN loss: As described above.

  • Reconstruction loss encourages the output to stay close to the source-class input image when the content image is also used as the (single) class image.

  • Feature matching loss encourages the output's style to be similar to the target-class (class) images.
  • D_f is the feature extractor of the discriminator D without the last layer.
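
A hedged LaTeX sketch of how the terms combine, following the definitions above (x̄ = G(x, {y_1, ..., y_K}) is the translated image; λ_R and λ_F are assumed names for the weighting coefficients):

```latex
\begin{align}
  \mathcal{L}_R(G) &= \mathbb{E}_{x}\big[\lVert x - G(x, \{x\}) \rVert_1\big]
    && \text{(reconstruct $x$ when it also serves as the class image)} \\
  \mathcal{L}_F(G) &= \mathbb{E}_{x,\{y_k\}}\Big[\big\lVert D_f(\bar{x})
    - \tfrac{1}{K}\textstyle\sum_{k=1}^{K} D_f(y_k) \big\rVert_1\Big] \\
  \mathcal{L}(D, G) &= \mathcal{L}_{\mathrm{GAN}}(D, G)
    + \lambda_R \mathcal{L}_R(G) + \lambda_F \mathcal{L}_F(G)
    && \text{(G minimizes, D maximizes the GAN term)}
\end{align}
```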

UIT Methods with Different Constraints (each enforces the translation to preserve certain properties)

Related work
