Representation Learning with Contrastive Predictive Coding
https://arxiv.org/abs/1807.03748 ~big fan of Aaron van den Oord~
Abstract
- proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, called CPC (Contrastive Predictive Coding)
- uses a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful for predicting future samples; the loss is made tractable by using negative sampling
1. Introduction
- unsupervised learning is an important stepping stone towards robust and generic representation learning
- the idea of predictive coding comes from neuroscience; related uses of prediction for representation learning include Word2Vec and image colorization
- the paper proposes to:
  - compress high-dimensional data into a compact latent embedding space in which conditional predictions are easier to model
  - predict multiple steps into the future in this latent space with a powerful autoregressive model
  - rely on a Noise-Contrastive Estimation (NCE) based loss, as used for learning word embeddings in natural language models
  - train the resulting CPC model end-to-end
- the CPC model outperforms other approaches in a variety of domains
2. Contrastive Predictive Coding
2.1. Motivation and Intuitions
- in time-series and high-dimensional modeling, approaches based on next-step prediction exploit the local smoothness of the signal
- when predicting further into the future, the amount of shared information becomes much lower and the model needs to infer more global structure; these slowly varying features are called 'slow features'
- high-level latent variables such as a class label carry far less information than the high-dimensional signal itself, so instead of modeling p(x|c) directly, the target x and context c are encoded so as to maximally preserve their mutual information I(x; c)
2.2. Contrastive Predictive Coding
- a non-linear encoder g_enc maps the input sequence of observations x_t to latent representations z_t = g_enc(x_t)
- an autoregressive model g_ar summarizes all z_<=t in the latent space and produces a context latent representation c_t = g_ar(z_<=t) (a code sketch of both components follows this list)
- instead of predicting future observations x_{t+k} directly with a generative model, CPC models a density ratio that preserves the mutual information between x_{t+k} and c_t, e.g. with a log-bilinear score f_k(x_{t+k}, c_t) = exp(z_{t+k}^T W_k c_t) ∝ p(x_{t+k} | c_t) / p(x_{t+k})
- in the proposed model, either z_t or c_t can be used as the representation for downstream tasks
- c_t is preferable when extra context from the past is useful (e.g. speech recognition)
- a single z_t might not contain enough information to capture phonetic content
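
A minimal sketch of g_enc and g_ar, assuming a PyTorch-style implementation on 1-D (audio-like) input; the layer sizes, strides, and module names below are illustrative, not the paper's exact strided-CNN/GRU configuration:

```python
import torch
import torch.nn as nn

class CPCEncoder(nn.Module):
    """g_enc: maps raw observations x to one latent z_t per (downsampled) timestep."""
    def __init__(self, in_channels=1, z_dim=256):
        super().__init__()
        # stack of strided 1-D convolutions; each output column is one z_t
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, z_dim, kernel_size=10, stride=5, padding=3), nn.ReLU(),
            nn.Conv1d(z_dim, z_dim, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.Conv1d(z_dim, z_dim, kernel_size=4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):            # x: (batch, channels, time)
        z = self.conv(x)             # (batch, z_dim, T)
        return z.transpose(1, 2)     # (batch, T, z_dim)


class CPCContext(nn.Module):
    """g_ar: summarizes z_<=t into a context vector c_t = g_ar(z_<=t)."""
    def __init__(self, z_dim=256, c_dim=256):
        super().__init__()
        self.gru = nn.GRU(z_dim, c_dim, batch_first=True)

    def forward(self, z):            # z: (batch, T, z_dim)
        c, _ = self.gru(z)           # c[:, t] only depends on z_<=t (causal)
        return c                     # (batch, T, c_dim)
```

For images the paper swaps these parts out: a ResNet encoder over image patches plays the role of g_enc and a PixelCNN-style autoregressive model plays the role of g_ar.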
2.3. Noise Contrastive Estimation Loss
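- given a set X = {x_1, ..., x_N} with one positive sample drawn from p(x_{t+k} | c_t) and N-1 negatives drawn from the proposal distribution p(x_{t+k}), the model is trained to pick out the positive: L_N = -E[ log ( f_k(x_{t+k}, c_t) / sum_{x_j in X} f_k(x_j, c_t) ) ]
- this is a categorical cross-entropy over the N candidates; optimizing it maximizes a lower bound on the mutual information I(x_{t+k}; c_t) (the paper calls this loss InfoNCE)

A hedged sketch of one InfoNCE step, assuming in-batch negatives (every other sequence in the minibatch supplies a negative z_{t+k}) and one learned prediction matrix `W_k` per offset k; names and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def info_nce_step(z, c, W_k, t, k):
    """InfoNCE for a single offset k: classify the true z_{t+k} among in-batch negatives.

    z:   (batch, T, z_dim)  latents from g_enc
    c:   (batch, T, c_dim)  contexts from g_ar
    W_k: (c_dim, z_dim)     linear prediction matrix for offset k (requires t + k < T)
    """
    pred = c[:, t] @ W_k                  # (batch, z_dim): prediction of z_{t+k} from c_t
    target = z[:, t + k]                  # (batch, z_dim): the true future latents
    # scores[i, j] = log f_k(x_j, c_i): prediction of sequence i scored against latent of sequence j
    scores = pred @ target.t()            # (batch, batch), positives on the diagonal
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)
```

Summing this loss over offsets k = 1..K and averaging over t gives the full training objective; g_enc, g_ar, and all W_k are trained jointly, which is the "end-to-end" point from the introduction.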
2.4. Related work
- triplet losses use a max-margin objective to separate positive from negative examples
- time-contrastive learning minimizes the distance between embeddings from multiple viewpoints of the same scene and maximizes the distance between embeddings extracted from different timesteps
- in Word2Vec, neighbouring words are predicted using a contrastive loss
3. Experiments
3.1. Audio
3.2. Vision
(figures from the paper: model architecture and experiment results)
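
In both the audio (LibriSpeech phone and speaker classification) and vision (ImageNet) experiments, the learned representations are evaluated with a linear probe: CPC is trained without labels, its weights are frozen, and only a linear classifier is trained on top. A minimal sketch, reusing the assumed `CPCEncoder`/`CPCContext` modules from the earlier block; `num_classes` and the pooling choice are task-dependent assumptions, not the paper's exact protocol:

```python
import torch
import torch.nn as nn

# hypothetical downstream probe: CPC weights are frozen, only the linear layer is trained
encoder, context = CPCEncoder(), CPCContext()
# ... load pretrained CPC weights here ...
for p in list(encoder.parameters()) + list(context.parameters()):
    p.requires_grad = False

num_classes = 10                      # placeholder; depends on the downstream task
probe = nn.Linear(256, num_classes)   # trained with standard cross-entropy on labels

def probe_logits(x):                  # x: (batch, channels, time)
    with torch.no_grad():
        c = context(encoder(x))       # (batch, T, c_dim) frozen CPC contexts
    # average-pool over time for a sequence-level label (e.g. speaker ID)
    return probe(c.mean(dim=1))
```

For per-frame targets such as phone labels, the linear classifier would be applied to each c_t instead of the pooled vector.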
3.3. Reinforcement Learning
4. Conclusions
- CPC combines autoregressive modeling and noise-contrastive estimation with intuitions from predictive coding to learn abstract representations in an unsupervised fashion
Notes
- ~~best paper of this week~~
- I will implement this paper in code as soon as possible