Paper Read - Universal Domain Adaptation through Self Supervision
Overview
Universal Domain Adaptation through Self Supervision
https://arxiv.org/abs/2002.07953
NOTE
For the best rendering, please install and activate the Tex all the things Chrome plugin, which provides browser-side math rendering.
If it is active, you should see the following inline math $a=b$ and the equation
$$ a x^{2} + b x + c = 0 \quad x \in \mathbb{R} $$
rendered correctly.
Universal Domain Adaptation through Self Supervision
Analysis
Overview
- NN as $Y = f(X)$ with
  - $X$ : Source Domain
  - $Y$ : Target Domain
- In the Training Set we have
  - $P(X_{train})$ : Source Domain Training Distribution
  - $P(Y_{train})$ : Target Domain Training Distribution
- In the Test Set and in Production we have
  - $P(X_{test})$ : Source Domain Test Distribution
  - $P(Y_{test})$ : Target Domain Test Distribution
The underlying assumption for the NN to work well in practice is that $P(X_{train})$ is very similar to $P(X_{test})$, so that training and test instances are drawn from the same distribution. Otherwise we are dealing with a domain adaptation problem, as the distribution seen at test time has shifted with respect to the one seen at training time. Let's use $\tilde P()$ to denote a changed distribution.
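To make the $P(X_{train})$ vs $\tilde P(X_{test})$ comparison concrete, here is a minimal sketch (mine, not from the paper) that checks, feature by feature, whether training and test inputs look like they come from the same distribution, using a two-sample Kolmogorov-Smirnov test. All data, shapes and the amount of shift are hypothetical.

```python
# Minimal sketch (illustrative, not from the paper): detect a shifted test
# distribution \tilde P(X_test) w.r.t. P(X_train) with a per-feature
# two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical data: the test set is shifted and rescaled w.r.t. the training set.
X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))
X_test = rng.normal(loc=0.5, scale=1.5, size=(1000, 3))

for j in range(X_train.shape[1]):
    stat, p_value = ks_2samp(X_train[:, j], X_test[:, j])
    # A large statistic / tiny p-value flags a likely domain shift on feature j.
    print(f"feature {j}: KS statistic = {stat:.3f}, p-value = {p_value:.1e}")
```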
Types of Domain Adaptation
Axis 1 : One-step vs Multi-step
- In a DNN there are multiple layers, hence multiple domains; let's say $X_{i} = f_{i}(X_{i-1}) \quad i \in \mathbb{N}^{+}$, so
  - $f_{i}()$ : the i-th processing layer
  - $P(X_{i-1})$ : the distribution in input to the i-th layer, with $P(X_{0})$ the input distribution
  - $P(X_{i})$ : the distribution in output of the i-th layer
- The PDF divergence certainly affects the input domain, $\tilde P(X_{0})$, but it then propagates through the DNN, causing a domain shift also in the deeper layers, $\tilde P(X_{i}) \quad i>0$ (a toy illustration of this propagation follows this list)
- The level of permeation of the divergence depends on the transferability of the learned features: the more transferable they are, the less the permeation
- This is also a key factor in achieving generalization
  - in fact, in practice it is possible to see the test set input distribution as a divergence with respect to the training input distribution, $P(X_{0}) \rightarrow \tilde P(X_{0})$
  - however, a DNN able to generalize is robust against this shift, in the sense that it is able to absorb it instead of letting it permeate up to the final layers and produce a visible effect on the output
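As a toy illustration of the propagation mentioned above (my own sketch, not the paper's method), the snippet below pushes both the original input distribution $P(X_{0})$ and a shifted one $\tilde P(X_{0})$ through a small random ReLU MLP and measures how far apart the activation statistics are at each layer $X_{i} = f_{i}(X_{i-1})$. The layer widths and the amount of shift are arbitrary assumptions.

```python
# Toy sketch (assumption, not the paper's method): propagate an input shift
# \tilde P(X_0) through a random ReLU MLP and track how much each layer's
# activation statistics diverge, as a proxy for \tilde P(X_i), i > 0.
import numpy as np

rng = np.random.default_rng(0)

def forward(x, weights):
    """Return the activations X_1, ..., X_L of a ReLU MLP, X_i = f_i(X_{i-1})."""
    activations = []
    for W in weights:
        x = np.maximum(x @ W, 0.0)
        activations.append(x)
    return activations

dims = [8, 16, 16, 4]  # hypothetical layer widths
weights = [rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_out))
           for d_in, d_out in zip(dims[:-1], dims[1:])]

X0 = rng.normal(loc=0.0, scale=1.0, size=(2000, dims[0]))          # P(X_0)
X0_shifted = rng.normal(loc=0.7, scale=1.3, size=(2000, dims[0]))  # \tilde P(X_0)

for i, (a, b) in enumerate(zip(forward(X0, weights),
                               forward(X0_shifted, weights)), start=1):
    gap = np.abs(a.mean(axis=0) - b.mean(axis=0)).mean()
    print(f"layer {i}: mean activation gap = {gap:.3f}")
```

With shared random weights the gap at layer $i$ gives a rough picture of how much of the input divergence permeates into $\tilde P(X_{i})$; a network with more transferable features would shrink this gap instead of letting it reach the output.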
Axis 2 : Divergence Type
There can be two types of divergence; a toy sketch follows the list below
- data distribution only, which is called homogeneous domain adaptation
  - so $\tilde P(X)$, but the dimensionality is preserved, so $D(X)$ is the same
- involving dimensionality, which is called heterogeneous domain adaptation
  - there is a change in the dimensionality, $\tilde D(X)$, which also reflects into the data distribution $\tilde P(X)$
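As a trivial illustration of the two cases (my own toy example, not from the paper), the snippet below builds a homogeneous shift, where only $P(X)$ changes and $D(X)$ is preserved, and a heterogeneous shift, where the dimensionality itself changes to $\tilde D(X)$. The sizes are arbitrary.

```python
# Toy sketch (illustrative assumption): homogeneous vs heterogeneous divergence.
import numpy as np

rng = np.random.default_rng(0)

# Reference (source) data: 4 features.
X_source = rng.normal(loc=0.0, scale=1.0, size=(500, 4))

# Homogeneous: same dimensionality D(X) = 4, different distribution \tilde P(X).
X_homogeneous = rng.normal(loc=1.0, scale=2.0, size=(500, 4))

# Heterogeneous: dimensionality changes, \tilde D(X) = 6, so \tilde P(X) changes too.
X_heterogeneous = rng.normal(loc=1.0, scale=2.0, size=(500, 6))

print("homogeneous:  ", X_source.shape, "->", X_homogeneous.shape)    # (500, 4) -> (500, 4)
print("heterogeneous:", X_source.shape, "->", X_heterogeneous.shape)  # (500, 4) -> (500, 6)
```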