Paper Summary - Causality for Machine Learning
Overview
Causality for Machine Learning
arXiv: https://arxiv.org/abs/1911.10500
Figure: The Beuchet Chair
| Key | Value |
|---|---|
| Type of Contribution | Theory |
| Objective | Learning causal relationships instead of just correlations |
Key Points
1. Intelligence
- Key problem: defining intelligence
- There is no unique definition, even for humans, let alone for AI
- Some key aspects:
  - Intelligence as the ability to generalize
    - from seen to unseen data
    - from one task to another
  - Intelligence as the ability to act in an imagined space (the definition of thinking, according to Konrad Lorenz)
    - Implicitly, in order to act in an imagined space it is necessary to learn to predict in that space, which under the hood means learning causal relationships
2. Data Driven Machine Learning
2.1 IID Assumption
- Data Driven ML consists of learning models from data
- How is the data generated or, to be more explicit, what are the assumptions on the data?
- Typically, ML methods rely on the assumption that data samples are Independent and Identically Distributed (IID), which means
  - they all come from the same PDF (Identically Distributed)
  - the sampling process is without memory (Independent); the sketch below contrasts the two cases
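A minimal sketch (my illustration, not from the paper) of what "without memory" means: lag-1 autocorrelation is near zero for independent draws from a fixed PDF, and near one for a random walk, where each sample depends on the previous one.

```python
# Contrast IID samples with a memoryful sampling process.
import numpy as np

rng = np.random.default_rng(0)

iid = rng.normal(size=10_000)              # independent draws from one PDF
walk = np.cumsum(rng.normal(size=10_000))  # each sample depends on the past

def lag1_autocorr(x):
    """Correlation between each sample and its successor."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

print(f"IID lag-1 autocorrelation:         {lag1_autocorr(iid):+.3f}")   # ~0
print(f"random-walk lag-1 autocorrelation: {lag1_autocorr(walk):+.3f}")  # ~1
```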
- What happens when these assumptions do not hold?
- Typically performance drops, but sometimes the drop can be very sharp; consider, for example, Adversarial Attacks
- Adversarial Attacks can be seen as the result of a violation of the "Identically Distributed" assumption: the PDF they come from is too distant from the training one (domain gap), as the sketch below illustrates
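A minimal sketch of the domain gap (my illustration, not an experiment from the paper; it assumes scikit-learn is available): a classifier trained under one PDF holds up on test samples from the same PDF but degrades sharply when the test PDF is shifted.

```python
# Train under one PDF, then test under the same PDF vs. a shifted one.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, shift=0.0):
    """Two Gaussian classes; `shift` moves the whole distribution."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * 2.0 + shift, scale=1.0, size=(n, 2))
    return X, y

X_train, y_train = sample(2_000)             # training PDF
X_same, y_same = sample(2_000, shift=0.0)    # same PDF -> IID holds
X_shift, y_shift = sample(2_000, shift=3.0)  # domain gap

model = LogisticRegression().fit(X_train, y_train)
print("accuracy, same PDF:   ", model.score(X_same, y_same))    # stays high
print("accuracy, shifted PDF:", model.score(X_shift, y_shift))  # collapses
```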
- They can also be seen as a failure of the model to generalize properly (the model is not intelligent enough)
- They can also be seen as the result of model instability at certain points of the input space, where a small variation in the input causes a huge variation in the output (see the sketch below)
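A minimal sketch of that instability (my illustration, in the spirit of gradient-sign attacks such as FGSM, which the paper does not detail): for a toy linear classifier, a tiny perturbation aligned against the weight signs flips the prediction.

```python
# A small, deliberate input perturbation flips a linear model's output.
import numpy as np

w = np.array([2.0, -3.0, 1.0])  # weights of a toy linear classifier
b = 0.1                         # bias

def predict(x):
    return int(w @ x + b > 0)

x = np.array([0.05, -0.02, 0.01])  # a point near the decision boundary
eps = 0.05                         # small L_inf perturbation budget
x_adv = x - eps * np.sign(w)       # step against the weight signs

print("clean prediction:    ", predict(x))                 # 1
print("perturbed prediction:", predict(x_adv))              # 0
print("perturbation size:   ", np.max(np.abs(x_adv - x)))   # 0.05
```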
- Adding the temporal dimension, Adversarial Attacks can also be seen as a violation of the "Independent" assumption, since an attacker can resubmit the same sample over and over
- Furthermore, since training is an iterative process, the weights at a given iteration depend both on the data observed at that iteration and on the weights at the previous iterations, so failing to properly shuffle the samples in the dataset can make the NN learn false, order-induced correlations in its weights (see the sketch below)
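A minimal sketch of the shuffling point (my illustration, assuming scikit-learn's SGDClassifier): one pass of streaming SGD over class-sorted data typically ends up biased toward the last class seen, while the same data shuffled trains a markedly better model.

```python
# One-pass mini-batch SGD: class-sorted sample order vs. shuffled order.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n = 2_000
y = np.repeat([0, 1], n // 2)                      # dataset sorted by class
X = rng.normal(loc=y[:, None] * 2.0, size=(n, 2))  # two separable Gaussians

def train_stream(X_stream, y_stream, batch=50):
    """Single pass of mini-batch SGD in the given sample order."""
    clf = SGDClassifier(loss="log_loss", random_state=0)
    for i in range(0, len(y_stream), batch):
        clf.partial_fit(X_stream[i:i + batch], y_stream[i:i + batch],
                        classes=np.array([0, 1]))
    return clf

perm = rng.permutation(n)  # properly shuffled order
print("sorted stream accuracy:  ", train_stream(X, y).score(X, y))
print("shuffled stream accuracy:", train_stream(X[perm], y[perm]).score(X, y))
```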