mdl-stance-robustness
Testing on a Small New Dataset with zero (or almost no) Training Data
Hi, I have a very small stance detection dataset (80-100 examples) with 3 stance classes: disagree (class 0), agree (class 1), and balanced (class 2), and I want to test the MT-DNN model's performance on it. I have two questions:
1. Since my dataset is very small, I do not want to split it to create training samples. Is there a way to just test the MT-DNN model (trained on 10 datasets) on my dataset? I don't think that will be possible because, without any training data, the MT-DNN model will not have a dataset-specific top layer. Is this correct?
2. Assuming that point 1 is valid, this is more of a theoretical question on MT-DNN. Let's say I do build a minimal training set from my dataset (with just 2 examples from each class). For simplicity, let's assume MT-DNN is tuned on only two datasets: the first has 'agree' as class 0 and 'disagree' as class 1; the second has 'agree' as class 0, 'disagree' as class 1, and 'balanced' as class 2. Since both of these datasets assign class indices differently from my dataset, will this cause any problem? For example, for a given sentence pair from my dataset, if the model decides that the stance label should be 'agree', will it predict the class label to be 0 or 1?
Hi @rudra0713,
- Yes, you are right there. The dataset-specific dense layers for classification are not included.
- No, this won't be a problem. You will have to create a LabelMapper, where you basically decide whether (e.g.) your "agree" label sits at position 0, 1, or 2 (based on the order in which you add the labels). The code will automatically create a classification layer based on the number of classes you pass there; see the sketch below.
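For reference, here is a minimal, hypothetical sketch of what such a label-to-index mapping looks like; the class and variable names below are illustrative and may differ from the actual code in this repo:

```python
# Hypothetical LabelMapper-style sketch (names are illustrative, not the repo's
# exact API): each label gets an index based on the order in which you add it,
# and the classification layer is sized by the number of labels you register.

class LabelMapper:
    def __init__(self):
        self.label2idx = {}
        self.idx2label = {}

    def add(self, label):
        # Assign the next free index to an unseen label.
        if label not in self.label2idx:
            idx = len(self.label2idx)
            self.label2idx[label] = idx
            self.idx2label[idx] = label
        return self.label2idx[label]

# For your dataset: disagree -> 0, agree -> 1, balanced -> 2
mapper = LabelMapper()
for label in ["disagree", "agree", "balanced"]:
    mapper.add(label)

num_classes = len(mapper.label2idx)  # 3 -> size of the new classification layer
```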
I hope that helps. Let me know if you have any more questions.
Hi @v1nc3nt27,
- Thanks for confirming.
- Regarding point 2 of your answer, I understand that the code will create a classification layer based on the number of classes in my dataset (3 in my example). I am trying to understand how the model would utilize the knowledge it learned from the other datasets. In my example, during training the model probably learned that for an instance of the class "agree", the probability for class 0 should be high, because 0 was the label for "agree" in the other two datasets. But in my dataset, a sample (let's say x) of the class "agree" has the label 1. So in this case, will the model still try to maximize the probability for class 0? Put another way: if the model correctly realizes that sample x is an instance of the class "agree", will it check the LabelMapper, realize that "agree" is class 1 in my dataset, and therefore increase the probability of class 1? (Also, please take into account that my training set is very, very small.)
Hi @v1nc3nt27, waiting for your response.
Hi @rudra0713, the model won't check the labels; it will only leverage the index you set. It should be fine to set different indices for the same label if you use several different datasets. I have done this during training as well, and, theoretically, the model should have learned to generalize the embeddings in the transformer and base the class decision solely (or at least mostly) on the classification layer.
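To make that concrete, here is a rough PyTorch-style sketch of the multi-task setup I am describing (a shared transformer encoder plus one small classification head per dataset); the names below are illustrative, not the repo's actual classes:

```python
import torch.nn as nn

class MultiTaskStanceModel(nn.Module):
    """Illustrative only: shared encoder, one classification head per dataset."""

    def __init__(self, encoder, hidden_size, classes_per_dataset):
        super().__init__()
        self.encoder = encoder  # shared BERT-style encoder, tuned on all tasks
        # One independent linear head per dataset, so index 0 can mean
        # "disagree" in your dataset and "agree" in another one.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, n) for n in classes_per_dataset]
        )

    def forward(self, inputs, dataset_id):
        pooled = self.encoder(inputs)          # shared sentence-pair representation
        return self.heads[dataset_id](pooled)  # logits in that dataset's label order
```

During training, the loss for an example from your dataset is computed only against your dataset's own head, using your LabelMapper's indices, so that head learns your index order (agree = 1) while the shared encoder keeps the general stance knowledge picked up from the other datasets.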