Help with seminar on conversation
Hello, I am trying to adapt the idea of learning on triplets to a classification task on an imbalanced dataset.
I am selecting one anchor, one positive example from the same class, and one negative example from a random other class. I want the model to learn to embed sentences of the same class closer together, and then train an SVM (or something else) to classify new sentences based on the embeddings produced by the trained model.
Can you please suggest what the model's architecture could look like? In the course you suggested putting several dense layers on top of a pretrained BERT (and not training the BERT embeddings, just these dense layers). What would be a good output size for the embedding vector if I want to use it for classification later? Maybe 16? A rough sketch of what I have in mind is at the end of this post.
I will be very grateful for suggestions!
P.S. Guys, you really are the best, your course has helped me a lot in learning NLP!
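To make the question concrete, here is a rough sketch of what I have in mind, in PyTorch + transformers (untested; the bert-base-uncased checkpoint, the 128-dim output, the head sizes, and the toy sentences are all placeholders, not final choices):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TripletEncoder(nn.Module):
    """Frozen BERT plus a small trainable projection head, as described above."""
    def __init__(self, bert_name="bert-base-uncased", out_dim=128):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        for p in self.bert.parameters():   # freeze BERT, train only the head
            p.requires_grad = False
        hidden = self.bert.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(hidden, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # mean-pool token embeddings, ignoring padding positions
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return nn.functional.normalize(self.head(pooled), dim=-1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TripletEncoder()
loss_fn = nn.TripletMarginLoss(margin=0.5)

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    return model(batch["input_ids"], batch["attention_mask"])

# one toy step: anchor/positive from the same class, negative from another class
anchor, positive, negative = embed(["a"]), embed(["b"]), embed(["c"])
loss = loss_fn(anchor, positive, negative)
loss.backward()
```

The idea is to keep BERT frozen, update only the head from the triplet loss, and then reuse the learned embeddings for a downstream classifier.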
Hi!
Disclaimer: I'm not very well versed in metric learning tasks, so an expert's opinion would be preferable to mine.
(0) Not all BERT-like models are created equal :) There is a particular subtype that is aimed at embedding whole sentences -- no guarantees, but it might be worth trying. Here's a lib that has a bunch of them: https://github.com/UKPLab/sentence-transformers
(1) dim=16 seems too small based on my (limited) experience. The actual dimension should depend on the number of classes, but the last time I was working on a similar architecture for retrieval, the optimal dimensions were in the 128-1024 range.
(2) If the dataset is large enough (tens of thousands of examples rather than hundreds), it is usually beneficial to also fine-tune the BERT layers with a small learning rate. In that case, you can worry less about the architecture.
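If it helps, here is roughly how triplet fine-tuning looks with that library (an untested sketch; the model name, margin, and example texts are placeholders, and the exact fit API may differ between library versions):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# any sentence-embedding model from the library; all-MiniLM-L6-v2 is a small, fast one
model = SentenceTransformer("all-MiniLM-L6-v2")

# triplets: (anchor, positive from the same class, negative from another class)
train_examples = [
    InputExample(texts=["how do I reset my password",
                        "forgot my password",
                        "what is your refund policy"]),
    # ... one InputExample per sampled triplet
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.TripletLoss(model=model, triplet_margin=0.5)

# fine-tunes the whole encoder end to end
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)

embeddings = model.encode(["a new sentence to embed"])  # one row per sentence
```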
Thanks a lot for answering! I did use sentence-transformers; I also had the feeling it would work better than plain BERT. For starters I took MiniLM-L6, since it is quick to train.
I have 140k datapoints in the dataset. There are 6 classes, but 80% of the datapoints belong to a single class, "others", and the most under-represented class contains only 0.5% of the datapoints. The thing is that after training the embeddings I need to somehow use them to classify new sentences, and that is where I am stuck. I thought about using an SVM, but it will not work well with such high-dimensional datapoints. I am now thinking about a fully connected network with a couple of dense layers that classifies datapoints based on their embeddings (roughly like the sketch below), but that would once again leave me with the problem of a very imbalanced dataset.
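Concretely, this is the kind of classifier I am considering, with inverse-frequency class weights as one possible way to deal with the imbalance (a rough sketch; the layer sizes, learning rate, and weighting scheme are just a first guess, and X/y stand in for the real embeddings and labels):

```python
import torch
import torch.nn as nn

# X: (n_samples, emb_dim) embeddings from the trained encoder, y: (n_samples,) class ids 0..5
X = torch.randn(140_000, 384)          # placeholder for the real embeddings
y = torch.randint(0, 6, (140_000,))    # placeholder for the real labels

# inverse-frequency class weights to counter the dominant "others" class
counts = torch.bincount(y, minlength=6).float()
class_weights = counts.sum() / (6 * counts)

classifier = nn.Sequential(
    nn.Linear(X.shape[1], 128), nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(128, 6),
)
loss_fn = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

for epoch in range(5):
    perm = torch.randperm(len(X))
    for i in range(0, len(X), 256):
        idx = perm[i:i + 256]
        loss = loss_fn(classifier(X[idx]), y[idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```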
Please resubmit if this is still relevant.