Discard "UNK_UNK" tag for original bert implementation

Open OsmanMutlu opened this issue 5 years ago • 0 comments

I've been running experiments with this repo on the qc-fine data and on my own data, which is for a binary classification task. On my data, your BERT implementation did not reproduce the results of Google's original BERT implementation.

So I removed the "UNK_UNK" label from the label list for BERT (only ganbert.py uses this label), and then I got the same results. The likely cause is that the training data for BERT has no samples with the "UNK_UNK" label, so it becomes a useless tag. This does not seem to matter when there are many labels, as in the qc-fine dataset, but for tasks with a small number of labels it is a serious problem.
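Roughly, the change looks like the sketch below. The function name `drop_unused_placeholder` and the example label lists are made up for illustration and are not the repo's actual code; the idea is only that the placeholder label is dropped when no training sample carries it.

```python
def drop_unused_placeholder(label_list, train_labels, placeholder="UNK_UNK"):
    """Return label_list without `placeholder` if no training sample uses it.

    Hypothetical helper: the real scripts build their label list differently.
    """
    seen = set(train_labels)
    if placeholder in seen:
        # The placeholder is actually used (as in the GAN setup), so keep it.
        return list(label_list)
    # Otherwise the classifier would get an output class it never sees.
    return [label for label in label_list if label != placeholder]


# Binary task: plain BERT never sees "UNK_UNK", so it is removed.
bert_labels = drop_unused_placeholder(["UNK_UNK", "0", "1"], ["0", "1", "1"])

# Data that does contain "UNK_UNK": the list is left unchanged.
gan_labels = drop_unused_placeholder(["UNK_UNK", "0", "1"], ["0", "UNK_UNK"])
```

With only two real labels, the unused placeholder makes up a third of the output classes, which may be why the effect shows up on small label sets and not on qc-fine.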

I also converted the qc-fine data into binary-labelled data, keeping the "hum_ind" label and changing all other labels to "REST_REST". The results were similar to those described above.
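The relabelling can be sketched as follows. The helper name `binarize_label` and the sample labels in the usage lines are placeholders for illustration; how the labels are read from the qc-fine files is not shown.

```python
def binarize_label(label, keep="hum_ind", other="REST_REST"):
    """Keep `keep` as-is and map every other label to `other`."""
    return label if label == keep else other


# Illustrative label strings, not taken from the dataset files.
original = ["hum_ind", "loc_other", "num_count", "hum_ind"]
binary = [binarize_label(label) for label in original]
```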

I can also share my results if you like.

Best,

Nov 08 '20 14:11 OsmanMutlu