
Issue in train_full, test_full, dev_full files

Open sajidaraz opened this issue 4 years ago • 4 comments

I prepared the data following the dataproc_mimic_III.ipynb notebook and got six files: train_50, test_50, dev_50, train_full, test_full, and dev_full. I am having a problem with the full splits: train_full contains 8686 unique labels, test_full contains 4075, and dev_full contains 3009. I don't know why the number of labels differs between files, or how to make them the same size so that I can train my model.

Kindly help me.

sajidaraz avatar Mar 13 '21 05:03 sajidaraz

This is because some of the codes occur only once, so none of the three splits contains every unique code.

airingzhang avatar Mar 18 '21 04:03 airingzhang

Can you kindly guide me on how to make these label sets the same size, so that we can train a model? The model does not accept different label sizes in y_train, y_test, and y_valid.

sajidaraz avatar Mar 20 '21 05:03 sajidaraz

I am not the author, but I guess this is simply the setting of this task (the full-label scenario): the training set does not see all of the unique labels.
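One common way to get label matrices of the same width is to build a single label index from the union of the codes in all three splits, then encode each split against that shared index. The following is a minimal sketch of that idea, not the repository's own code. It assumes each row's labels are stored as one semicolon-separated string; the helper names `build_label_index` and `to_multi_hot` and the toy codes are hypothetical.

```python
from typing import Dict, Iterable, List


def build_label_index(*splits: Iterable[str]) -> Dict[str, int]:
    """Map every code seen in any split to a fixed column index."""
    codes = set()
    for split in splits:
        for label_str in split:
            # Assumption: codes within a row are separated by ';'.
            codes.update(c for c in label_str.split(";") if c)
    # Sort so the column order is deterministic across runs.
    return {code: i for i, code in enumerate(sorted(codes))}


def to_multi_hot(label_strs: Iterable[str], index: Dict[str, int]) -> List[List[int]]:
    """Encode each row as a 0/1 vector over the shared label index."""
    rows = []
    for label_str in label_strs:
        row = [0] * len(index)
        for code in label_str.split(";"):
            if code:
                row[index[code]] = 1
        rows.append(row)
    return rows


# Toy data standing in for the label columns of the three splits.
train_labels = ["401.9;428.0", "250.00"]
dev_labels = ["401.9"]
test_labels = ["038.9;428.0"]  # "038.9" never appears in train

label_index = build_label_index(train_labels, dev_labels, test_labels)
y_train = to_multi_hot(train_labels, label_index)
y_dev = to_multi_hot(dev_labels, label_index)
y_test = to_multi_hot(test_labels, label_index)
```

With this encoding, y_train, y_dev, and y_test all have one column per code in the union, so their shapes match. Columns for codes that never occur in the training split are simply all zeros in y_train, which reflects the full-label setting described above.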

airingzhang avatar Mar 20 '21 16:03 airingzhang

@sajidaraz @sarahwie Have you found a solution?

monk1337 avatar Oct 10 '21 16:10 monk1337