clinical-fusion

Errors during training phase, RuntimeError: you must first build vocabulary before training the model

Open • CalendulaED opened this issue 3 years ago • 1 comment

I have successfully run the first six Python scripts for preprocessing:

$ python 00_define_cohort.py # define patient cohort and collect labels
$ python 01_get_signals.py # extract temporal signals (vital signs and laboratory tests)
$ python 02_extract_notes.py --firstday # extract first day clinical notes
$ python 03_merge_ids.py # merge admission IDs
$ python 04_statistics.py # run statistics
$ python 05_preprocess.py # run preprocessing

However, when I tried to run the next step:

$ python 06_doc2vec.py --phase train # train doc2vec model

it shows: image

The only line that I have modified is line 32, which I changed from:

train_ids = list(map(lambda x: int(x[-10:-4]), train_ids))

to:

train_ids = list(map(lambda x: int(float(x[-10:-4])), train_ids))

I added the float() call because without it the script fails with ValueError: invalid literal for int() with base 10:
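To illustrate why the plain int() call can fail here, below is a minimal sketch. The file names are hypothetical: it assumes the IDs were written out as floats (for example "123456.0.csv" instead of "123456.csv"), which is only a guess about what the data looks like. Under that assumption, the fixed-width slice x[-10:-4] no longer covers the whole ID, so int(float(...)) silences the ValueError but may return a truncated ID. Parsing the file stem instead (the helper id_from_stem is made up for this example) does not depend on the name length.

```python
import os


def id_from_slice(path):
    """Fixed-width slice, as on line 32 of the script (before int conversion)."""
    return path[-10:-4]


def id_from_stem(path):
    """Hypothetical alternative: parse the whole file stem, whatever its length."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(float(stem))


# Hypothetical file names; the real ones may differ.
plain = "notes/123456.csv"      # ID written as an integer
floaty = "notes/123456.0.csv"   # ID written as a float (assumption)

print(id_from_slice(plain))               # 123456
print(id_from_slice(floaty))              # 3456.0 -> int() raises ValueError on this
print(int(float(id_from_slice(floaty))))  # 3456   -> no error, but not the real ID
print(id_from_stem(floaty))               # 123456
```

If the slice does return truncated IDs in the real data, the converted train_ids would no longer match any admission, which is worth checking before looking at the model settings.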

When I first encountered the RuntimeError: you must first build vocabulary before training the model, I tried changing min_count from 5 to 1 on line 39. However, that did not fix it.
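As far as I understand, gensim raises this RuntimeError when the vocabulary is empty at training time. Since lowering min_count to 1 did not help, one possible explanation is that no documents reach the model at all. The sketch below shows a sanity check that could be run before training. The names select_documents and check_corpus, and the dictionary of tokenized notes, are hypothetical and not taken from 06_doc2vec.py.

```python
def select_documents(notes_by_id, train_ids):
    """Keep only non-empty tokenized notes whose ID is in the training split."""
    wanted = set(train_ids)
    return [
        (doc_id, tokens)
        for doc_id, tokens in notes_by_id.items()
        if doc_id in wanted and tokens
    ]


def check_corpus(documents):
    """Fail early, with a clearer message, if there is nothing to train on."""
    if not documents:
        raise ValueError("no training documents matched train_ids")
    return len(documents)


# Hypothetical data: note IDs are full admission IDs ...
notes_by_id = {
    123456: ["patient", "admitted", "with", "fever"],
    234567: ["no", "acute", "distress"],
}

# ... but the training IDs were truncated during parsing (assumption).
bad_train_ids = [3456, 4567]
good_train_ids = [123456, 234567]

print(len(select_documents(notes_by_id, bad_train_ids)))       # 0
print(check_corpus(select_documents(notes_by_id, good_train_ids)))  # 2
```

If the first count is zero on the real data, the problem is in how the IDs are matched rather than in the doc2vec parameters.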

Can you help me with this problem? Thank you so much for your help!❤️

CalendulaED • Jun 09 '21 05:06

I am still having the same problem. The only warning I got before running python 06_doc2vec.py --phase train came from python 02_extract_notes.py --firstday, which shows a warning like this: Screen Shot 2021-06-30 at 10 07 26 AM

But I don't think this warning is what causes the error: Screen Shot 2021-06-30 at 10 08 05 AM

I am really confused about this. Can you help me with this problem? I believe I followed the steps in the README correctly.

Thank you!

CalendulaED • Jun 30 '21 02:06