clinical-fusion
Errors during training phase, RuntimeError: you must first build vocabulary before training the model
I have successfully run the preprocessing Python scripts (00 through 05):
$ python 00_define_cohort.py # define patient cohort and collect labels
$ python 01_get_signals.py # extract temporal signals (vital signs and laboratory tests)
$ python 02_extract_notes.py --firstday # extract first day clinical notes
$ python 03_merge_ids.py # merge admission IDs
$ python 04_statistics.py # run statistics
$ python 05_preprocess.py # run preprocessing
However, the next step fails:
$ python 06_doc2vec.py --phase train # train doc2vec model
It shows:
The only change I made was to line 32, from:
train_ids = list(map(lambda x: int(x[-10:-4]), train_ids))
to
train_ids = list(map(lambda x: int(float(x[-10:-4])), train_ids))
Without the float() call, the script raises ValueError: invalid literal for int() with base 10:
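A minimal sketch of how the float() workaround could hide an ID mismatch. The file names below are hypothetical (the real names produced by the preprocessing scripts may differ), and `id_by_slice` and `id_by_stem` are illustrative helpers, not part of the repository. If the IDs were written as floats, so a name ends in `.0.csv`, the fixed slice `[-10:-4]` no longer covers the whole ID. In that case `int(float(...))` parses without error but may return a truncated number:

```python
import os

def id_by_slice(filename):
    # Mirrors the patched line 32: the 6 characters before a 4-character extension.
    return int(float(filename[-10:-4]))

def id_by_stem(filename):
    # Alternative: drop the directory and extension, then parse the whole stem.
    stem = os.path.splitext(os.path.basename(filename))[0]
    return int(float(stem))

# Hypothetical file names; compare what each helper extracts.
for name in ["123456.csv", "123456.0.csv"]:
    print(name, id_by_slice(name), id_by_stem(name))
```

If the two helpers disagree on the real file names, the parsed train_ids may not match any note IDs, which would leave the doc2vec corpus empty.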
When I first encountered the RuntimeError: you must first build vocabulary before training the model, I tried changing min_count from 5 to 1 in line 39. However, that did not help.
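In gensim, this error is typically raised when the vocabulary is empty at training time. Lowering min_count cannot help if no documents reach the model at all. The sketch below is a standard-library check, not the repository's code. It assumes the corpus is an iterable of token lists, and `vocab_summary` is a hypothetical helper name:

```python
from collections import Counter

def vocab_summary(docs, min_count=5):
    """Count documents and the distinct tokens that occur at least min_count times."""
    docs = list(docs)
    freq = Counter(tok for doc in docs for tok in doc)
    kept = sorted(tok for tok, n in freq.items() if n >= min_count)
    return {"n_docs": len(docs), "n_tokens": sum(freq.values()), "kept": kept}

# Toy corpus for illustration only.
sample = [["chest", "pain"], ["chest", "xray"]]
print(vocab_summary(sample, min_count=2))
print(vocab_summary([], min_count=1))  # an empty corpus keeps nothing at any min_count
```

Running a check like this on the documents just before they are passed to Doc2Vec would show whether n_docs is 0, which points at the ID filtering rather than at min_count.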
Can you help me with this problem? Thank you so much for your help!❤️
I am still having the same problem.
The only warning I got before running python 06_doc2vec.py --phase train came from python 02_extract_notes.py --firstday, which shows a warning like:
But I don't think this warning is what causes the error:
I am really confused about this. Can you help me with this problem? I believe I followed the steps in the README correctly.
Thank you!