Error in concat_and_split.py function split_data

Open credo99 opened this issue 6 years ago • 8 comments

Everything runs fine in the notebook for mimic3 until:

tr, dv, te = concat_and_split.split_data(fname, base_name=base_name)

notes_labeled.csv and disch_full.csv are generated successfully, but hadm_id = row[1] fails. It looks like there is an empty row somewhere in the header, no?

SPLITTING
0 read

IndexError Traceback (most recent call last)
in
----> 1 tr, dv, te = concat_and_split.split_data(fname, base_name=base_name)

~\Documents\GitHub\caml-mimic\dataproc\concat_and_split.py in split_data(labeledfile, base_name)
     75 print(str(i) + " read")
     76
---> 77 hadm_id = row[1]
     78
     79 if hadm_id in hadm_ids['train']:

IndexError: list index out of range

credo99, Feb 06 '19 14:02

I'm facing the exact same issue. Did you figure out a solution for this?

aparnapai7, Jul 10 '19 18:07

> I'm facing the exact same issue. Did you figure out a solution for this?

I solved this issue by skipping the empty rows: only process a row if len(row) > 1.

Benzenoil, Jul 10 '19 23:07
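The workaround above can be sketched roughly as follows, assuming the rows are read with Python's csv.reader. The helper name iter_nonempty_rows, the column names, and the sample values are made up for illustration and are not taken from concat_and_split.py.

```python
import csv
import io


def iter_nonempty_rows(lines):
    """Yield only rows with at least two fields, so row[1] is always safe.

    Hypothetical helper for illustration; not part of caml-mimic.
    """
    for row in csv.reader(lines):
        # csv.reader returns an empty list for a blank line.
        if len(row) > 1:
            yield row


# Simulated labeled file with a blank line between records, as reported
# in this thread. Column names and values are invented for the example.
sample = io.StringIO(
    "SUBJECT_ID,HADM_ID,TEXT,LABELS\n"
    "\n"
    "1,100,some note,401.9;428.0\n"
    "\n"
    "2,200,another note,250.00\n"
)

rows = list(iter_nonempty_rows(sample))
hadm_ids = [row[1] for row in rows[1:]]  # rows[0] is the header
print(hadm_ids)
```

This only hides the symptom: the blank rows are still in the file, so any other script that reads it needs the same guard.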

Same issue. Did anyone modify the function split_data? @Benzenoil, did you change this in the notebook or in concat_and_split.py?

NeelKanwal, Sep 04 '19 19:09

@NeelKanwal Yes, I changed this in concat_and_split.py. Or you may try Ubuntu, since I did not face this issue when I ran the code under Ubuntu.

Benzenoil, Sep 05 '19 16:09

Thanks,

I tried it, but the error does not change. I ran it in Jupyter on my local machine as well as on Google Colab. Everything else, such as the constants and the placement of the MIMIC data files, is correct, so this is strange. I could try it on Ubuntu, but according to the readme the code should be system independent.

NeelKanwal, Sep 06 '19 08:09

If you check the generated notes_labeled.csv, you will find that there is an empty row between every two records. These empty rows are what make row[1] raise an IndexError.

But how were the empty rows generated? I guess it is due to line 38 in concat_and_split.py: w.writerow([subj_id, str(hadm_id), text, ';'.join(cur_labels)])

acadTags, Nov 19 '19 12:11
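One plausible cause, consistent with the problem appearing on Windows but not on Ubuntu, is the way the output file is opened rather than the writerow call itself. Python's csv writer ends each row with "\r\n", and a file opened in text mode on Windows without newline='' expands the "\n" again, giving "\r\r\n", which reads back as a record followed by an empty row. The sketch below reproduces this on any OS by passing newline="\r\n" to imitate Windows text mode. It assumes the script opens its output without newline='', which this thread does not confirm; the helper names and sample rows are invented.

```python
import csv
import os
import tempfile


def write_rows(path, rows, newline):
    # newline="\r\n" imitates Windows text mode on any OS: every "\n" the
    # csv writer emits is expanded to "\r\n", so its "\r\n" becomes "\r\r\n".
    # newline="" disables translation, as the csv documentation recommends.
    with open(path, "w", newline=newline) as f:
        csv.writer(f).writerows(rows)


def count_empty_rows(path):
    # Count rows that csv.reader parses as an empty list.
    with open(path, "r", newline="") as f:
        return sum(1 for row in csv.reader(f) if not row)


# Invented sample records in the same four-column shape as the writerow call.
rows = [
    ["1", "100", "note text", "401.9;428.0"],
    ["2", "200", "other text", "250.00"],
]

with tempfile.TemporaryDirectory() as tmp:
    bad = os.path.join(tmp, "bad.csv")
    good = os.path.join(tmp, "good.csv")
    write_rows(bad, rows, newline="\r\n")  # Windows-like translation
    write_rows(good, rows, newline="")     # recommended for csv files
    bad_empty = count_empty_rows(bad)
    good_empty = count_empty_rows(good)

print(bad_empty, good_empty)
```

If this is indeed the cause, opening the output file with newline='' where the csv writer is created would stop the empty rows from being written in the first place, instead of skipping them at read time.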