mimic-code
SQLite import for mimic3 gives mixed column type warning
Prerequisites
- [X] Put an X between the brackets on this line if you have done all of the following:
  - Checked the online documentation: https://mimic.mit.edu/
  - Checked that your issue isn't already addressed: https://github.com/MIT-LCP/mimic-code/issues?utf8=%E2%9C%93&q=
Description
While trying to import mimic3 into SQLite with import.py, I get the following error:
```
Starting processing DATETIMEEVENTS.csv.gz
mimic-code/mimic-iii/buildmimic/sqlite/import.py:25: DtypeWarning: Columns (13) have mixed types. Specify dtype option on import or set low_memory=False.
  for chunk in pd.read_csv(f, index_col="ROW_ID", chunksize=CHUNKSIZE):
...
Starting processing INPUTEVENTS_CV.csv.gz
/home/armando/projects/mimic-code/mimic-iii/buildmimic/sqlite/import.py:25: DtypeWarning: Columns (20,21) have mixed types. Specify dtype option on import or set low_memory=False.
  for chunk in pd.read_csv(f, index_col="ROW_ID", chunksize=CHUNKSIZE):
...
Starting processing NOTEEVENTS.csv.gz
/home/armando/projects/mimic-code/mimic-iii/buildmimic/sqlite/import.py:25: DtypeWarning: Columns (4,5) have mixed types. Specify dtype option on import or set low_memory=False.
  for chunk in pd.read_csv(f, index_col="ROW_ID", chunksize=CHUNKSIZE):
...
Starting processing CHARTEVENTS.csv.gz
/home/armando/projects/mimic-code/mimic-iii/buildmimic/sqlite/import.py:25: DtypeWarning: Columns (13) have mixed types. Specify dtype option on import or set low_memory=False.
  for chunk in pd.read_csv(f, index_col="ROW_ID", chunksize=CHUNKSIZE):
...
```
Hi, I'm also running the import.py script and ran into the same problem. Did you manage to figure it out or find an alternative solution?
It's not strictly an error, but it may result in an inconsistent data load (I haven't checked). Essentially the load uses pandas as a convenience: pandas tries a low-memory load, fails to infer a single type for some columns, and reverts to a high-memory load. It can be fixed by specifying the known data types for each table in the `read_csv` call.
Since the column types are known in advance and won't change (it's a frozen/snapshot dataset), would it be good to add the column types to the import script? I can send a pull request if this solution is acceptable.
Yes it would for sure, and yes we would love a PR!