mimic-code
instructions to load dataset to Amazon Athena
Thanks Nintorac! Let me know if there are other issues, and also if all went well.
There ended up being issues with most of the tables. I pretty much just set every varchar to string, so there may be more improvements to make, but here is the current schema I've come up with:
schema.tar.gz (this is outdated now, will upload again later)
Also seeing some weird results for some tables, e.g.:
SELECT * FROM "mimic_iv_raw"."d_labitems" limit 10;
Row 3 has a column that looks like "something, else", and Athena is splitting on the comma within the quotes.
Would be happy to merge (a working version of) this - my initial thoughts are:
- Not familiar with loading data into Athena, but my hunch is you need to specify that the fields are quoted with "
- Could be that you need to specify quotes in the info here: https://github.com/MIT-LCP/mimic-code/blob/3c910009d54a20a2439ae2b89e3556116368a2dc/mimic-iv/buildmimic/athena/schema.sql#L18-L19
- Or it could be that this is misspecified, since the text columns are quoted: https://github.com/MIT-LCP/mimic-code/blob/3c910009d54a20a2439ae2b89e3556116368a2dc/mimic-iv/buildmimic/athena/schema.sql#L26-L33
- Needs an update for v2.2. The tables in core have been moved to hosp, and there are new columns.
- Not sure what the distinction is between the mimiciv_parquet and mimiciv_csv schemas. In general it would be great to keep the mimiciv_hosp, mimiciv_icu schema names as that makes concept code much easier to transfer.
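For reference, the quoting hunch above could be tested with something like the following. This is only a sketch, not the schema in the PR: it assumes the CSV has been decompressed and uploaded with its header row intact, the S3 path is a placeholder, and the column list is my assumption about d_labitems and should be checked against the actual file header. It uses the OpenCSVSerde that ships with Hive and Athena, which accepts a quoteChar property, and declares every column as string to match the approach described earlier in the thread.

```sql
-- Hypothetical table definition; the column list and LOCATION are assumptions.
CREATE EXTERNAL TABLE IF NOT EXISTS mimiciv_hosp.d_labitems (
  itemid   string,
  label    string,
  fluid    string,
  category string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',   -- fields are comma separated
  'quoteChar'     = '"',   -- commas inside double quotes stay in one field
  'escapeChar'    = '\\'
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket/mimic-iv/hosp/d_labitems/'  -- placeholder path
TBLPROPERTIES ('skip.header.line.count' = '1');        -- skip the CSV header row
```

If this works, re-running the SELECT on d_labitems should return the "something, else" value as a single column. Numeric columns such as itemid would then need a CAST in queries or in a view on top of the table.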