instructions to load dataset to Amazon Athena

Open yuvallb opened this issue 2 years ago • 3 comments

yuvallb avatar Mar 03 '22 20:03 yuvallb

Thanks Nintorac! Let me know if there are other issues, and also if all went well.

yuvallb avatar Oct 31 '22 12:10 yuvallb

There ended up being issues on most of the tables. I pretty much just set every varchar as a string, so there might be more improvements to make, but here is the current schema I've come up with:

schema.tar.gz (this is outdated now, will upload again later)

Also seeing some weird results for some tables, e.g.:

SELECT * FROM "mimic_iv_raw"."d_labitems" limit 10;

[image: screenshot of the query results]

Row 3 has a column that looks like "something, else", and Athena is splitting on the comma inside the quotes.
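To illustrate the symptom outside Athena, here is a minimal Python sketch. The sample row is made up (only the "something, else" value comes from the description above), and Python's csv module is standing in for a quote-aware parser; it is not Athena's own reader.

```python
import csv
import io

# Hypothetical row: the second field contains a comma inside double quotes.
line = '50803,"something, else",Blood,Blood Gas'

# Naive splitting ignores the quotes and yields one field too many.
naive_fields = line.split(",")

# A quote-aware parser keeps the quoted text together as a single field.
quoted_fields = next(csv.reader(io.StringIO(line), quotechar='"'))

print(len(naive_fields), naive_fields)
print(len(quoted_fields), quoted_fields)
```

If the table is being read with a plain comma-delimited row format, the first behaviour is roughly what the query output would look like: every column after the quoted one shifts right by one.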

Nintorac avatar Nov 02 '22 01:11 Nintorac

Would be happy to merge (a working version of) this - my initial thoughts are:

  1. Not familiar with loading data into Athena, but my hunch is that you need to specify that the fields are quoted with double quotes (").
    • Could be that you need to specify quotes in the info here: https://github.com/MIT-LCP/mimic-code/blob/3c910009d54a20a2439ae2b89e3556116368a2dc/mimic-iv/buildmimic/athena/schema.sql#L18-L19
    • Or it could be that this is misspecified, since the text columns are quoted: https://github.com/MIT-LCP/mimic-code/blob/3c910009d54a20a2439ae2b89e3556116368a2dc/mimic-iv/buildmimic/athena/schema.sql#L26-L33
  2. Needs an update for v2.2. The tables in core have been moved to hosp, and there are new columns.
  3. Not sure what the distinction is between the mimiciv_parquet and mimiciv_csv schemas. In general it would be great to keep the mimiciv_hosp, mimiciv_icu schema names as that makes concept code much easier to transfer.
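As a rough sketch of point 1, the helper below renders an Athena DDL statement that declares the quote character through OpenCSVSerde. Everything specific in it is an assumption for illustration: the function name athena_csv_ddl, the column list for d_labitems, and the S3 location are hypothetical and are not taken from the schema.sql linked above. All columns are declared as STRING, which matches the workaround mentioned earlier in the thread; types would need to be cast afterwards.

```python
def athena_csv_ddl(table, columns, location):
    """Render a CREATE EXTERNAL TABLE statement that reads quoted CSV."""
    # Every column is declared STRING here; cast in queries or views later.
    column_sql = ",\n".join(f"  `{name}` STRING" for name in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_sql}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        "WITH SERDEPROPERTIES (\n"
        "  'separatorChar' = ',',\n"
        "  'quoteChar' = '\"',\n"
        "  'escapeChar' = '\\\\'\n"
        ")\n"
        f"LOCATION '{location}'\n"
        # Skip the CSV header row so it is not returned as data.
        "TBLPROPERTIES ('skip.header.line.count' = '1');"
    )


# Hypothetical table, columns, and bucket; adjust to the actual release.
ddl = athena_csv_ddl(
    "mimiciv_hosp.d_labitems",
    ["itemid", "label", "fluid", "category"],
    "s3://your-bucket/mimiciv/hosp/d_labitems/",
)
print(ddl)
```

The table name uses the mimiciv_hosp schema so that it lines up with point 3; whether this SerDe is the right fix for the linked schema would need to be checked against the actual files.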

alistairewj avatar Jan 15 '23 20:01 alistairewj