eicu-code icon indicating copy to clipboard operation
eicu-code copied to clipboard

Discrepancies with Medication table notebook

Open juanweber opened this issue 4 years ago • 0 comments

I'm running the example notebooks to test how the database works in my local machine, and the Medication table gives me results way different than what are shown here. For example the code

drug = 'VANCOMYCIN'
query = query_schema + """
select 
  distinct patientunitstayid
from medication
where drugname like '%{}%'
""".format(drug)

df_drug = pd.read_sql_query(query, con)
print('{} unit stays with {}.'.format(df_drug.shape[0], drug))

gives me 20991 unit stays instead of 29737. The code

drug = 'VANCOMYCIN'
query = query_schema + """
select 
  distinct patientunitstayid
from medication
where drugname ilike '%{}%'
""".format(drug)
df_drug = pd.read_sql_query(query, con)
print('{} unit stays with {}.'.format(df_drug.shape[0], drug))

gives me 30318 unit stays instead of 41867. But then the next chunk of code gives me the same 9716 unit stays with HICL = 10093.

Then df_drug['drughiclseqno'].value_counts() gives me only

4042.0     12
8466.0      9
10093.0     5
37442.0     3

instead of

4042.0     333
10093.0    206
37442.0    119
8466.0      53
920.0       15
24859.0      8
3976.0       7
1694.0       3
4285.0       3
4283.0       3
8259.0       3
18084.0      2
6312.0       2
1403.0       1

And, the code

hicl = 1403
query = query_schema + """
select 
  drugname, count(*) as n
from medication
where drughiclseqno = {}
group by drugname
order by n desc
""".format(hicl)

df_hicl = pd.read_sql_query(query, con)
df_hicl.head()

gives me

drugname	n
0	None	200

instead of

	drugname	n
0	compounded cream	9
1	Rx compound and Mix	8
2	butt butter compound	7
3	TESTOSTERONE 1 % (25 MG/2.5 G) TD GLPK	7
4	zinc/aquaphor/nystatin	5

Finally, the code

for hicl in [4042, 10093, 37442]:
    query = query_schema + """
    select 
      drugname, count(*) as n
    from medication
    where drughiclseqno = {}
    group by drugname
    order by n desc
    """.format(hicl)

    df_hicl = pd.read_sql_query(query, con)
    print('HICL {}'.format(hicl))
    print('Number of rows: {}'.format(df_hicl['n'].sum()))
    print('Top 5 rows by frequency:')
    print(df_hicl.head())
    print()

gives me a None item for drugname in the first row with an n bigger than what appeared in the published notebook. n=22753 for HICL 4042, n=7144 for 10093 and n=1977 for 37442, but the number of rows obtained for each HICL is the same.

Are these results the correct ones for V2.0 or is it possible for there to be an error in the data while building the database?

Thank you very much!

juanweber avatar Dec 04 '19 23:12 juanweber