mimic3-benchmarks icon indicating copy to clipboard operation
mimic3-benchmarks copied to clipboard

Sql

Open Saqibm128 opened this issue 6 years ago • 11 comments

adds SQL support to project as per #16 and explicitly declare code sections in README.md and dependencies in requirements.txt

Saqibm128 avatar Oct 30 '17 00:10 Saqibm128

Hey @Saqibm128 what's the status of this? Are you waiting on us to merge?

turambar avatar Jan 11 '18 20:01 turambar

Yes, I changed some of the methods to read data tables to take an optional parameter called use_db, and added a new argument to the arg parser to use-db

Saqibm128 avatar Jan 12 '18 03:01 Saqibm128

Looking at this PR this week. Going to try to include in the release.

turambar avatar Jan 16 '18 23:01 turambar

@Saqibm128 How much faster is this? @turambar What's the state of things in terms of getting this merged in? More than happy to lend a hand here.

KirkHadley avatar Aug 26 '18 06:08 KirkHadley

Faster? Do you mean execution speed? If the SQL server is on the same computer, then it goes about as fast. If it is on a separate computer and the network host of the SQL isn't the localhost, then I guess it would go much slower due to network lag.

Saqibm128 avatar Aug 26 '18 13:08 Saqibm128

I haven't actually looked at this project in a while, but if execution speed for setting up the directory structure is becoming a huge bottleneck, I might want to look into using parallel threads/processes.

Saqibm128 avatar Aug 26 '18 14:08 Saqibm128

Thanks for getting back so quick. And yeah I didn't really look at the code before asking and naively assumed you'd taken the joins logic and rewritten that in SQL. My guess would be that, ceteris paribus, a series of group by + join queries would be faster than the current method.

But speaking of parallelism, I actually have cobbled together a rather hackish parallel extract_subjects and extract_episodes, would cleaning it up and writing a PR be worthwhile or have folks mostly moved on from actively maintaining things here?

KirkHadley avatar Aug 26 '18 15:08 KirkHadley

To be honest, I am not really involved in this project, added the SQL stuff well after they had started, and haven't been paying attention to this for awhile since, so I'm not sure at all.

Using joins actually makes a lot of sense, I don't think I was thinking at all about it, surprisingly. If you have a hacky parallel script, go ahead and share! I don't know if it would be accepted but others may still find any code of any quality useful.

Saqibm128 avatar Aug 26 '18 16:08 Saqibm128

Oh also just remembered they have a gitter chat room. https://gitter.im/YerevaNN/mimic3-benchmarks

Saqibm128 avatar Aug 26 '18 16:08 Saqibm128

@Harhro94 Now that the 1.0 release is done, should one of us should review this and then make the merge?

turambar avatar Aug 26 '18 19:08 turambar

@turambar is there still value in this? should I try to fix merge conflicts, or should I close the pr?

Saqibm128 avatar Sep 29 '18 19:09 Saqibm128