
Use duckdb loading throughout pyprophet

Open jcharkow opened this issue 1 year ago • 2 comments

With export-parquet, duckdb was added as a dependency. It allows for fast SQL queries, especially those involving many joins.

This PR rolls out duckdb SQL queries across pyprophet to make data loading more efficient.

Examples

Benchmarks were run on a Dell XPS running Ubuntu.

Export Command

time pyprophet export --in=39041_Hela_500ng_15SPD_DIA_Py3_1_S2-A7_1_4502.osw

Old timings: real 0m56.284s user 0m35.997s sys 0m15.130s

New timings: real 0m12.832s user 0m40.578s sys 0m8.378s

Score Command

  • Only 1 iteration is run, so most of the measured time is spent loading the data.

time pyprophet score --in=39041_Hela_500ng_15SPD_DIA_Py3_1_S2-A7_1_4502.osw --ss_num_iter=1

Old timings: real 0m59.466s user 1m30.275s sys 0m11.004s

New timings: real 0m30.482s user 1m21.186s sys 0m9.460s
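To make the comparison concrete, the following is a minimal sketch that converts the `time` stamps reported above into seconds and computes the wall-clock speedup for each command. The helper names `parse_time` and `wall_clock_speedup` are illustrative only and are not part of pyprophet.

```python
import re


def parse_time(stamp: str) -> float:
    """Convert a `time`-style stamp such as '0m56.284s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the benchmarks above, ordered as (real, user, sys).
timings = {
    "export": {
        "old": ("0m56.284s", "0m35.997s", "0m15.130s"),
        "new": ("0m12.832s", "0m40.578s", "0m8.378s"),
    },
    "score": {
        "old": ("0m59.466s", "1m30.275s", "0m11.004s"),
        "new": ("0m30.482s", "1m21.186s", "0m9.460s"),
    },
}


def wall_clock_speedup(command: str) -> float:
    """Ratio of old to new 'real' time for the given command."""
    old_real = parse_time(timings[command]["old"][0])
    new_real = parse_time(timings[command]["new"][0])
    return old_real / new_real


for command in timings:
    print(f"{command}: {wall_clock_speedup(command):.2f}x faster (wall clock)")
```

On these numbers, export is roughly 4.4x faster and score roughly 2x faster in wall-clock time. Note that user time for export rises slightly (35.997s to 40.578s) while real time drops, which would be consistent with work being spread across multiple threads, although the timings alone do not confirm that.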

jcharkow, Dec 04 '24 14:12

I am not sure why the tests are not being run.

jcharkow, Dec 04 '24 14:12

OK, I think the tests are passing; they are just not appearing in this PR for some reason.

jcharkow, Dec 04 '24 17:12

Will close this, as the recent PR #142 covers this.

singjc, Jun 04 '25 14:06