pandas_plink is slow since Dask 2024.2

Open bgorissen opened this issue 4 months ago • 0 comments

Dask 2024.2 makes pandas-plink unusably slow (~4 hrs to read an 850 MB bed file). Due to improved tokenization, Dask now computes a SHA-1 hash of `buff` on each call to `_delayed`. Since `buff` is the same size as the file, each hash takes about a second, and this happens approximately 10^4 times, which accounts for the hours of runtime. The easiest workaround seems to be setting the `pure` parameter to `False`.
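A minimal sketch of why `pure` matters here, using plain `dask.delayed` rather than pandas-plink's own code. The `read_chunk` function and the small `buff` are hypothetical stand-ins, not the library's internals. With `pure=True`, the task key is derived from a token of the arguments, so Dask has to hash the buffer's contents. With `pure=False`, the key is random and the arguments are not tokenized.

```python
from dask import delayed


def read_chunk(buff, start, stop):
    # Hypothetical stand-in for a chunk reader: sum the bytes in a slice.
    return sum(buff[start:stop])


# Small placeholder buffer; in the issue, buff is as large as the bed file.
buff = bytes(1_000_000)

# pure=True: the key is computed from the arguments, so buff is hashed
# on every call, and identical calls get the same key.
pure_a = delayed(read_chunk, pure=True)(buff, 0, 10)
pure_b = delayed(read_chunk, pure=True)(buff, 0, 10)

# pure=False: the key is random, so nothing needs to be hashed, and
# identical calls get different keys.
impure_a = delayed(read_chunk, pure=False)(buff, 0, 10)
impure_b = delayed(read_chunk, pure=False)(buff, 0, 10)

print(pure_a.key == pure_b.key)
print(impure_a.key == impure_b.key)
```

The trade-off is that with `pure=False` Dask can no longer deduplicate identical calls by key, which should not matter if each chunk is read only once.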

bgorissen, Feb 22 '24 22:02