pandas-plink
pandas_plink is slow since Dask 2024.2
Dask 2024.2 makes pandas-plink unusably slow (~4 hrs to read an 850 MB bed file). Due to improved tokenization, Dask now computes a sha1 hash of `buff` for each call to `_delayed`. Since `buff` has the same size as the file, this takes about a second each time, and this is done approximately 10^4 times (10^4 calls at about a second each is close to three hours, in line with the observed runtime). The easiest workaround seems to be setting the parameter `pure` to False.
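The trade-off behind the workaround can be sketched without Dask, using only the standard library. This is an illustration under assumptions, not pandas-plink's or Dask's actual code: `pure_key`, `impure_key` and the task name `read_chunk` are hypothetical, and the 8 MiB `buff` is a small stand-in for the file-sized buffer. A content-based key has to hash the whole buffer on every call, while a random key costs the same regardless of buffer size.

```python
import hashlib
import uuid


def pure_key(func_name: str, buff: bytes) -> str:
    """Content-based key: hashes the whole buffer, so cost grows with len(buff)."""
    digest = hashlib.sha1(buff).hexdigest()
    return f"{func_name}-{digest}"


def impure_key(func_name: str) -> str:
    """Random key: constant cost, but identical calls never share a key."""
    return f"{func_name}-{uuid.uuid4()}"


# Small stand-in for the file-sized buffer (hypothetical size).
buff = bytes(8 * 1024 * 1024)

# Five "calls" with the same buffer, as would happen once per chunk.
pure_keys = {pure_key("read_chunk", buff) for _ in range(5)}
impure_keys = {impure_key("read_chunk") for _ in range(5)}

print(len(pure_keys))    # 1: same content, same key, but the buffer was hashed 5 times
print(len(impure_keys))  # 5: no hashing, every call gets a fresh key
```

In Dask itself, the corresponding switch is the `pure` argument of `dask.delayed`: with `pure=False` the task key is generated randomly rather than derived from a hash of the arguments, at the cost of identical calls no longer being deduplicated.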