pystore icon indicating copy to clipboard operation
pystore copied to clipboard

Q: Fastest way to load multiple DataFrames same time

Open cgi1 opened this issue 4 years ago • 0 comments

Awesome project.

Just a short question, I have like 2000 stored dataframes now and I would like to load 500 of it as fast as possible into one python process. Is there a batch-load function in it?

I coded something with ThreadPoolExecutor and it loads 3GB on disk into around a 40GB DataFrame (which is pretty heavy) in under four minutes using 5 threads.

Does somebody see a faster variant? The SSD is relaxed, it looks like the performance limiation lies in df = item.to_pandas(), which is CPU intensive.

cgi1 avatar May 04 '20 17:05 cgi1