disdrodb
Using Arrow to further speed up raw data I/O
Prework
- [x] Read and agree to the code of conduct.
- [x] If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
- [ ] Post a minimal reproducible example so the maintainer can troubleshoot the problems you identify. A reproducible example is:
  - [ ] Runnable
  - [ ] Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
Description
Evaluate the benefits of using:
- the `engine="pyarrow"` option of `pd.read_csv` to read the raw data using multithreading,
- the `pyarrow` dtype backend introduced in pandas 2.0 to decrease the memory usage of string columns in `pd.DataFrame`.
Please describe the performance issue.
Benchmarks
How poorly does DISDRODB perform?