
[loader] datatable

Open geekscrapy opened this issue 2 years ago • 2 comments

Datatable is much like pandas, but with a focus on big data and speed. Is this a package that might help vd with big datasets?

https://datatable.readthedocs.io/en/latest/index.html

I'd love to see this implemented as a "bare" loader. By this I mean that user functionality is limited to only what the datatable library itself supports. In this case, it can do grouping, searching, and even regex searching, but it can't do things like splitcol or regex capture. I say this because these are the types of features (I think!) that tend to slow down calculations as the data gets chopped up. I'd love to see how a "native" version of a loader might work if only its core functionality is used, and how fast it could be. It would lose functionality, but if you need the extra features, you could probably just do a deepcopy?
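To make the "bare" loader idea concrete, here is a minimal sketch in plain Python. It assumes a sheet that whitelists the operations its backend can run natively and refuses everything else. `BareSheet`, `NATIVE_OPS` and `run` are hypothetical names, not part of VisiData or datatable. The stdlib `Counter` and `re` calls merely stand in for the grouping and regex operations a real backend would execute itself.

```python
import re
from collections import Counter


class BareSheet:
    """Hypothetical 'bare' sheet: only backend-native operations are exposed."""

    # Assumed set of operations the backend can run natively.
    NATIVE_OPS = {"group_count", "search", "regex_search"}

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]

    def _col(self, name):
        # Position of a column by name.
        return self.columns.index(name)

    def group_count(self, name):
        # Frequency of each distinct value, i.e. grouping on one column.
        i = self._col(name)
        return dict(Counter(row[i] for row in self.rows))

    def search(self, name, text):
        # Plain substring search within one column.
        i = self._col(name)
        return [row for row in self.rows if text in str(row[i])]

    def regex_search(self, name, pattern):
        # Regex search within one column (matching only, no capture groups).
        i = self._col(name)
        rx = re.compile(pattern)
        return [row for row in self.rows if rx.search(str(row[i]))]

    def run(self, op, *args):
        # Anything outside the whitelist (e.g. splitcol, regex capture) is refused.
        if op not in self.NATIVE_OPS:
            raise NotImplementedError(f"{op!r} is not available in the bare loader")
        return getattr(self, op)(*args)


sheet = BareSheet(["host", "path"], [("a.com", "/x"), ("b.com", "/y"), ("a.com", "/z")])
print(sheet.run("group_count", "host"))             # {'a.com': 2, 'b.com': 1}
print(sheet.run("regex_search", "path", r"^/[xz]$"))  # [('a.com', '/x'), ('a.com', '/z')]
```

Calling `sheet.run("split_column", "path")` raises `NotImplementedError`. In this design, a user who needs the richer features would copy the rows out into a regular, fully featured sheet.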

I can also see this approach potentially helping with diff saving.

Anyhow, thoughts? I've looked into how feasible it would be, but it's blue-sky (big data) thinking 🙃

Speed tests against pandas: https://towardsdatascience.com/an-overview-of-pythons-datatable-package-5d3a97394ee9

geekscrapy, Nov 10 '21 19:11

Interesting idea, @geekscrapy! I'll be looking into Ibis to achieve this as well.

saulpw, Nov 10 '21 20:11

See also: duckdb

daviewales, Jan 18 '22 11:01