vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Since vaex supports streaming reads of HDF5 files from an S3 bucket, are there any plans to implement an API to upload (or stream-upload) a vaex DataFrame as an HDF5/Arrow file to an S3 bucket?
I think it's too early, but worth a try!
This is a simple implementation of drop-duplicates functionality using existing sources. It only supports keeping the first occurrence of each duplicated element. In the future (or now), we may want to...
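The "keep the first occurrence" rule can be illustrated on plain Python columns. This is a minimal sketch, not the vaex implementation: the function name `drop_duplicates_keep_first` and the dict-of-lists layout are hypothetical, chosen only to show the semantics.

```python
from typing import Dict, List, Sequence


def drop_duplicates_keep_first(columns: Dict[str, List], subset: Sequence[str]) -> Dict[str, List]:
    """Hypothetical helper: keep only the first row for each distinct key in `subset`."""
    n_rows = len(next(iter(columns.values()))) if columns else 0
    seen = set()
    keep: List[int] = []
    for i in range(n_rows):
        # The row's key is the tuple of its values in the subset columns.
        key = tuple(columns[name][i] for name in subset)
        if key not in seen:
            # First time this key appears: remember it and keep the row.
            seen.add(key)
            keep.append(i)
    # Rebuild every column from the kept row indices, preserving order.
    return {name: [values[i] for i in keep] for name, values in columns.items()}


data = {"x": [1, 1, 2, 1], "y": ["a", "a", "b", "c"]}
result = drop_duplicates_keep_first(data, subset=["x", "y"])
```

Here `result` is `{"x": [1, 2, 1], "y": ["a", "b", "c"]}`: the second `(1, "a")` row is dropped because an identical key was already seen.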
This is an implementation of a new **Pipeline**, which wraps a few common needs together with the vaex state. General idea: any transformation you do on the DataFrame, as long...
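The general idea of recording transformations once and replaying them on new data can be sketched in plain Python. This is an assumption-laden illustration, not the PR's code: the `Pipeline` class, its `add`/`transform` methods, and the `with_column` helper are all hypothetical, and a "frame" here is just a dict of lists rather than a vaex DataFrame.

```python
from typing import Callable, Dict, List

Frame = Dict[str, List[float]]
Step = Callable[[Frame], Frame]


class Pipeline:
    """Hypothetical pipeline: records steps once, replays them on any frame."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def add(self, step: Step) -> "Pipeline":
        self.steps.append(step)
        return self  # allow chaining

    def transform(self, frame: Frame) -> Frame:
        # Apply each recorded step in order; steps return new frames.
        for step in self.steps:
            frame = step(frame)
        return frame


def with_column(name: str, fn: Callable[[Dict[str, float]], float]) -> Step:
    """Hypothetical step that adds a column computed row by row."""
    def step(frame: Frame) -> Frame:
        n_rows = len(next(iter(frame.values())))
        out = dict(frame)  # shallow copy so the input frame is left untouched
        out[name] = [fn({k: v[i] for k, v in frame.items()}) for i in range(n_rows)]
        return out
    return step


pipeline = Pipeline().add(with_column("area", lambda row: row["w"] * row["h"]))
train = {"w": [1.0, 2.0], "h": [3.0, 4.0]}
new = {"w": [5.0], "h": [2.0]}
```

`pipeline.transform(train)` and `pipeline.transform(new)` both gain an `area` column from the same recorded step, which is the property a pipeline is meant to guarantee between training and inference data.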
Requires a rebase after #882. See the unit tests for usage. Note that I don't think Arrow supports these operations/kernels yet, so the current implementations are more like placeholders; see this as a...
Addresses #816. Enable groupby on a column of fixed-length string type.
- [x] Implement test
- [ ] Make it pass
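What "groupby on a fixed-length string column" means can be shown with NumPy's fixed-width bytes dtype (`S3`). This sketch is not how vaex implements grouping; it uses `np.unique(..., return_inverse=True)` and `np.bincount` to compute a per-group sum, and the `groupby_sum` name is hypothetical.

```python
from typing import Dict

import numpy as np


def groupby_sum(keys: np.ndarray, values: np.ndarray) -> Dict[bytes, float]:
    """Hypothetical helper: sum `values` per distinct fixed-length string key."""
    # np.unique returns the sorted distinct keys plus, for every row,
    # the index of its key in that sorted array (the group id).
    groups, inverse = np.unique(keys, return_inverse=True)
    # bincount with weights sums all values that share a group id.
    sums = np.bincount(inverse.ravel(), weights=values)
    return {key: float(total) for key, total in zip(groups.tolist(), sums)}


# Every element occupies exactly 3 bytes; shorter strings are null-padded.
keys = np.array([b"abc", b"xy", b"abc", b"xy", b"z"], dtype="S3")
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
result = groupby_sum(keys, values)
```

`result` maps `b"abc"` to 4.0, `b"xy"` to 6.0, and `b"z"` to 5.0; NumPy strips the trailing null padding when the keys are read back.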
1. Add a `dtypes` param to `get_column_names`.
2. Add the same type filter to the `__getitem__` method as a quick shortcut.

Example:
```
>>> from vaex.ml.datasets import load_titanic
>>> df = load_titanic()...
```
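The two proposed additions can be sketched on a small stand-in container. `MiniFrame` is hypothetical, and the `dtypes` semantics (a single type or a list of types, matched with `np.issubdtype` so abstract types like `np.number` work) are an assumption rather than vaex's final API.

```python
from typing import Dict, List, Optional, Sequence, Union

import numpy as np

TypeSpec = Union[type, Sequence[type]]


class MiniFrame:
    """Hypothetical column container illustrating dtype-based column selection."""

    def __init__(self, columns: Dict[str, np.ndarray]) -> None:
        self.columns = columns

    def get_column_names(self, dtypes: Optional[TypeSpec] = None) -> List[str]:
        if dtypes is None:
            return list(self.columns)
        if isinstance(dtypes, type):
            dtypes = [dtypes]
        # Keep a column if its dtype is any requested type or a subtype of it.
        return [
            name for name, values in self.columns.items()
            if any(np.issubdtype(values.dtype, t) for t in dtypes)
        ]

    def __getitem__(self, key):
        # Shortcut: indexing with a type selects all columns of that type.
        if isinstance(key, type):
            return MiniFrame({n: self.columns[n] for n in self.get_column_names(dtypes=key)})
        return self.columns[key]


df = MiniFrame({
    "age": np.array([22.0, 38.0]),
    "name": np.array(["a", "b"]),
    "pclass": np.array([3, 1]),
})
```

With this sketch, `df.get_column_names(dtypes=np.number)` and `df[np.number].get_column_names()` both return `["age", "pclass"]`.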
Just noticed some strange code.
This should speed up CI.
Addresses #621