vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
When virtual columns are slow, you can persist them to file. I don't want to merge this yet, since I don't think it works nicely with the state/get/set system. Needs...
With `VAEX_PROGRESS=rich`, exporting would show too many tasks.
To support the following symmetrically:
```
import vaex
df = vaex.example()
df.export("file.json")
vaex.open("file.json")
```
Split off from #1987
Current behavior:
```
import vaex
df = vaex.from_arrays(id=vaex.vrange(0, 200_000))
299_999 in df.id  # True but wrong
```
Proposed:
```
299_999 in df.id  # False
```
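The intended semantics of the membership test can be illustrated without vaex. The sketch below is a minimal illustration, not vaex's implementation: `Column` is a hypothetical stand-in class, and it assumes membership should be decided purely by comparing against the stored values.

```python
class Column:
    """Hypothetical stand-in for a dataframe column; not vaex's Expression class."""

    def __init__(self, values):
        self._values = values  # any iterable container, here a range

    def __contains__(self, item):
        # Membership is decided by comparing against the stored values only.
        return any(v == item for v in self._values)


ids = Column(range(0, 200_000))
print(199_999 in ids)  # value inside the stored range
print(299_999 in ids)  # value outside the stored range
```

Defining `__contains__` explicitly keeps `in` from falling back to Python's default iteration-based lookup, so the result depends only on the column's values.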
Checklist: - [x] Add unit-test - [ ] Make tests pass
Helper function to concatenate many hdf5 files. Tested against hundreds of thousands of files. I could imagine using this when a user globs with a `.open` where vaex can call...
Fixes https://github.com/vaexio/vaex/issues/1883 - [x] Make a unit test exposing the issue - [ ] Make test pass
This makes it irrelevant to the user whether a column is dict encoded or not: all dataframe operations on it will be the same. For backwards compatibility we still do...