vaex
vaex copied to clipboard
enable proper `in` checking
current behavior
import vaex
df = vaex.from_arrays(
id=vaex.vrange(0, 200_000)
)
299_999 in df.id # True but wrong
proposed
299_999 in df.id # False
This materializes the column right? That's.. not ideal...
@JovanVeljanoski after speaking with @maartenbreddels it was my understanding is that .values
is 0 mem copy. https://vaexio.slack.com/archives/C017EEHSQ84/p1646850161602389
Yes because in that example, you created an in memory dataset. So your data (and that column) is already in memory. But if you read an hdf5/arrow/parquet file, you first will put it to memory (since otherwise you are reading from disk) then you gonna do the stuff that you like.
@JovanVeljanoski thanks for the explanation, I didn't understand that. So is there a better approach here? Currently returning True
always could lead to many side-effects for end users
@JovanVeljanoski how about now?
I like it, I just don't know why 7 in df["id"]
evaluates to True 🤷
Hmm @maartenbreddels when i tested locally i did not get that behavior. Are there docs to build vaex from scratch locally?
https://vaex.io/docs/installing.html#for-developers should help!