enable proper `in` checking

Open Ben-Epstein opened this issue 2 years ago • 8 comments

current behavior

import vaex
df = vaex.from_arrays(
    id=vaex.vrange(0, 200_000)
)

299_999 in df.id  # True, which is wrong: 299_999 is not in the column

proposed

299_999 in df.id # False
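
One possible explanation for the current behavior, offered as an assumption rather than something confirmed in this thread: when a class has no `__contains__`, Python's `in` falls back to iterating (or indexing) the object and comparing each element with `==`. If `==` builds a lazy expression object instead of returning a bool, that object is truthy by default, so the first comparison "matches". The sketch below uses hypothetical stand-in classes (`LazyExpr`, `CheckedExpr`), not vaex's real implementation, to show the fallback and how an explicit `__contains__` avoids it.

```python
class LazyExpr:
    """Hypothetical stand-in for a lazy column expression; not vaex's real class."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Like many lazy expression systems, build a new expression
        # instead of returning a plain bool.
        return LazyExpr(f"({self.text} == {other!r})")

    def __getitem__(self, index):
        # Indexing also stays lazy and never raises IndexError.
        return LazyExpr(f"{self.text}[{index!r}]")


class CheckedExpr(LazyExpr):
    """Same sketch, plus an explicit __contains__ that compares real values."""

    def __init__(self, text, values):
        super().__init__(text)
        self._values = values  # assumed to be the materialized column values

    def __contains__(self, item):
        return item in self._values


lazy_id = LazyExpr("id")
checked_id = CheckedExpr("id", range(0, 200_000))

# Without __contains__, Python falls back to lazy_id[0], lazy_id[1], ...
# and tests each element against 299_999 with ==. The comparison returns a
# LazyExpr object, which is truthy, so the first element already "matches".
naive_result = 299_999 in lazy_id       # True, but wrong
fixed_result = 299_999 in checked_id    # False
```

The trade-off is that `__contains__` has to look at actual values, which is where the question of materializing the column comes in.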

Ben-Epstein avatar Mar 10 '22 14:03 Ben-Epstein

This materializes the column, right? That's... not ideal.

JovanVeljanoski avatar Mar 10 '22 14:03 JovanVeljanoski

@JovanVeljanoski after speaking with @maartenbreddels, my understanding is that .values does not copy memory (zero-copy). https://vaexio.slack.com/archives/C017EEHSQ84/p1646850161602389

Ben-Epstein avatar Mar 10 '22 15:03 Ben-Epstein

Yes, because in that example you created an in-memory dataset, so your data (and that column) is already in memory. But if you read an hdf5/arrow/parquet file, the column first has to be loaded into memory (otherwise you are reading from disk), and only then can you run the check on it.
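
To illustrate the memory concern, here is a minimal sketch of a membership check that scans a column chunk by chunk and stops at the first hit, so the whole column is never held in memory at once. `read_chunks` and `contains_chunked` are hypothetical helpers written for this example; they are not vaex APIs, and the chunk reader just fabricates integer ranges in place of real file reads.

```python
from typing import Iterable, Iterator, List


def read_chunks(start: int, stop: int, chunk_size: int) -> Iterator[List[int]]:
    """Hypothetical reader: yields one in-memory chunk of the column at a time."""
    for lo in range(start, stop, chunk_size):
        yield list(range(lo, min(lo + chunk_size, stop)))


def contains_chunked(chunks: Iterable[List[int]], value: int) -> bool:
    """Return True as soon as any chunk holds value; only one chunk is live."""
    for chunk in chunks:
        if value in chunk:
            return True
    return False


# A value outside the column forces a full scan and reports False.
found_missing = contains_chunked(read_chunks(0, 200_000, 50_000), 299_999)
# A value in the first chunk returns True without reading the remaining chunks.
found_present = contains_chunked(read_chunks(0, 200_000, 50_000), 7)
```

A miss still costs a full pass over the data, so this bounds peak memory rather than I/O.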

JovanVeljanoski avatar Mar 10 '22 15:03 JovanVeljanoski

@JovanVeljanoski thanks for the explanation, I didn't understand that. So is there a better approach here? Always returning True, as it does currently, could lead to many side effects for end users.

Ben-Epstein avatar Mar 10 '22 18:03 Ben-Epstein

@JovanVeljanoski how about now?

Ben-Epstein avatar Mar 21 '22 14:03 Ben-Epstein

I like it, I just don't know why 7 in df["id"] evaluates to True 🤷

maartenbreddels avatar Mar 25 '22 13:03 maartenbreddels

Hmm @maartenbreddels, when I tested locally I did not get that behavior. Are there docs for building vaex from scratch locally?

Ben-Epstein avatar Mar 25 '22 14:03 Ben-Epstein

https://vaex.io/docs/installing.html#for-developers should help!

maartenbreddels avatar Apr 13 '22 11:04 maartenbreddels