Jovan Veljanoski comments

Results: 94 comments of Jovan Veljanoski

enable proper `in` checking

This materializes the column, right? That's... not ideal...

enable proper `in` checking

Yes, because in that example you created an in-memory dataset. So your data (and that column) is already in memory. But if you read an hdf5/arrow/parquet file, you first...

Drop Duplicates (Simple POC)

What is the expected memory usage? Let me test it again this evening with the big dataset and I will report back.

Drop Duplicates (Simple POC)

This is great and a much-needed feature. However, since it uses `groupby`, it is not truly "out-of-core" because the result (the output of `groupby`) is in memory. And it...

Drop Duplicates (Simple POC)

I don't think this was ever added to vaex-contrib. So if you want to try it out, you can either check out this branch (quite out of date by now...

Setup Circle CI for Windows builds

...and we run out of credits :)

Concatenating large files

I am not aware of any such technique or tool. I understand your concerns about data duplication/redundancy, but keep in mind that this sort of conversion that is implemented now...

Concatenating large files

Ah, I see your point. Indeed, appending incoming data to an already existing blob is something we've been thinking about. It has come up in a few discussions in...

Concatenating large files

So looking at the code, `vaex.open_many` expects a list of file paths. It is `vaex.open()` that accepts a glob expression as its first argument. I would expect this `df1...
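
A minimal sketch of the distinction described in this comment, assuming the files share a common naming pattern. The helper names `expand_paths` and `open_concatenated` are hypothetical and not part of vaex. The pattern is expanded with the standard-library `glob` module into the explicit list of paths that `vaex.open_many` expects:

```python
import glob


def expand_paths(pattern):
    """Expand a glob pattern into a sorted list of matching file paths."""
    # Sorting gives a stable, predictable order for concatenation.
    return sorted(glob.glob(pattern))


def open_concatenated(pattern):
    """Open every file matching `pattern` as a single vaex DataFrame.

    Hypothetical helper: it only illustrates that vaex.open_many takes a
    list of paths, whereas vaex.open takes the glob pattern itself.
    """
    # Imported lazily so expand_paths can be used without vaex installed.
    import vaex

    paths = expand_paths(pattern)
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # Explicit list of file paths goes to open_many ...
    return vaex.open_many(paths)
    # ... while vaex.open(pattern) would take the pattern string directly.
```

Calling `open_concatenated("data/part_*.hdf5")` (an illustrative path) is intended to give the same result as passing that pattern string straight to `vaex.open`.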

Concatenating large files

If you can make a reproducible example that would be great!