[BUG-REPORT] Filtering breaks negative indexing

Open karotchykau opened this issue 1 year ago • 1 comments

Description

The following code

import pandas as pd
import vaex

p_df = pd.DataFrame({"A": ["abc"] * 100})
df = vaex.from_pandas(p_df)
f_df = df[df["A"] == "abc"]

f_df[99]  # Works fine.
f_df[-1]  # Throws an error (same for any negative number).

throws

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [3], in <cell line: 6>()
      3 f_df = df[df["A"] == "abc"]
      5 f_df[99]  # Works fine.
----> 6 f_df[-1]

File ~/mambaforge/envs/tmp_env/lib/python3.9/site-packages/vaex/dataframe.py:5337, in DataFrame.__getitem__(self, item)
   5335 if isinstance(item, int):
   5336     names = self.get_column_names()
-> 5337     return [self.evaluate(name, item, item+1, array_type='python')[0] for name in names]
   5338 elif isinstance(item, six.string_types):
   5339     if hasattr(self, item) and isinstance(getattr(self, item), Expression):

File ~/mambaforge/envs/tmp_env/lib/python3.9/site-packages/vaex/dataframe.py:5337, in <listcomp>(.0)
   5335 if isinstance(item, int):
   5336     names = self.get_column_names()
-> 5337     return [self.evaluate(name, item, item+1, array_type='python')[0] for name in names]
   5338 elif isinstance(item, six.string_types):
   5339     if hasattr(self, item) and isinstance(getattr(self, item), Expression):

File ~/mambaforge/envs/tmp_env/lib/python3.9/site-packages/vaex/dataframe.py:3090, in DataFrame.evaluate(self, expression, i1, i2, out, selection, filtered, array_type, parallel, chunk_size, progress)
   3088     return self.evaluate_iterator(expression, s1=i1, s2=i2, out=out, selection=selection, filtered=filtered, array_type=array_type, parallel=parallel, chunk_size=chunk_size, progress=progress)
   3089 else:
-> 3090     return self._evaluate_implementation(expression, i1=i1, i2=i2, out=out, selection=selection, filtered=filtered, array_type=array_type, parallel=parallel, chunk_size=chunk_size, progress=progress)

File ~/mambaforge/envs/tmp_env/lib/python3.9/site-packages/vaex/dataframe.py:6362, in DataFrameLocal._evaluate_implementation(self, expression, i1, i2, out, selection, filtered, array_type, parallel, chunk_size, raw, progress)
   6360     mask = self._selection_masks[FILTER_SELECTION_NAME]
   6361     i1, i2 = mask.indices(i1, i2-1)
-> 6362     assert i1 != -1
   6363     i2 += 1
   6364 # TODO: performance: can we collapse the two trims in one?

AssertionError: 
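
The traceback shows `__getitem__` passing `item` straight through to `evaluate` as `i1`, so a negative index reaches the filter mask lookup without being converted to a position from the start. As a possible workaround on affected versions, the index can be normalized before indexing. The helper below is a minimal sketch: `normalize_index` is a hypothetical name, not part of vaex, and it assumes `len(f_df)` returns the number of rows that pass the filter.

```python
def normalize_index(index: int, length: int) -> int:
    """Map a possibly negative row index onto the range [0, length).

    Hypothetical helper, not part of vaex. It mirrors how Python
    sequences interpret negative indices.
    """
    if not -length <= index < length:
        raise IndexError(f"index {index} is out of range for length {length}")
    # Negative indices count from the end, so shift them by the length.
    return index + length if index < 0 else index


# Intended use with the filtered DataFrame from the report
# (assumes len(f_df) is the filtered row count):
#     row = f_df[normalize_index(-1, len(f_df))]
```

With the 100-row example above, `normalize_index(-1, 100)` gives `99`, which is the non-negative index the report says works.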

Software information

  • Vaex version (import vaex; vaex.__version__): {'vaex-core': '4.9.2', 'vaex-viz': '0.5.2', 'vaex-hdf5': '0.12.2', 'vaex-server': '0.8.1', 'vaex-astro': '0.9.1', 'vaex-jupyter': '0.8.0', 'vaex-ml': '0.17.0'}
  • Vaex was installed via: mamba install -c conda-forge vaex
  • OS: macOS Monterey, Version 12.4

Additional information

No additional information to add.

karotchykau, Aug 05 '22 08:08

Thanks for the report! I hope we can fix it soon.

JovanVeljanoski, Aug 08 '22 22:08

For the record, this is somewhat related to https://github.com/vaexio/vaex/issues/2123

maartenbreddels, Aug 30 '22 07:08

OK, this now works on master, so it was probably fixed in #2123.

maartenbreddels, Aug 31 '22 09:08

Should be included in 4.11.1

maartenbreddels, Aug 31 '22 09:08

OK, I was too quick: this is actually fixed in https://github.com/vaexio/vaex/pull/2163 and will be released in the next version!

maartenbreddels, Aug 31 '22 09:08