
How to select multiple values from the same column using mask

Open prudhviraju535 opened this issue 6 years ago • 2 comments

Guys, how do I filter on multiple values from the same column? The code below throws an error.

import pandas as pd
from dfply import *
data = pd.DataFrame({"Col1": ["a", "b", "c", "d"], "Col2": [1, 2, 3, 4]})
data >> mask(X.Col1 == ["a", "b"])

Error: ValueError: Arrays were different lengths: 4 vs 2

prudhviraju535 avatar May 17 '18 18:05 prudhviraju535

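You can't compare a column / Series to a list that way: == compares element by element, so it expects the list to be the same length as the column. The same comparison fails in plain pandas: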

In [12]
data.Col1 == ["a","b"]

Truncated Traceback (Use C-c C-x to view full TB):
pandas\_libs\ops.pyx in pandas._libs.ops.vec_compare()

ValueError: Arrays were different lengths: 4 vs 2

You'd need to do something like this:

In [13]
data >> mask(X.Col1.isin(["a","b"]))
Out [13]:
  Col1  Col2
0    a     1
1    b     2
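
For reference, the same selection can be sketched in plain pandas without dfply. This is only a minimal illustration; the names keep, selected and rest are made up for the example and are not part of either library.

```python
import pandas as pd

data = pd.DataFrame({"Col1": ["a", "b", "c", "d"], "Col2": [1, 2, 3, 4]})

# isin() tests each row for membership in the list,
# so the list does not have to match the column's length.
keep = data["Col1"].isin(["a", "b"])
selected = data[keep]

# Negating the boolean mask selects the remaining rows instead.
rest = data[~keep]

print(selected)
print(rest)
```

The boolean Series built by isin is the same kind of object that mask receives in the dfply version, which is why the two forms select the same rows.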

cunningjames avatar May 22 '18 12:05 cunningjames

Thanks for posting this answer.

I also ran into the same problem, and solved it in a rather roundabout way by first generating true/false arrays for each term, then using logical or on said arrays.

Your answer is much more readable and probably just as performant (if that matters).
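
For anyone curious, the roundabout approach described above might look roughly like this in plain pandas. It is a sketch under assumptions, not the original code: the names wanted, per_term and combined are illustrative.

```python
from functools import reduce
import operator

import pandas as pd

data = pd.DataFrame({"Col1": ["a", "b", "c", "d"], "Col2": [1, 2, 3, 4]})
wanted = ["a", "b"]

# Build one boolean Series per term: comparing against a scalar is fine.
per_term = [data["Col1"] == value for value in wanted]

# Combine the per-term Series with element-wise logical OR.
combined = reduce(operator.or_, per_term)

# The combined mask matches what isin() produces directly.
same_as_isin = combined.equals(data["Col1"].isin(wanted))

print(data[combined])
print(same_as_isin)
```

Both routes end up with the same boolean mask; isin simply expresses it in one call.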

sharpe5 avatar May 23 '18 06:05 sharpe5