dfply
How to select multiple values from the same column using mask
Guys, how do I filter on multiple values from the same column? The code below throws an error.
import pandas as pd
from dfply import *
data = pd.DataFrame({"Col1": ["a", "b", "c", "d"], "Col2": [1, 2, 3, 4]})
data >> mask(X.Col1 == ["a", "b"])
Error: ValueError: Arrays were different lengths: 4 vs 2
You can't compare a column (Series) to a list that way. The same comparison fails in plain pandas:
In [12]
data.Col1 == ["a","b"]
Truncated Traceback (Use C-c C-x to view full TB):
pandas\_libs\ops.pyx in pandas._libs.ops.vec_compare()
ValueError: Arrays were different lengths: 4 vs 2
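The error comes from how `==` works on a Series: it compares position by position, so the right-hand side has to be the same length as the column. A minimal sketch in plain pandas (the names `col`, `same_len`, and `raised` are only illustrative, and the exact error message differs between pandas versions):

```python
import pandas as pd

col = pd.Series(["a", "b", "c", "d"], name="Col1")

# Same length: the list is compared to the Series position by position.
same_len = col == ["a", "b", "x", "y"]
print(same_len.tolist())

# Different length: pandas raises ValueError instead of testing membership.
try:
    col == ["a", "b"]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

So `==` against a list never means "is one of these values", even when the lengths happen to match.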
You'd need to do something like this:
In [13]
data >> mask(X.Col1.isin(["a","b"]))
Out [13]:
  Col1  Col2
0    a     1
1    b     2
Thanks for posting this answer.
I also ran into the same problem, and solved it in a rather roundabout way by first generating true/false arrays for each term, then using logical or on said arrays.
Your answer is much more readable and probably just as performant (if that matters).
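For anyone comparing the two approaches mentioned in this thread, here is a rough sketch in plain pandas, using the same `data` frame as the question. The variable names are made up for illustration; the same boolean Series should also work inside dfply's `mask`, as the `isin` answer above shows.

```python
import pandas as pd

data = pd.DataFrame({"Col1": ["a", "b", "c", "d"], "Col2": [1, 2, 3, 4]})

# Roundabout way: one boolean Series per term, combined with element-wise OR.
is_a = data["Col1"] == "a"
is_b = data["Col1"] == "b"
roundabout = data[is_a | is_b]

# Direct way: a single membership test with isin.
direct = data[data["Col1"].isin(["a", "b"])]

# Both select the same rows.
print(roundabout.equals(direct))
```

The `isin` version also scales to a longer list of values without adding one comparison per term.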