Regular expression matching for vectors
I feel like we should have a function to match regular expressions, like we have eq, lt, etc.
It would work as follows:
> dv = Daru::Vector.new ['abc', 'aaa', 'xet', 'ccc']
> dv.where(dv.match /a/)
#<Daru::Vector:17216960 @name = nil @metadata = {} @size = 2 >
nil
0 abc
1 aaa
What about just a generic function-to-boolean-array process? Something like
dv.where(dv.recode_bool { |v| v.match /a/ })
dv.recode_bool would take anything "truthy" and convert it to true, otherwise false.
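To make the proposal concrete, here is a minimal sketch of how recode_bool could behave. MiniVector is a hypothetical stand-in written only for this example; it is not Daru::Vector and does not reflect how daru implements where or its boolean arrays.

```ruby
# Hypothetical stand-in for Daru::Vector, used only to illustrate the idea.
class MiniVector
  attr_reader :data

  def initialize(data)
    @data = data
  end

  # Run the block on each element and coerce the result to true or false,
  # so anything "truthy" becomes true and nil/false becomes false.
  def recode_bool
    MiniVector.new(@data.map { |v| yield(v) ? true : false })
  end

  # Keep only the elements whose position in the mask is true.
  def where(mask)
    kept = @data.zip(mask.data).select { |_, keep| keep }.map(&:first)
    MiniVector.new(kept)
  end
end

dv = MiniVector.new(['abc', 'aaa', 'xet', 'ccc'])
mask = dv.recode_bool { |v| v.match(/a/) }
filtered = dv.where(mask)
```

With this sketch, mask.data is a plain array of booleans and filtered.data holds only the elements that matched.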
That being said, I would also prefer not to have to repeat dv several times while filtering. My dataframe names are usually pretty long, and it's a bit annoying to have to reuse these long names all the time. What I'd like to see is, instead of
very_long_data_vector_thing.where(very_long_data_vector_thing.eq(5))
it would be nice to have
very_long_data_vector_thing.where.eq(5)
recode_bool looks good to me. It's very generic and can be used in a variety of use cases. The #match function can be a thin wrapper over it that just calls the regexp matcher on each element of the Vector internally using #recode_bool.
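A rough sketch of that wrapper idea, again on a hypothetical MiniVector stand-in rather than the real Daru::Vector. The to_s call is an assumption made here so that non-string elements (including nil) do not raise; the thread does not say how those should be handled.

```ruby
# Hypothetical stand-in for Daru::Vector, used only to illustrate the idea.
class MiniVector
  attr_reader :data

  def initialize(data)
    @data = data
  end

  # Generic block-to-boolean conversion.
  def recode_bool
    MiniVector.new(@data.map { |v| yield(v) ? true : false })
  end

  # Regexp matching expressed as a thin wrapper over recode_bool.
  # Elements are converted with to_s first (an assumption of this sketch).
  def match(pattern)
    recode_bool { |v| v.to_s.match?(pattern) }
  end
end

dv = MiniVector.new(['abc', 'aaa', 'xet', 'ccc'])
matches = dv.match(/a/)
```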
@gnilrets your implementation of where might be feasible for Vectors, but for DataFrames it would be tough, since you need to select the vector that you want to use for the comparison.
For vectors, maybe we can return a new VectorBoolMatcher object (or something similarly named) when the user has not passed any arguments, which means that #where is being used for chaining with eq and the other comparators.
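One way the chaining could be wired up is sketched below. VectorBoolMatcher is the name floated above; its methods, and the MiniVector stand-in, are hypothetical and written only for this example. Here eq and lt return plain boolean arrays, which is a simplification rather than daru's actual behaviour.

```ruby
# Hypothetical helper returned by an argument-less where. It remembers the
# vector so that the comparison and the filtering happen in a single call.
class VectorBoolMatcher
  def initialize(vector)
    @vector = vector
  end

  def eq(other)
    @vector.where(@vector.eq(other))
  end

  def lt(other)
    @vector.where(@vector.lt(other))
  end
end

# Hypothetical stand-in for Daru::Vector, used only to illustrate the idea.
class MiniVector
  attr_reader :data

  def initialize(data)
    @data = data
  end

  # Comparators return a plain array of booleans in this sketch.
  def eq(other)
    @data.map { |v| v == other }
  end

  def lt(other)
    @data.map { |v| v < other }
  end

  # With a mask, filter as usual; with no argument, return a matcher
  # so that calls such as where.eq(5) can be chained.
  def where(mask = nil)
    return VectorBoolMatcher.new(self) if mask.nil?

    MiniVector.new(@data.zip(mask).select { |_, keep| keep }.map(&:first))
  end
end

very_long_data_vector_thing = MiniVector.new([5, 3, 5, 8])
fives = very_long_data_vector_thing.where.eq(5)
small = very_long_data_vector_thing.where.lt(5)
```

The existing two-step form, where(vector.eq(5)), keeps working in this sketch because where only switches to the matcher when it receives no argument.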