Regular expression matching for vectors

Open • lokeshh opened this issue 9 years ago • 2 comments

I feel like we should have a function for matching regular expressions, like we have `eq`, `lt`, etc.

It would work as follows:

```
> dv = Daru::Vector.new(['abc', 'aaa', 'xet', 'ccc'])
> dv.where(dv.match(/a/))
#<Daru::Vector:17216960 @name = nil @metadata = {} @size = 2 >
    nil
  0 abc
  1 aaa
```
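For reference, here is what the proposed `where`/`match` combination would compute, sketched with plain Ruby arrays rather than Daru (no Daru API is assumed here): a boolean mask from the regexp, then the elements whose mask entry is true, keeping their original indices.

```ruby
values = ['abc', 'aaa', 'xet', 'ccc']

# Step 1: the proposed `match` would yield one boolean per element.
mask = values.map { |v| v.match?(/a/) }

# Step 2: `where` keeps only the positions whose mask entry is true,
# preserving the original index of each kept element as [index, value].
kept = values.each_with_index.select { |_, i| mask[i] }.map { |v, i| [i, v] }

p mask  # => [true, true, false, false]
p kept  # => [[0, "abc"], [1, "aaa"]]
```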

lokeshh • Jun 26 '16 17:06

What about a generic function-to-boolean-array process instead? Something like:

```ruby
dv.where(dv.recode_bool { |v| v.match(/a/) })
```

`dv.recode_bool` would convert any truthy block result to `true` and anything else (`nil` or `false`) to `false`.
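A minimal sketch of these `recode_bool` semantics, written as a hypothetical standalone method over a plain Ruby array (not Daru's API). The explicit conversion matters because `String#match` returns a `MatchData` object or `nil`, not `true`/`false`.

```ruby
# Hypothetical helper: map each element through the block and normalise
# the result to a strict boolean (truthy => true, nil/false => false).
def recode_bool(values)
  values.map { |v| yield(v) ? true : false }
end

values = ['abc', 'aaa', 'xet', 'ccc']

raw  = values.map { |v| v.match(/a/) }          # MatchData or nil
mask = recode_bool(values) { |v| v.match(/a/) } # true or false

p raw.map(&:class)  # => [MatchData, MatchData, NilClass, NilClass]
p mask              # => [true, true, false, false]
```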

That said, I would also prefer not having to repeat `dv` numerous times while filtering. My data vector names are usually pretty long, and it's a bit annoying to have to retype them all the time. Instead of

```ruby
very_long_data_vector_thing.where(very_long_data_vector_thing.eq(5))
```

it would be nice to have:

```ruby
very_long_data_vector_thing.where.eq(5)
```

gnilrets • Jun 27 '16 00:06

`#recode_bool` looks good to me. It's generic enough to cover many use cases. `#match` could then be a thin wrapper that uses `#recode_bool` internally to apply the regexp to each element of the Vector.
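A sketch of how `#match` could sit on top of `#recode_bool`, assuming a hypothetical `SketchVector` stand-in class (not `Daru::Vector`) that stores values and index labels and filters with a boolean mask. Elements are converted with `to_s` before matching, which is an assumption of this sketch.

```ruby
# Hypothetical stand-in for a vector class; not Daru::Vector.
class SketchVector
  attr_reader :values, :index

  def initialize(values, index = nil)
    @values = values
    @index  = index || (0...values.size).to_a
  end

  # Generic block-to-boolean mask (truthy => true, otherwise false).
  def recode_bool
    @values.map { |v| yield(v) ? true : false }
  end

  # Thin wrapper: apply the regexp to every element (as a string).
  def match(regexp)
    recode_bool { |v| regexp.match?(v.to_s) }
  end

  # Keep elements whose mask entry is true, preserving their index labels.
  def where(mask)
    kept = @index.zip(@values).select.with_index { |_, i| mask[i] }
    SketchVector.new(kept.map(&:last), kept.map(&:first))
  end
end

dv = SketchVector.new(['abc', 'aaa', 'xet', 'ccc'])
result = dv.where(dv.match(/a/))
p result.values  # => ["abc", "aaa"]
p result.index   # => [0, 1]
```

Keeping `#match` as a one-line wrapper means any other predicate (for example a numeric range check) can reuse the same `#recode_bool` path without a dedicated method.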

@gnilrets, your proposed `#where` syntax might be feasible for Vectors, but for DataFrames it would be tough, since you also need to select which vector to use for the comparison.

For Vectors, maybe `#where` could create and return a new `VectorBoolMatcher` object (or something similarly named) when the user has not passed any arguments, since that means `#where` is being chained with `eq` and the like.
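One way the argument-less `#where` idea might look, sketched with hypothetical plain-Ruby classes (`SketchVector` and `VectorBoolMatcher` here are illustrations, not Daru's API). The matcher remembers the receiver, so each comparison builds a mask and applies it without repeating the variable name. As noted above, this only covers Vectors; a DataFrame would additionally need to know which column to compare.

```ruby
# Hypothetical vector: `where` with no argument returns a matcher
# that remembers the receiver, so comparisons can be chained.
class SketchVector
  attr_reader :values

  def initialize(values)
    @values = values
  end

  def where(mask = nil)
    return VectorBoolMatcher.new(self) if mask.nil?  # chaining form
    SketchVector.new(@values.select.with_index { |_, i| mask[i] })
  end
end

# Hypothetical matcher: each comparison builds a boolean mask and
# passes it back to the vector's mask-based `where`.
class VectorBoolMatcher
  def initialize(vector)
    @vector = vector
  end

  def eq(other)
    filter { |v| v == other }
  end

  def lt(other)
    filter { |v| v < other }
  end

  def match(regexp)
    filter { |v| regexp.match?(v.to_s) }
  end

  private

  def filter
    mask = @vector.values.map { |v| yield(v) ? true : false }
    @vector.where(mask)
  end
end

very_long_data_vector_thing = SketchVector.new([5, 3, 5, 8])
p very_long_data_vector_thing.where.eq(5).values  # => [5, 5]
p very_long_data_vector_thing.where.lt(5).values  # => [3]
```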

v0dro • Jun 27 '16 15:06