arrow2
Add `contains(List, Array)`
For the `contains` operation, the list needs to be duplicated for each element in the array, according to the test linked below. What is the reason behind this implementation?
https://github.com/jorgecarleitao/arrow2/blob/91bf35caa305c983206946253e34b60b7df60d20/src/compute/contains.rs#L177
`contains` is currently implemented as `contains(Array, Array) -> BooleanArray` and is vectorized pairwise over the items of both arrays (à la `zip`). The tests are written just to exercise the different behaviors.
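To illustrate the zip-wise semantics, here is a minimal sketch. It does not use arrow2 itself: `ToyListArray` and `contains_zip` are hypothetical names for a simplified list array (offsets plus flat values) and a pairwise `contains`. Because output row `i` compares list row `i` with item `i`, testing one list against N items forces the caller to repeat that list N times.

```rust
/// Hypothetical toy list array (not arrow2's `ListArray`):
/// row `i` is `values[offsets[i]..offsets[i + 1]]`.
struct ToyListArray {
    offsets: Vec<usize>,
    values: Vec<i32>,
}

impl ToyListArray {
    /// Number of rows (lists) in the array.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow the elements of row `i`.
    fn row(&self, i: usize) -> &[i32] {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Zip-wise contains: `result[i]` is true if row `i` of `list` contains `items[i]`.
/// Both inputs must have the same length, mirroring `contains(Array, Array) -> BooleanArray`.
fn contains_zip(list: &ToyListArray, items: &[i32]) -> Vec<bool> {
    assert_eq!(list.len(), items.len(), "inputs must have equal length");
    (0..list.len())
        .zip(items)
        .map(|(i, item)| list.row(i).contains(item))
        .collect()
}

fn main() {
    // To test the single list [1, 2, 3] against three items,
    // the list has to be stored three times.
    let list = ToyListArray {
        offsets: vec![0, 3, 6, 9],
        values: vec![1, 2, 3, 1, 2, 3, 1, 2, 3],
    };
    let result = contains_zip(&list, &[2, 5, 3]);
    println!("{:?}", result); // [true, false, true]
}
```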
Are you wondering why there isn't something like `list_array.contains(item) -> bool`?
Thanks for your timely reply. My question is why the interface is not `list_array.contains(array) -> BooleanArray`. In the current implementation, the first argument has to repeat the list once per item of the second array, which consumes more memory than necessary. I guess the reason for such an implementation is vectorization.
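As a rough worked estimate of that overhead, assume a list of $m$ `i32` values tested against $n$ items, a `ListArray<i32>` (4-byte offsets), and no validity bitmaps (all assumptions for illustration). Repeating the list once per item costs

$$
\underbrace{4nm}_{\text{values}} + \underbrace{4(n+1)}_{\text{offsets}} \ \text{bytes}
\quad\text{versus}\quad
4m \ \text{bytes for a single copy.}
$$

For example, with $n = 10^6$ and $m = 100$, the duplicated values alone take $4 \cdot 10^6 \cdot 100 = 4 \cdot 10^8$ bytes (400 MB), plus about 4 MB of offsets, while a single copy of the list takes 400 bytes.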
Just to check I understood: given `let r = list_array.contains(array)`, then `r.len() == array.len()`, i.e. the idea is to check whether `list_array` contains item 0, 1, 2, ..., N of `array`, right?
Yes, that is what I mean.
Cool. If you think it is useful, let's convert this question into an issue not framed as a question and implement it as a function, e.g. something like:
```rust
fn can_in_list(&DataType) -> bool;
fn in_list<O: Offset>(list: ListArray<O>, values: &dyn Array) -> Result<BooleanArray>;
```
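A minimal sketch of the proposed `in_list` semantics, again without arrow2 itself. The `Offset` trait, `ToyListArray<O>`, and `in_list` below are hypothetical stand-ins, and the sketch assumes `list` holds exactly one row, which is stored once and checked against every item, so `result.len() == values.len()`. It takes the list by reference rather than by value (a design choice), builds a `HashSet` once so each lookup is O(1) on average, and omits `can_in_list`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an offset trait: lists use `i32` offsets,
/// large lists use `i64` offsets.
trait Offset: Copy {
    fn to_usize(self) -> usize;
}

impl Offset for i32 {
    fn to_usize(self) -> usize {
        self as usize
    }
}

impl Offset for i64 {
    fn to_usize(self) -> usize {
        self as usize
    }
}

/// Hypothetical toy list array; row `i` is `values[offsets[i]..offsets[i + 1]]`.
struct ToyListArray<O: Offset> {
    offsets: Vec<O>,
    values: Vec<i32>,
}

/// Sketch of `in_list`: `result[i]` is true if the single list contains `values[i]`.
/// Assumption: `list` has exactly one row; anything else is reported as an error.
fn in_list<O: Offset>(list: &ToyListArray<O>, values: &[i32]) -> Result<Vec<bool>, String> {
    if list.offsets.len() != 2 {
        return Err("expected a list array with exactly one row".to_string());
    }
    let start = list.offsets[0].to_usize();
    let end = list.offsets[1].to_usize();
    // Build the lookup set once instead of repeating the list per item.
    let set: HashSet<i32> = list.values[start..end].iter().copied().collect();
    Ok(values.iter().map(|v| set.contains(v)).collect())
}

fn main() {
    // The list [1, 2, 3] is stored once, regardless of how many items are tested.
    let list = ToyListArray::<i32> { offsets: vec![0, 3], values: vec![1, 2, 3] };
    let result = in_list(&list, &[2, 5, 3]).unwrap();
    println!("{:?}", result); // [true, false, true]
}
```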
Sure. I am happy to do that.
Hey, I'm working on this