arrow2

Add `contains(List, Array)`

Open jyizheng opened this issue 2 years ago • 7 comments

For the `contains` operation, the list needs to be duplicated for each element in the array, according to the test linked below. What is the reason behind this implementation?

https://github.com/jorgecarleitao/arrow2/blob/91bf35caa305c983206946253e34b60b7df60d20/src/compute/contains.rs#L177

jyizheng avatar Aug 09 '21 16:08 jyizheng

`contains` is currently implemented as `contains(Array, Array) -> BooleanArray` and is vectorized over each pair of items (a la zip). The test is written just to exercise the different behaviors.

Are you wondering why there isn't something like `list_array.contains(item) -> bool`?

jorgecarleitao avatar Aug 09 '21 16:08 jorgecarleitao
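
To illustrate the zip-style semantics described above, here is a minimal sketch in plain Rust. It is not arrow2's code: `contains_zip` and `check_one_list` are hypothetical names, `Vec<i32>` stands in for the Arrow arrays, and null handling is ignored.

```rust
/// Hypothetical stand-in for a zip-style kernel: row `i` of `lists`
/// is checked against item `i` of `values`.
fn contains_zip(lists: &[Vec<i32>], values: &[i32]) -> Vec<bool> {
    assert_eq!(lists.len(), values.len(), "inputs must have equal length");
    lists
        .iter()
        .zip(values)
        .map(|(list, value)| list.contains(value))
        .collect()
}

/// Checking a single list against every value requires repeating the
/// list once per value, which is the duplication raised in this issue.
fn check_one_list(list: &[i32], values: &[i32]) -> Vec<bool> {
    let repeated: Vec<Vec<i32>> = vec![list.to_vec(); values.len()];
    contains_zip(&repeated, values)
}
```

Under these assumptions, `check_one_list(&[1, 2, 3], &[2, 5, 3])` yields `[true, false, true]`, at the cost of holding three copies of the list.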

Thanks for your timely reply. My question is why the interface is not `list_array.contains(array) -> BooleanArray`. In the current implementation, the first array has to repeat the list once per item, which consumes more memory than necessary. I guess the reason for such an implementation is vectorization.

jyizheng avatar Aug 09 '21 17:08 jyizheng

Just to check I understood: given `let r = list_array.contains(array)`, we would have `r.len() == array.len()`, i.e. the idea is to check whether `list_array` contains item 0, 1, 2, ..., N of `array`, right?

jorgecarleitao avatar Aug 09 '21 17:08 jorgecarleitao

Yes, that is what I mean.

jyizheng avatar Aug 09 '21 17:08 jyizheng

Cool. If you think it is useful, let's convert this question into an issue that is not framed as a question and implement it as a function, e.g. something like

fn can_in_list(&DataType) -> bool;
fn in_list<O: Offset>(list: ListArray<O>, values: &dyn Array) -> Result<BooleanArray>;

jorgecarleitao avatar Aug 09 '21 17:08 jorgecarleitao
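
The behavior agreed on above (one list, one boolean per item of the array) can be sketched in plain Rust as follows. This is only an illustration under stated assumptions: it borrows the name `in_list` from the proposed signature but takes slices of `i32` instead of `ListArray<O>` and `&dyn Array`, treats the list as a single flat list, ignores nulls, and uses a `HashSet` as one possible lookup strategy.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: returns one boolean per item of `values`,
/// true when that item occurs in `list`. The list is read once and
/// never duplicated.
fn in_list(list: &[i32], values: &[i32]) -> Vec<bool> {
    // Build the lookup set once, up front.
    let set: HashSet<i32> = list.iter().copied().collect();
    values.iter().map(|value| set.contains(value)).collect()
}
```

With this shape, `in_list(&[1, 2, 3], &[2, 5, 3])` returns `[true, false, true]`, and the result always has the same length as `values`.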

Sure. I am happy to do that.

jyizheng avatar Aug 09 '21 18:08 jyizheng

Hey, I'm working on this

ozgrakkurt avatar Apr 02 '23 08:04 ozgrakkurt