
[C++] String manipulation on a dictionary column

Open thisisnic opened this issue 1 year ago • 0 comments

Describe the bug, including details regarding any error messages, version, and platform.

I was trying to write up an R example using a dataset that has lots of dictionary columns, but I am unable to do string manipulation on them. Here's a slightly contrived example using mtcars:

library(arrow)
library(dplyr)
mtcars |>
  mutate(cyl = as.factor(as.character(cyl))) |>
  arrow_table() |>
  mutate(cyl6 = str_detect(cyl, "6")) |>
  collect()
#> Error in `compute.arrow_dplyr_query()`:
#> ! NotImplemented: Function 'match_substring_regex' has no kernel matching input types (dictionary<values=string, indices=int8, ordered=0>)

I'm wondering if this is something we could enable from the R package side, perhaps by checking the type of the column and casting it to string when this kind of operation is requested? Or could the C++ kernels operate on the dictionary values directly? I'm not sure how complicated either option would be, though.
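
For reference, here is a minimal sketch of the casting idea done by hand in user code. It assumes that `as.character()` inside an arrow query is translated to a cast to string, and that casting a dictionary column to its value type is supported; I believe both hold, but this is a workaround under those assumptions, not a proposed fix:

```r
library(arrow)
library(dplyr)

# Same setup as the failing example: cyl becomes a dictionary column.
tbl <- mtcars |>
  mutate(cyl = as.factor(as.character(cyl))) |>
  arrow_table()

# Assumed workaround: cast the dictionary column to string inside the query,
# so the regex kernel receives a plain string input instead of a dictionary.
result <- tbl |>
  mutate(cyl6 = str_detect(as.character(cyl), "6")) |>
  collect()
```

If the R package did this cast automatically, the original query would presumably run unchanged, at the cost of decoding the dictionary before matching.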

Component(s)

C++

thisisnic, Mar 08 '24 17:03