Andy Grove
I tried this out and did run into an issue when connecting from DataGrip: ``` INTERNAL: Failed to create SessionContext: Context("Could not create table for file:///mnt/bigdata/tpch/sf10-parquet/partsupp.parquet at /home/andy/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/7b5842b/datafusion/core/src/catalog/listing_schema.rs:116", ObjectStore(Generic {...
> Thank you so much for your review! I realize you cloned and did manual testing and that probably took a fair bit of time, which I really appreciate :)...
Related blog post: https://romankudryashov.com/blog/2021/11/monitoring-rust-web-application/
@waynexia This looks good, but could you add a test to demonstrate what the output looks like?
I modified the test to show the input that causes this error:

```
-Row(a='oNÍ[\x87\x01áe>\x85', regexp_replace(a, .*$, PROD, 1)='PRODPROD\x85PROD')
+Row(a='oNÍ[\x87\x01áe>\x85', regexp_replace(a, .*$, PROD, 1)='PROD\x85PROD')
```
Simpler repro:

```python
.with_special_case('a\x85')
```

```
-Row(a='a\x85', regexp_replace(a, .*$, PROD, 1)='PRODPROD\x85PROD')
+Row(a='a\x85', regexp_replace(a, .*$, PROD, 1)='PROD\x85PROD')
```

Note that `a\x84` works fine, so this appears to be specific to certain...
Adding the triage label back now that there is a summary of the issue.
Note that we transpile the pattern `.*$` to `[^\n\r\u0085\u2028\u2029]*(?:\r|\u0085|\u2028|\u2029|\r\n)?$` before passing to cuDF, so maybe we need to update this to handle `NEL`. I am investigating.
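For reference, here is a small Python sketch of why `\x85` (NEL, U+0085) is treated differently from `\x84` by the character class in the transpiled pattern. It uses Python's `re` module, which is neither the Java nor the cuDF regex engine, so it only illustrates the character class and the encoding of NEL, not the replace behavior in question. The names `NON_TERMINATOR` and `is_single_line` are hypothetical helpers made up for this example.

```python
import re

# Character class copied from the transpiled pattern quoted above.
# Assumption: Python's `re` is used only to illustrate the class itself;
# it is not the engine involved in the reported failure.
NON_TERMINATOR = re.compile(r"[^\n\r\u0085\u2028\u2029]*")

NEL = "\u0085"  # "Next Line" control character


def is_single_line(s: str) -> bool:
    """Return True if s contains none of the five terminators in the class."""
    return NON_TERMINATOR.fullmatch(s) is not None


if __name__ == "__main__":
    # \x84 is not in the excluded set, \x85 is.
    print(is_single_line("a\x84"))
    print(is_single_line("a\x85"))

    # NEL is a single code point but two bytes in UTF-8.
    print(NEL.encode("utf-8"))

    # Python's own line splitting also treats NEL as a line boundary.
    print("a\x85b".splitlines())
```

Because NEL occupies two bytes in UTF-8, it may behave differently in an engine that works on encoded bytes than in one that works on code points.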
cuDF repro: ```java @Test void testStringReplaceEdgeCase() { TableDebug debug = TableDebug.builder().build(); RegexProgram target = new RegexProgram( "[^\n\r\u0085\uc285\u2028\u2029]*(?:\r|\u0085|\uc285|\u2028|\u2029|\r\n)?$"); try (ColumnVector input = ColumnVector.fromStrings("a\n", "a\u0085"); ColumnVector expected = ColumnVector.fromStrings("PRODPROD\nPROD", "PRODPROD\u0085PROD"); Scalar replace...
@NVnavkumar based on the test I posted here, I am not sure if this is really a bug in cuDF or not. What do you think?