nodejs-polars icon indicating copy to clipboard operation
nodejs-polars copied to clipboard

Expressions as parameters in StringNamespace functions

Open fbx31 opened this issue 10 months ago • 4 comments

Hi and sorry again, I think I found a global issue in StringNamespace implementation. Comparing with this example of slice implementation between string an list

let df = pl.DataFrame(
    {
        A: ["apple"],
        fruits: [["apple", "banana", "cherry", "dragon fruit"]],
    }
)
df = df.select(
    "A",
    "fruits",
    // "cars",
    pl.col("fruits").lst.slice(0, 2).alias("lst_slice_js"),
    pl.col("fruits").lst.slice(pl.lit(0), pl.lit(2)).alias("lst_slice_expr"),
    pl.col("A").str.slice(0,2).alias("str_slice_js"),
    pl.col("A").str.slice(pl.list(0),pl.lit(2)).alias("str_slice_expr"),
)

This raises the following error for the last expression .str.slice(pl.list(0),pl.lit(2)) :

<...>\node_modules\nodejs-polars\bin\lazy\expr\string.js:10
        return (0, expr_1._Expr)(_expr[method](...args));
                                              ^
Error: Failed to recover `JsExpr` type from napi value
    at wrap (<...>\node_modules\nodejs-polars\bin\lazy\expr\string.js:10:47)
    at Object.slice (<...>\node_modules\nodejs-polars\bin\lazy\expr\string.js:88:20)

Then I compared the code on rust side and js side for both slice implementations

    #[napi(catch_unwind)]
    pub fn list_slice(&self, offset: &JsExpr, length: Option<&JsExpr>) -> JsExpr {
        let length = match length {
            Some(i) => i.inner.clone(),
            None => dsl::lit(i64::MAX),
        };
        self.inner
            .clone()
            .list()
            .slice(offset.inner.clone(), length)
            .into()
...
	#[napi(catch_unwind)]
    pub fn str_slice(&self, offset: &JsExpr, length: &JsExpr) -> JsExpr {
        self.inner
            .clone()
            .str()
            .slice(offset.inner.clone(), length.inner.clone())
            .into()
    }

Except the difference for optional length (which is not an issue, nothing to highlight) Then in JS:

list:
slice(offset, length) {
      return wrap(
        "listSlice",
        exprToLitOrExpr(offset)._expr,
        exprToLitOrExpr(length)._expr,
      );
    },

string:
	slice(start, length?) {
      if (!Expr.isExpr(start)) {
        start = lit(start)._expr;
      }
      if (!Expr.isExpr(length)) {
        length = lit(length)._expr;
      }

      return wrap("strSlice", start, length);
    },

The processing of the parameter seems to differ between both implementations.

Sorry in advance if I am wrong...

fbx31 avatar Feb 24 '25 10:02 fbx31

OK fixed this locally on my computer for str.slice directly inside bin/.lazy/expr/string.js

        slice(start, length) {
            // if (!expr_1.Expr.isExpr(start)) {
            //     start = (0, functions_1.lit)(start)._expr;
            // }
            // if (!expr_1.Expr.isExpr(length)) {
            //     length = (0, functions_1.lit)(length)._expr;
            // }
            // return wrap("strSlice", start, length);
            return wrap("strSlice", (0, expr_1.exprToLitOrExpr)(start)._expr, (0, expr_1.exprToLitOrExpr)(length)._expr);
        },

And looking at the code, I think there are many places where we have the same type of "error/confusion".

For some of them, the rust code should be also reviewed. Exemple with str_replace function where pat and val should be &JsExpr instead of String and then calling pat.inner.clone() and val.inner.clone() and then calling ca.replace(pat.onner.clone(), val.inner.clone().

    #[napi(catch_unwind)]
    pub fn str_replace(&self, pat: String, val: String) -> JsExpr {
        let function = move |s: Column| {
            let ca = s.str()?;
            match ca.replace(&pat, &val) {
                Ok(ca) => Ok(Some(ca.into_column())),
                Err(e) => Err(PolarsError::ComputeError(format!("{:?}", e).into())),
            }
        };
        self.clone()
            .inner
            .map(function, GetOutput::same_type())
            .with_fmt("str.replace")
            .into()
    }

Only a "guess", shall be confirmed by the maintainers and eventually changed inside all functions accepting parameters (TBC), as I am not yet able to locally test modifications of rust code for now.

Thanks

fbx31 avatar Feb 24 '25 11:02 fbx31

Error: Failed to recover JsExpr type from napi value happens when we try to pass a string instead of a expression. Some functions expect a string, others can handle string, regex or expression. I am sure we have cases when thing fail apart due to so many options. We just need to improve our test coverage. Feel free to submit a PR to add more test cases.

Bidek56 avatar Feb 24 '25 19:02 Bidek56

Thanks a lot, OK, I will do my best to list and detail all missing test cases I encounter. Currently trying to rebuild everything on my computer, but a bit hard as you are referring git versions in your toml and not "standard" packages, my personal network configuration is a bit (let's say) restrictive with that but I will manage. Also... the nightly toolchain to be installed apparently... Any news on 0.46 polars support that brings a bunch of new functions startsWith, endsWith... ?

fbx31 avatar Feb 25 '25 13:02 fbx31

  1. Git reference is needed b/c one of the json packages was not been converted to a crate by Ritche
  2. Nightly toolchain is used by the core Polars team, hence needed
  3. 0.46 PR is kind of blocked ATM b/c the core team has remove utf8 serialization, not sure how to fix it ATM
  4. Be sure to lint everything before submitting a PR and test with both yarn and bun

Bidek56 avatar Feb 25 '25 13:02 Bidek56

@fbx31 Polars rs-0.46 has been merged and released as v0.19.

Bidek56 avatar Jul 20 '25 01:07 Bidek56