elasticsearch icon indicating copy to clipboard operation
elasticsearch copied to clipboard

[ES|QL]support non-unary mv functions

Open fang-xing-esql opened this issue 1 year ago • 1 comments

This is for the initial review of supporting non-unary mv functions through evaluator code generation. More tests and docs need to be added and potential code refactor also. Just to send this out to see if we are heading to the right direction.

There are two types of evaluators used by scalar functions: Evaluator: generates code for scalar functions with multiple single-value inputs. MvEvaluator: generates code for mv functions with single input.

For mv functions like mv_index, mv_append etc., they are mv functions with multiple single-value or multi-value inputs, and today we don't have an evaluator that supports them. This PR modifies Evaluator to support multi-value inputs, because it has the infrastructure to support multiple input parameters, however it can be refactored into MvEvaluator if needed.

There are two mv functions included in this PR, they provide the same functionality through different ways: mv_index: does not use evaluator to generate code, the code to handle different input data types are manually coded, it is included in case we don't want to take the evaluator approach. mv_slice: use modified Evaluator to generate code to handle different input data types.

fang-xing-esql avatar Feb 15 '24 19:02 fang-xing-esql

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine avatar Feb 15 '24 19:02 elasticsearchmachine

I think it's pretty close! I left a bunch of picky comments and ways that might help with checkstyle. Also, it probably wants a unit test - it might be easy to inherit from AbstractMultivalueFunctionTestCase - or maybe that's too baked into single-valued-ness.

Thank you for reviewing Nik! I'm working on addressing these comments.

fang-xing-esql avatar Feb 21 '24 16:02 fang-xing-esql

Thanks @fang-xing-esql, looks pretty good so far. I agree with Nik's observations; I just left one additional comment on how much control we want to delegate to the function implementation, mostly a trade-off between simplicity and flexibility.

Thank you for reviewing Luigi! I think we can provide more flexibility to the caller of the evaluator, so that is can be used easily, I was thinking to allow the caller to process one record/row at a time, and it seems appending null between beginPositionEntry and endPositionEntry works, but I will test more, need to add more tests outside of CsvTest as Nik suggested. It may take a few more iterations of reviews I expect, hopefully it can make the work of future mv functions easier.

fang-xing-esql avatar Feb 21 '24 16:02 fang-xing-esql

Close this PR, as the replacement PR is merged.

fang-xing-esql avatar Mar 12 '24 12:03 fang-xing-esql