trino icon indicating copy to clipboard operation
trino copied to clipboard

JSON predicate pushdown for Pinot

Open robertzych opened this issue 1 year ago • 20 comments

Description

These changes allow the Trino-Pinot connector to perform predicate pushdowns for a combination of JSON functions.

For example, the following Trino function calls:

where
    contains(ARRAY['O'], json_extract_scalar(json, '$.orderstatus')) 
    AND (
      json_array_contains(json_extract(json, '$.stringArray'), 'O') 
      OR json_array_contains(json_extract(json, '$.stringArray'), 'F') 
    ) 
    AND (
        json_extract_scalar(json, '$.dne') IS NULL 
        OR json_array_contains(json_extract(json, '$.stringArray'), 'X') = false
        OR json_array_contains(json_extract(json, '$.intArray'), 0)
    );

Are translated to the following Pinot function calls:

where (
    JSON_MATCH(json, '"$.orderstatus" in (''O'')') 
    AND (
        JSON_MATCH(json, '"$.stringArray[*]" = ''O''') 
        OR JSON_MATCH(json, '"$.stringArray[*]" = ''F''')
    ) 
    AND (
        JSON_MATCH(json, '"$.dne" IS NULL') 
        OR NOT(JSON_MATCH(json, '"$.stringArray[*]" = ''X'''))
        OR JSON_MATCH(json, '"$.intArray[*]" = 0')
    )
)

The new session property (pinot.json_predicate_pushdown_enabled) defaults to false because JSON_MATCH can fail when a JSON index hasn't been configured in Pinot.

Additional context and related issues

PinotMetadata.applyFilter calls buildConstraintPQL which recursively walks through the call tree and returns the converted SQL if the entire call tree is supported.

Release notes

TODO

( ) This is not user-visible or is docs only, and no release notes are required. (X) Release notes are required. Please propose a release note for me. ( ) Release notes are required, with the following suggested text:

robertzych avatar Oct 30 '24 03:10 robertzych