
[SPARK-52103][SQL] Fallback complex expression whole-stage codegen

Open wankunde opened this issue 7 months ago • 1 comment

What changes were proposed in this pull request?

Add a config that determines whether to fall back from whole-stage codegen for complex expressions.

Why are the changes needed?

If an expression contains more non-leaf expressions than the configured threshold, the generated method may be too long to be JIT compiled.
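The check can be sketched as a tree walk that counts non-leaf nodes and compares the count with a threshold. This is a minimal Python sketch, not the PR's Scala implementation. `Expr`, `count_non_leaf`, and `should_fall_back` are hypothetical names, and the PR's actual config key is not shown here. For context, HotSpot by default does not JIT-compile methods with more than 8000 bytes of bytecode, so an oversized generated method keeps running in the interpreter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Hypothetical stand-in for a Catalyst expression node."""
    name: str
    children: List["Expr"] = field(default_factory=list)


def count_non_leaf(expr: Expr) -> int:
    """Count nodes with at least one child, walking the tree iteratively."""
    count = 0
    stack = [expr]
    while stack:
        node = stack.pop()
        if node.children:  # non-leaf: an operator or function call
            count += 1
            stack.extend(node.children)
    return count


def should_fall_back(expr: Expr, max_non_leaf: int) -> bool:
    """True when the expression has more non-leaf nodes than the threshold."""
    return count_non_leaf(expr) > max_non_leaf


# length(cv) > 0 as a tree: GreaterThan(Length(cv), 0)
pred = Expr("GreaterThan", [Expr("Length", [Expr("cv")]), Expr("0")])
print(count_non_leaf(pred))       # 2
print(should_fall_back(pred, 1))  # True
print(should_fall_back(pred, 2))  # False
```

The explicit stack avoids Python's recursion limit on very deep expression trees. The comparison is strict, matching "more than" the threshold.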

In the test query below, the filter operator contains 193 non-leaf expressions and generates about 3,000 lines of code. That code cannot be JIT compiled, so the query runs very slowly.

```sql
SELECT  vv
FROM
(
    SELECT  vv, case vv
                when '1' then '1'
                when '2' then '2'
                when '3' then '3'
                when '4' then '4'
                when '5' then '5'
                when '6' then '6'
                when '7' then '7'
                when '8' then '8'
                when '9' then '9'
                when '10' then '10'
                when '11' then '11'
                when '12' then '12'
                when '13' then '13'
                when '14' then '14'
                when '15' then '15'
                when '16' then '16'
                when '17' then '17'
                when '18' then '18'
                when '19' then '19'
                when '20' then '20'
                when '21' then '21'
                when '22' then '22'
                when '23' then '23'
                when '24' then '24'
                when '25' then '25'
                when '26' then '26'
                when '27' then '27'
                when '28' then '28'
                when '29' then '29'
                when '30' then '30'
                when '31' then '31'
                when '32' then '32'
                else ''
                end as cv
    FROM (
        SELECT  regexp_replace(trim(lower(
                   get_json_object(concat(v,'}'),'$.s'))),'\\n','') AS vv
        FROM values('a') as t(v)
    ) tmp
) t2
WHERE length(cv) > 0
AND cv not LIKE '%xxx%'
```
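To observe how the generated code grows with query size, the test query can be rebuilt with a variable number of `WHEN` branches. The following is a small hypothetical Python helper, not part of the PR. `build_repro_query` and its `branches` parameter are illustrative names.

```python
def build_repro_query(branches: int = 32) -> str:
    """Rebuild the test query with `branches` WHEN clauses (32 in the original)."""
    whens = "\n".join(
        f"                when '{i}' then '{i}'" for i in range(1, branches + 1)
    )
    # '}}' renders as a literal '}' and '\\\\n' as '\\n' inside this f-string.
    return f"""SELECT  vv
FROM
(
    SELECT  vv, case vv
{whens}
                else ''
                end as cv
    FROM (
        SELECT  regexp_replace(trim(lower(
                   get_json_object(concat(v,'}}'),'$.s'))),'\\\\n','') AS vv
        FROM values('a') as t(v)
    ) tmp
) t2
WHERE length(cv) > 0
AND cv not LIKE '%xxx%'"""


print(build_repro_query(32))
```

Passing the generated SQL to `spark.sql(...)` in PySpark with different `branches` values could help locate the point at which the fallback threshold takes effect.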

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test.

Was this patch authored or co-authored using generative AI tooling?

No

wankunde · May 13 '25 13:05

Hi @panbingkun, do you have any ideas about this codegen JIT compilation failure?

wankunde · May 14 '25 10:05