spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-52461] [SQL] Collapse inner Cast from DecimalType to DecimalType

Open mihailoale-db opened this issue 6 months ago • 1 comments

What changes were proposed in this pull request?

When coercing an expression from DecimalType to DecimalType, remove the inner Cast if it is not user-specified one and if the outer DecimalType is wider than inner (precision and scale are greater) - or in other words, if it is redundant. Do that only in case SQLConf.COLLAPSE_INNER_CAST_TO_DECIMAL value is true.

Why are the changes needed?

To make single-pass and fixed-point implementations compatible.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests (regenerated golden files) + added ones.

Was this patch authored or co-authored using generative AI tooling?

No.

mihailoale-db avatar Jun 12 '25 12:06 mihailoale-db

Actually we should revisit this approach, it's correct for most of the cases but there is this interesting one. these queries:

create table t(col1 decimal(20,2));
select round(sum(col1)) as col1 from t having col1>=11;

return following plan

col1: decimal(29,0)
Filter (cast(col1#12977 as decimal(31,2)) >= cast(cast(cast(11 as decimal(2,0)) as decimal(20,2)) as decimal(31,2)))
+- Aggregate [round(sum(col1#12996), 0) AS col1#12977]
   +- SubqueryAlias t
      +- Relation t[col1#12996] parquet

and for these ones:

create table t(col1 decimal(20,2));
select round(sum(col1)) = 1;

this is the analyzed plan:

(round(sum(col1), 0) = 1): boolean
Aggregate [(round(sum(col1#12084), 0) = cast(cast(1 as decimal(1,0)) as decimal(29,0))) AS (round(sum(col1), 0) = 1)#12086]
+- SubqueryAlias t
   +- Relation t[col1#12084] parquet

In single-pass, for HAVING case, Cast is also decimal(29,0) as this is the actual type of round(sum(col1)) but as you can see in fixed-point its decimal(31,2) (which we get because of tempresolvedcolumn logic - it takes the precission of the expression used for resolution - sum(col1) which datatype is decimal(31,2) and thus both sides are casted to that wider type).

mihailoale-db avatar Jun 12 '25 16:06 mihailoale-db

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Sep 22 '25 00:09 github-actions[bot]