substrait icon indicating copy to clipboard operation
substrait copied to clipboard

expand scale bound on decimal or explain reason for limitation in documentation

Open benbellick opened this issue 2 months ago • 3 comments

There is a hard limitation of 38 on the precision of the builtin Decimal type, which seems arbitrary. The PR in which this was contributed doesn't clarify why that limitation was set.

Furthermore, I'm not sure that this max is enforced in the client libraries. In our company (which uses both substrait-java and substrait-go), we are able to send a plan containing a precision of 40.

Upon researching around, it looks like 38 may have been chosen because of precedent set by:

However, in other systems, there is no such limitation:

One complexity here is that the extension functions which feature precision depend quite a bit on this 38 number. E.g.

urn: extension:io.substrait:functions_arithmetic_decimal
scalar_functions:
  -
    name: "add"
    description: "Add two decimal values."
    impls:
      - args:
          - name: x
            value: decimal<P1,S1>
          - name: y
            value: decimal<P2,S2>
        options:
          overflow:
            values: [ SILENT, SATURATE, ERROR ]
        return: |-
          init_scale = max(S1,S2)
          init_prec = init_scale + max(P1 - S1, P2 - S2) + 1
          min_scale = min(init_scale, 6)
          delta = init_prec - 38
          prec = min(init_prec, 38)
          scale_after_borrow = max(init_scale - delta, min_scale)
          scale = init_prec > 38 ? scale_after_borrow : init_scale
          DECIMAL<prec, scale>

As a possible fix, I propose one of the following options:

1. Introduce max precision parameter

Instead of decimal being represented by DECIMAL<prec, scale>, we would represent it by DECIMAL<max_prec,prec,scale>. To maintain backwards compatibility, we could say that max_prec defaults to 38. Then the above file would look like:

urn: extension:io.substrait:functions_arithmetic_decimal
scalar_functions:
  -
    name: "add"
    description: "Add two decimal values."
    impls:
      - args:
          - name: x
            value: decimal<M, P1,S1>
          - name: y
            value: decimal<M, P2,S2>
        options:
          overflow:
            values: [ SILENT, SATURATE, ERROR ]
        return: |-
          init_scale = max(S1,S2)
          init_prec = init_scale + max(P1 - S1, P2 - S2) + 1
          min_scale = min(init_scale, 6)
          delta = init_prec - M
          prec = min(init_prec, M)
          scale_after_borrow = max(init_scale - delta, min_scale)
          scale = init_prec > M ? scale_after_borrow : init_scale
          DECIMAL<M, prec, scale>

2. Clarify documentation and possibly add new type

We could add some clarification to the docs on why this limitation exists. Then if there is need in the future, we could create a new type to handle more arbitrary precision.


I have a strong preference for 1 above unless there is a reason to maintain the limitation. Thanks!

benbellick avatar Nov 07 '25 20:11 benbellick

38 digit limit is coming from 128 bit integer and Substrait chose the literal to be 128 bit integer. We could extend the maximal precision as optional type argument (not leading but the last) maybe...

yongchul avatar Nov 07 '25 22:11 yongchul

We could also consider just introducing a new type if that is easier 🤷. Though if the limitation will stay, then I think it makes sense to just add something to the documentation to explain why the limitation exists. What do you think?

benbellick avatar Nov 10 '25 20:11 benbellick

I suggest a new type as the type derivation equations would also have to be defined (I'm not sure they would only change the the max precision, they may also change the magic number "6". And I'd guess that they may be different for different systems. (For example, I think Oracle supports unlimited precision.)

jacques-n avatar Nov 12 '25 04:11 jacques-n