Expand precision bound on decimal or explain reason for limitation in documentation
There is a hard limit of 38 on the precision of the built-in Decimal type, which seems arbitrary. The PR in which this was contributed doesn't clarify why that limit was set.
Furthermore, I'm not sure this maximum is enforced in the client libraries: at our company (which uses both substrait-java and substrait-go), we are able to send a plan containing a decimal with precision 40.
Upon researching around, it looks like 38 may have been chosen to follow precedent set by other systems, many of which cap decimal precision at 38 (the largest precision whose values all fit in a 128-bit integer).
However, other systems allow considerably higher maximum precision:
- PostgreSQL: 1000
- MySQL: 65
One complication here is that the extension functions involving decimal precision depend quite heavily on this number 38 in their type derivations. E.g.:
```yaml
urn: extension:io.substrait:functions_arithmetic_decimal
scalar_functions:
  -
    name: "add"
    description: "Add two decimal values."
    impls:
      - args:
          - name: x
            value: decimal<P1,S1>
          - name: y
            value: decimal<P2,S2>
        options:
          overflow:
            values: [ SILENT, SATURATE, ERROR ]
        return: |-
          init_scale = max(S1,S2)
          init_prec = init_scale + max(P1 - S1, P2 - S2) + 1
          min_scale = min(init_scale, 6)
          delta = init_prec - 38
          prec = min(init_prec, 38)
          scale_after_borrow = max(init_scale - delta, min_scale)
          scale = init_prec > 38 ? scale_after_borrow : init_scale
          DECIMAL<prec, scale>
```
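To make the derivation concrete, here is a quick Python transcription (illustrative only; the function name is mine, not part of the spec) showing how the cap of 38 forces digits to be borrowed from the scale:

```python
def decimal_add_result(p1: int, s1: int, p2: int, s2: int) -> tuple[int, int]:
    """Transcription of the 'add' return-type derivation above."""
    init_scale = max(s1, s2)
    init_prec = init_scale + max(p1 - s1, p2 - s2) + 1
    min_scale = min(init_scale, 6)  # scale is never borrowed below 6 digits
    delta = init_prec - 38
    prec = min(init_prec, 38)
    scale_after_borrow = max(init_scale - delta, min_scale)
    scale = scale_after_borrow if init_prec > 38 else init_scale
    return prec, scale

# DECIMAL<38,10> + DECIMAL<38,10>: init_prec would be 39, so one digit
# is borrowed from the scale to stay within the cap of 38.
assert decimal_add_result(38, 10, 38, 10) == (38, 9)
# DECIMAL<10,2> + DECIMAL<12,4>: fits under 38, nothing is borrowed.
assert decimal_add_result(10, 2, 12, 4) == (13, 4)
```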
As a possible fix, I propose one of the following options:
1. Introduce a max-precision parameter
Instead of representing decimals as DECIMAL<prec, scale>, we would represent them as DECIMAL<max_prec, prec, scale>. To maintain backwards compatibility, we could say that max_prec defaults to 38. The file above would then look like:
```yaml
urn: extension:io.substrait:functions_arithmetic_decimal
scalar_functions:
  -
    name: "add"
    description: "Add two decimal values."
    impls:
      - args:
          - name: x
            value: decimal<M,P1,S1>
          - name: y
            value: decimal<M,P2,S2>
        options:
          overflow:
            values: [ SILENT, SATURATE, ERROR ]
        return: |-
          init_scale = max(S1,S2)
          init_prec = init_scale + max(P1 - S1, P2 - S2) + 1
          min_scale = min(init_scale, 6)
          delta = init_prec - M
          prec = min(init_prec, M)
          scale_after_borrow = max(init_scale - delta, min_scale)
          scale = init_prec > M ? scale_after_borrow : init_scale
          DECIMAL<M, prec, scale>
```
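As a sanity check on this option, the same derivation can be parameterized by M in code; a sketch under the assumption that nothing else in the derivation changes:

```python
def decimal_add_result(p1, s1, p2, s2, m=38):
    """The derivation above with the cap M as a parameter (defaulting to 38)."""
    init_scale = max(s1, s2)
    init_prec = init_scale + max(p1 - s1, p2 - s2) + 1
    min_scale = min(init_scale, 6)
    delta = init_prec - m
    prec = min(init_prec, m)
    scale_after_borrow = max(init_scale - delta, min_scale)
    scale = scale_after_borrow if init_prec > m else init_scale
    return m, prec, scale

# With the default M=38, existing plans derive the same types as today...
assert decimal_add_result(38, 10, 38, 10) == (38, 38, 9)
# ...while a hypothetical wider cap (say M=76) avoids borrowing from scale.
assert decimal_add_result(38, 10, 38, 10, m=76) == (76, 39, 10)
```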
2. Clarify the documentation and possibly add a new type
We could add some clarification to the docs on why this limitation exists. Then, if the need arises in the future, we could create a new type to handle larger or arbitrary precision.
I have a strong preference for option 1 above unless there is a reason to maintain the limitation. Thanks!
The 38-digit limit comes from the 128-bit integer: Substrait chose the decimal literal to be a 128-bit integer, and 38 is the largest number of decimal digits that always fits. We could maybe extend the maximal precision as an optional type argument (not leading, but last)...
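For reference, a quick check of that bound in Python:

```python
# The largest signed 128-bit integer has 39 digits, but not every
# 39-digit value fits in it, so 38 is the largest safe precision.
i128_max = 2**127 - 1   # 170141183460469231731687303715884105727
assert len(str(i128_max)) == 39
assert 10**38 - 1 <= i128_max  # every 38-digit value fits
assert 10**39 - 1 > i128_max   # some 39-digit values do not
```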
We could also consider just introducing a new type if that is easier 🤷. Though if the limitation is going to stay, then I think it makes sense to at least add something to the documentation explaining why it exists. What do you think?
I suggest a new type, since the type derivation equations would also have to be defined. I'm not sure they would change only in the max precision; they may also change the magic number "6". And I'd guess that they may differ between systems (for example, I think Oracle supports unlimited precision).