Integer overflow, floating point rounding options, and domain error handling options are undocumented

Open jvanstraten opened this issue 2 years ago • 1 comments

This came up in yesterday's sync meeting: these options are copy-pasted all over the place but are not really documented anywhere. For one thing, I had misinterpreted the SILENT option. The consensus during the meeting was that we should just repeat the docs in every YAML file, but I don't think we agreed on how to do that ergonomically (manual labor, YAML references, or some script that generates the YAMLs from a higher-level description).

For the record, the missing documentation should provide roughly this information:

  • Integer overflow modes:

    • SILENT: in case of overflow, the returned value is unspecified (i.e. it may return any value, including values outside the set of values Substrait defines for the integer type if a wider physical representation is used to implement it), and the consumer should not throw an error because of it.
    • SATURATE: in case of overflow, the correct value must be computed and then clamped to the nearest value that the type must support according to Substrait. For example, 100i8 + 100i8 is exactly 200, and the closest value to 200 in [-128..127]i8 is 127i8.
    • ERROR: in case of overflow, the consumer must notify the user by throwing an error.
  • Floating point rounding modes:

    • TIE_TO_EVEN: round to the nearest value that can be represented by the floating point type. If the computed value is exactly midway between the two nearest values, choose the one for which the LSB of the mantissa is zero.
    • TIE_AWAY_FROM_ZERO: as above, but tie away from zero.
    • TRUNCATE: when the exact result cannot be represented by the floating point type, choose the next available representation in the direction of zero.
    • CEILING: as TRUNCATE, but in the direction of positive infinity.
    • FLOOR: as TRUNCATE, but in the direction of negative infinity.
  • Domain error modes:

    • NAN: yield a quiet NaN as per IEEE 754.
    • ERROR: the consumer must notify the user by throwing an error.
    • We could consider adding a SILENT mode to this list, where the consumer is free to yield any value. I would prioritize it between NAN and ERROR.
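
To make the three option families above a bit more concrete, here is a minimal Python sketch. It is an illustration only, not Substrait's implementation: the names `Overflow`, `DomainError`, `add_i8`, `round_to_int` and `sqrt_f64` are made up for this example, the SILENT branch picks two's complement wrapping merely as one of the many values a consumer would be allowed to return, and the rounding part uses Python's `decimal` module as a base-10 analogue of the binary floating point behavior described in the list.

```python
import math
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)
from enum import Enum


class Overflow(Enum):       # hypothetical name, mirrors the integer overflow modes
    SILENT = 1
    SATURATE = 2
    ERROR = 3


class DomainError(Enum):    # hypothetical name, mirrors the domain error modes
    NAN = 1
    ERROR = 2


I8_MIN, I8_MAX = -128, 127


def add_i8(a: int, b: int, mode: Overflow) -> int:
    """Add two i8 values under the given overflow mode."""
    exact = a + b  # Python ints are unbounded, so this is the exact result
    if I8_MIN <= exact <= I8_MAX:
        return exact
    if mode is Overflow.ERROR:
        raise OverflowError(f"{a} + {b} does not fit in i8")
    if mode is Overflow.SATURATE:
        return max(I8_MIN, min(I8_MAX, exact))  # clamp to the nearest i8 value
    # SILENT: the result is unspecified; wrapping is just one possible choice.
    return (exact - I8_MIN) % 256 + I8_MIN


# Base-10 analogue of the rounding modes, using the decimal module's constants.
ROUNDING = {
    "TIE_TO_EVEN": ROUND_HALF_EVEN,
    "TIE_AWAY_FROM_ZERO": ROUND_HALF_UP,  # decimal's HALF_UP breaks ties away from zero
    "TRUNCATE": ROUND_DOWN,               # toward zero
    "CEILING": ROUND_CEILING,             # toward positive infinity
    "FLOOR": ROUND_FLOOR,                 # toward negative infinity
}


def round_to_int(value: str, mode: str) -> Decimal:
    """Round a decimal literal to an integer using the named rounding mode."""
    return Decimal(value).quantize(Decimal("1"), rounding=ROUNDING[mode])


def sqrt_f64(x: float, mode: DomainError) -> float:
    """Square root with explicit handling of the domain error for x < 0."""
    if x < 0:
        if mode is DomainError.ERROR:
            raise ValueError(f"sqrt({x}) is outside the function's domain")
        return math.nan  # NAN mode: yield a quiet NaN
    return math.sqrt(x)
```

With these definitions, `add_i8(100, 100, Overflow.SATURATE)` returns 127, which matches the SATURATE example in the list, while the same call with `Overflow.ERROR` raises instead of returning a value.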

In more or less the same vein, I don't think it's clearly documented anywhere that not setting an optional enum argument means that the consumer is free to choose any of the provided behaviors; I thought it had to choose the first one.
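
As a rough sketch of that reading (which, as said, is not clearly documented, so treat it as an assumption), a consumer could resolve an optional enum argument as follows. The function name `resolve_option` and its parameters are hypothetical and only serve to illustrate the idea that an unset option leaves the choice to the consumer rather than forcing the first listed value.

```python
from typing import Optional, Sequence


def resolve_option(requested: Optional[str],
                   allowed: Sequence[str],
                   consumer_choice: str) -> str:
    """Return the behavior a consumer would use for one optional enum argument."""
    if consumer_choice not in allowed:
        raise ValueError(f"{consumer_choice!r} is not one of {list(allowed)}")
    if requested is None:
        # Option left unset: the consumer may pick any allowed behavior,
        # not necessarily the first one in the list.
        return consumer_choice
    if requested not in allowed:
        raise ValueError(f"{requested!r} is not one of {list(allowed)}")
    return requested  # an explicitly set option must be honored
```

For example, with `allowed = ["SILENT", "SATURATE", "ERROR"]`, an unset option resolves to whatever the consumer prefers, while an explicit `"SATURATE"` is always honored.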

jvanstraten, Jul 21 '22 09:07