MonetDB icon indicating copy to clipboard operation
MonetDB copied to clipboard

Simple numerical expression of decimal type column fails with error

Open smavros opened this issue 1 year ago • 0 comments

Describe the bug

Create a table with a single decimal type column and some data just below the decimal type limit

CREATE TABLE IF NOT EXISTS foo (n dec(4,0));
INSERT INTO foo values (123);  -- value 123 corresponds to dec(3,0) type

Note: the choice of the decimal type in the example is NOT coincidental. dec(4,0) is backed by GDK type sht that corresponds to 2 bytes of storage. dec(5,0) is backed by GDK type int that corresponds to 4 bytes of storage. We need to be in a decimal type just before the underlying type upcasting e.g. dec(2, 0), dec(4, 0), dec(9, 0), dec(18, 0) since the dec(N+1, 0) type for those cases uses a larger underlying type! This will become very important later.

Then the projection with a simple numerical expression of the column

SELECT (((n + 1) + 1) + n) FROM foo;

raises an arcane error

Error in optimizer multiplex: 
    TypeException:user.main[13]:'bat.append' undefined in:
        X_107:bat[:sht] := bat.append(X_107:bat[:sht], X_115:int);

in the mserver5 log we get

#2024-11-20 21:37:04: client0014: OPTexpandMultiplex: WARNING:
    To speedup calc.+ a bulk operator implementation is needed for
        X_19:bat[:sht] := mal.multiplex("calc":str, "+":str, X_15:bat[:sht], X_16:bat[:int]); 

upon investigating (and also based on the server log) we see that multiplex operator acts upon the instruction

X_19:bat[:sht] := batcalc.+(X_15:bat[:sht], X_16:bat[:int], nil:bat[:any], nil:bat[:any])

this is a problematic instruction if you pay attention to the types of the input and output.

It seems that we are trying to add batcalc.+() data of type sht and data of type int and store the result in type sht! There are two problems:

  1. clearly the result might not fit in the output type
  2. there is no MAL batcalc.+() instruction that corresponds to those types (since it is not making any sense)

Upon further investigation the reason for this instruction is a play between the type system improvements in Aug2024 and the relational optimizer rel_simplify_math

Type system improvements

With the type system improvements each node of an expression tree must have a corresponding type. In case that the node is an operation between two types Tl and Tr then result of the operation must be assigned in a type that fits the data. In our case the expression (((n + 1) + 1) + n) is represented by the tree

            +
          /   \
        +       n
      /   \
    +       1
  /   \
n      1

we can annotate the type of each node starting from the bottom

                   +
               /       \
             + :dec(5)   e_convert :dec(3)@sht -> dec(3)@int
         /       \              \
      + :dec(4)    1 :dec(1)     n :dec(3)
  /       \
n :dec(3)   1 :dec(1)

Please note that during the types resolution an e_convert node is introduced to the right leaf node of the root node. This is done because the left node + :dec(5) needs an int type so the e_convert makes sure that the right side of the root + has the same underlying type, namely int. Also note that the SQL type remains a dec(3) but the e_convert makes sure that the underlying type is int and not sht.

rel_simplify_math optimizer

The role of this optimizer is to simplify expressions. The case that is relevant for our investigation is expressions like ((n + 1) + m) where n and m are column expressions. Of course the columns can be the same one, as in our case.

The effect of the optimizer in this tree expression is to simply bring the column expression to the lower leaf nodes of the tree e.g. ((n + 1) + m) ===> ((n + m) + 1). Or graphically:

       +                  +
     /   \              /   \
    +     m   ===>    +      1
  /   \             /   \
 n     1           n     m

Combination

When the last two mechanisms are combined the effect in our expression is interesting. What we get is

                   +
               /       \
             + :dec(5)  1 :dec(1)
         /       \ 
      + :dec(4)    1 :dec(1) 
  /       \
n :dec(3)   e_convert :dec(3)@sht -> dec(3)@int
             \
              n :dec(3)

observe how the e_convert branch was pushed down from right of the root node to right of the innermost + node because of the effect of rel_simplify_math optimizer

When the subtree of the innermost + node is going be converted into a MAL instruction we gonna have a type problem since

  • the right node e_convert has an underlying type of int
  • the left node n has a type of sht
  • their parent node + (which holds the result of the addition) has type sht (since it is dec(4))

Hence we generate the instruction

X_19:bat[:sht] := batcalc.+(X_15:bat[:sht], X_16:bat[:int], ...)

Reproduction steps

CREATE TABLE IF NOT EXISTS foo (n dec(4,0));
INSERT INTO foo values (123);
SELECT (((n + 1) + 1) + n) FROM foo;

the query gives

Error in optimizer multiplex: 
    TypeException:user.main[13]:'bat.append' undefined in:
        X_107:bat[:sht] := bat.append(X_107:bat[:sht], X_115:int);

Expected behavior

the query must succeed with the output

+---------+
| %1      |
+=========+
|     248 |
+---------+
1 tuple

MonetDB release

Aug2024

Executables source

Self compiled and installed

Operating System

Fedora Linux 40

Additional context

No response

smavros avatar Nov 21 '24 13:11 smavros