Python UDF: Unsupported scalar type 13
Describe the bug Python UDFs don't accept scalar date parameters
To Reproduce
This function expects a date parameter and simply returns the python type of the parameter passed:
CREATE OR REPLACE FUNCTION test_date(d1 date) RETURNS string
LANGUAGE PYTHON {
return(f"d1: {str(type(d1))}")
};
It fails when a scalar date is passed:
sql>select test_date(curdate());
Unsupported scalar type 13.
It succeeds when a column of dates is passed:
sql>select test_date(d) from (values (curdate()),(curdate())) as t(d);
+-----------------------------+
| %4 |
+=============================+
| d1: <class 'numpy.ndarray'> |
+-----------------------------+
1 tuple
Type int, for comparison, works as expected in both cases.
Software versions
- MonetDB version number 11.43.14
- OS and version: Fedora 35
- self-installed and compiled
Indeed the unsupported type error (date?) seems weird. But to me the second result is just a weird, I'd expected 2x the error or 2x type date).
If the second result is what you needed, it should have been an aggregate function (even that definition seems out of place here).
@njnes you are right about that, I didn't even notice. I came to that function while debugging. But the same happens with a proper function. The simplest:
CREATE OR REPLACE FUNCTION test_date(d1 date) RETURNS date
LANGUAGE PYTHON {
return(d1)
};
Column:
sql>select test_date(d) from (values (curdate()),(curdate())) as t(d);
+------------+
| %4 |
+============+
| 2022-04-28 |
| 2022-04-28 |
+------------+
2 tuples
Value:
sql>select test_date(curdate());
Unsupported scalar type 13.
The way date types are handled is weird. In some parts they are converted to strings. I see they are missing in the scalar conversion, so convert to strings there? I am not sure, this code is not mine.
I will take a look to see what SQL types are missing and how handling can be improved in the python API
I've made some progress on this issue, but it is not yet ready. It requires more work than I expected because the date representation is different between MonetDB and numpy (which is not the case for numeric types). This means that a conversion is needed both during input (MonetDB->numpy) and during output (numpy->MonetDB). This also means that it will be a slower than numeric types. As @PedroTadim said when there are multiple values, dates are first converted to strings. I will probably change that as well after this fix hoping to improve performance.
@njnes
Indeed the unsupported type error (date?) seems weird. But to me the second result is just a weird, I'd expected 2x the error or 2x type date).
This happens because the internal representation is a numpy array and that has only one type for all the values, so it is a kind of an implicit aggregation. I am not sure if there is an easy fix for this.
Yes for the python function get type is returning a single value. But we should enforce #rows in = # rows out for functions (scalar or map like). Else the user should define/create a aggregate function.
An update on this issue: I pushed fixes for the SQL types date, time and timestamp. One breaking change of this fix is that these types are no longer converted implicitly to strings as was the case until now. This means that the following
CREATE OR REPLACE FUNCTION test_date(d1 date) RETURNS STRING
LANGUAGE PYTHON {
return(d1)
};
that was allowed until now will produce an error when the function is called.
I still need to do the same thing for UUID and interval types at least.
The issue that @njnes noticed (wrong number of return values) has not been fixed in this set of patches.