coral
coral copied to clipboard
Translation of NULL column from Hive to Trino
The Hive view is defined as
CREATE VIEW v1 AS SELECT NULL AS c1 FROM t
When using it in Trino it throws an error
Unknown type 'NULL' for column 'c1' in view: hive.default.v1
It looks like Hive has the type void
for NULL, while Trino doesn't and still enforces a type for every column.
A workaround is to modify the Hive view definition by adding an explicit cast (to whatever valid type in both Hive and Trino)
CREATE VIEW v1 AS SELECT CAST(NULL AS INT) AS c1 FROM t
This will be accepted by Trino and still works in Hive.
Regarding the translation from Hive to Trino, does it make sense to convert a plain NULL column into a CAST?
FWIW Trino has concept of an unknown
type.
trino> SELECT typeof(NULL);
_col0
---------
unknown
However, view column is not allowed to be of type unknown
:
trino:default> CREATE VIEW v AS SELECT NULL x;
Query 20210909_193812_00006_tzggx failed: line 1:1: Column type is unknown: x
You're right. It looks like Trino is more strict on this: it requires the type to be present even for a NULL column in the view.
Do you think if it's possible to expose that unknown
type in Trino and make it similar to void
in Hive? That will make Coral's translation more targeted. Otherwise I don't know if it's a good idea to let Coral just translate it to a CAST to an arbitrary type.
Do you think if it's possible to expose that unknown type in Trino and make it similar to void in Hive?
the unknown
type has intentionally limited applicability. it should be considered an implementation detail of the engine and not something connectors can use.
@martint can comment more
casting to varchar
seems like the only viable option, unless we (continue to) reject such views.
any other options?
Spark added void support in https://github.com/apache/spark/pull/28833.
casting to varchar seems like the only viable option, unless we (continue to) reject such views.
FWIW, when you create a table in DB2 or PostgreSQL with a column derived from a NULL value, you end up with a VARCHAR/TEXT column, so it's not entirely unreasonable to use VARCHAR for translating those types of expressions.