coral icon indicating copy to clipboard operation
coral copied to clipboard

Translation of NULL column from Hive to Trino

Open weiatwork opened this issue 3 years ago • 5 comments

The Hive view is defined as CREATE VIEW v1 AS SELECT NULL AS c1 FROM t

When using it in Trino it throws an error Unknown type 'NULL' for column 'c1' in view: hive.default.v1

It looks like Hive has the type void for NULL, while Trino doesn't and still enforces a type for every column.

A workaround is to modify the Hive view definition by adding an explicit cast (to whatever valid type in both Hive and Trino) CREATE VIEW v1 AS SELECT CAST(NULL AS INT) AS c1 FROM t This will be accepted by Trino and still works in Hive.

Regarding the translation from Hive to Trino, does it make sense to convert a plain NULL column into a CAST?

weiatwork avatar Sep 09 '21 16:09 weiatwork

FWIW Trino has concept of an unknown type.

trino> SELECT typeof(NULL);
  _col0
---------
 unknown

However, view column is not allowed to be of type unknown:

trino:default> CREATE VIEW v AS SELECT NULL x;
Query 20210909_193812_00006_tzggx failed: line 1:1: Column type is unknown: x

findepi avatar Sep 09 '21 19:09 findepi

You're right. It looks like Trino is more strict on this: it requires the type to be present even for a NULL column in the view.

Do you think if it's possible to expose that unknown type in Trino and make it similar to void in Hive? That will make Coral's translation more targeted. Otherwise I don't know if it's a good idea to let Coral just translate it to a CAST to an arbitrary type.

weiatwork avatar Sep 09 '21 20:09 weiatwork

Do you think if it's possible to expose that unknown type in Trino and make it similar to void in Hive?

the unknown type has intentionally limited applicability. it should be considered an implementation detail of the engine and not something connectors can use. @martint can comment more

casting to varchar seems like the only viable option, unless we (continue to) reject such views. any other options?

findepi avatar Sep 10 '21 10:09 findepi

Spark added void support in https://github.com/apache/spark/pull/28833.

wmoustafa avatar Oct 05 '21 16:10 wmoustafa

casting to varchar seems like the only viable option, unless we (continue to) reject such views.

FWIW, when you create a table in DB2 or PostgreSQL with a column derived from a NULL value, you end up with a VARCHAR/TEXT column, so it's not entirely unreasonable to use VARCHAR for translating those types of expressions.

martint avatar Nov 08 '21 21:11 martint