coral icon indicating copy to clipboard operation
coral copied to clipboard

(Release 1.0.25) Hive->Presto query translation fails on struct field access

Open ivmarkov opened this issue 4 years ago • 6 comments

Translation of struct<> field references to Presto fails with 1.0.25 (released 4 days ago).

Sample:

    public static void main(String[] args) throws Exception {
        Map<String, Map<String, List<String>>> localMetaStore = new HashMap<>();

        Map<String, List<String>> tables = new HashMap<>();
        tables.put("bar", Arrays.asList("col|struct<i:int,s:string>"));

        localMetaStore.put("foo", tables);

        HiveToRelConverter converter = HiveToRelConverter.create(localMetaStore);
        RelToPrestoConverter presto = new RelToPrestoConverter();

        RelNode rel = converter.convertSql("select col from foo.bar");
        System.out.println(presto.convert(rel));

        // Referencing 'i' in struct 'col' fails
        rel = converter.convertSql("select col.i from foo.bar");
        System.out.println(presto.convert(rel));
    }

Stacktrace:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
SELECT "col"
FROM "foo"."bar"
Exception in thread "main" org.apache.calcite.runtime.CalciteContextException: At line 0, column 0: Table 'col' not found
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:463)
        at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:834)
        at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:819)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:4867)
        at org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:370)
        at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:5759)
        at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:5744)
        at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:317)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.expand(SqlValidatorImpl.java:5351)
        at com.linkedin.coral.hive.hive2rel.HiveSqlValidator.expand(HiveSqlValidator.java:61)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem(SqlValidatorImpl.java:447)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelectList(SqlValidatorImpl.java:4104)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3392)
        at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
        at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1005)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:965)
        at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:940)
        at org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:647)
        at com.linkedin.coral.hive.hive2rel.HiveSqlToRelConverter.convertQuery(HiveSqlToRelConverter.java:57)
        at com.linkedin.coral.hive.hive2rel.HiveToRelConverter.toRel(HiveToRelConverter.java:126)
        at com.linkedin.coral.hive.hive2rel.HiveToRelConverter.convertSql(HiveToRelConverter.java:81)
        at com.vmware.cb.product.metrics.Main.main(Main.java:51)
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: Table 'col' not found
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:463)
        at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:572)
        ... 23 more

ivmarkov avatar Feb 10 '21 14:02 ivmarkov

@ivmarkov does it fail only when using the local Metastore or is this just a way to reproduce the issue and the impact actually takes place with a regular Metastore instance, or when integrated to Presto?

wmoustafa avatar Feb 11 '21 02:02 wmoustafa

Just tried going back up to version 1.0.9 and this can still be reproduced. It may be an issue with the local Metastore implementation. Have you seen this working in a previous version?

wmoustafa avatar Feb 11 '21 02:02 wmoustafa

Haven't tried yet with the regular metastore. Might try early next week.

However, I'm not holding my breath that it will work with the regular one. A lot of the code seems to be shared between the regular metastore and the local one - particularly in the area of parsing type information. And regardless, would be good to have a feature parity between the two anyway, especially given that the local metastore is such a nice option for using in tests, mocks and experiments.

ivmarkov avatar Feb 11 '21 14:02 ivmarkov

OK so I've spent a couple of hours and I think I found a bit more on the issue.

In short, the problem has nothing to do with using the "local Metastore", because the issue is here, on this line 139, in TypeConverter.java. A class which is shared by the "local" and regular metastores and essentially parses & builds type info for columns in the metastores' tables.

By calling dtFactory.createStructType(fTypes, structType.getAllStructFieldNames()), you are essentially instructing the Calcite SQL Validator to NOT accept any references to fields in structural columns, which are not prefixed with the table name itself. However, this is Apache Phoenix specific, and not really how Hive works. Hive, Spark, Flink and as a matter of fact - Presto too - all happily accept references to col.i in table bar in addition to bar.col.i.

If you replace the above call with dtFactory.createStructType(StructKind.PEEK_FIELDS_NO_EXPAND, fTypes, structType.getAllStructFieldNames()), the above syntax will work.

This comment on line 19 contains more details why we probably shouldn't be using StructKind.FULLY_QUALIFIED (which is what TypeConverter.java currently does), but one of the other types.

Now, is the above one-liner fix the final, good fix? I'm not 100% sure as of yet:

  • A side effect from it is that this would also work, which I'm not sure is allowed in Hive (maybe we need to check?): select i from foo.bar

Also, I have not run your test suite with the fix applied yet.

ivmarkov avatar Mar 01 '21 10:03 ivmarkov

Any feedback?

ivmarkov avatar Mar 11 '21 15:03 ivmarkov

@ljfgem @funcheetah FYI.

wmoustafa avatar Aug 06 '21 02:08 wmoustafa