trino
trino copied to clipboard
Add support for the custom hive catalog to support Hive3
Description
Currently Trino is constrained to utilize the default hive catalog (referred to as "hive"
) when interfacing with the hive metastore. Consequently, users are confined to employing only a two-level hierarchy (schema.table
) within a Trino catalog/connector. In contrast, Hive thrift allows for custom hive catalog names as the parent of the schema, enabling the utilization of a three-level hierarchy such as hive-catalog.schema.table
.
Hive Thrift API supports custom catalog since Hive3 Therefore, at high level, this PR has the following changes
- Customize the hive catalog name within Trino's catalog settings and transmit it via the factories to the Thrift client -->
StaticMetastoreConfig
,StaticTokenAwareHttpMetastoreClientFactory
,StaticTokenAwareMetastoreClientFactory
,ThriftMetastoreClientFactory
, 'DefaultThriftMetastoreClientFactory',HttpThriftMetastoreClientFactory
- Thrift client to support custom catalog:
ThriftHiveMetastoreClient
- There will be 1:1 mapping between Trino catalog and hive catalog.
-
"hive"
will be the default catalog name.
Thanks to @dain @electrum @hashhar @findinpath @anusudarsan @samssh @mosabua
Additional context and related issues
Above approach was discussed at a few places, so adopted in the PR as well. one two
@samssh please let us know, if anything is missing or need to take care of. Fixes https://github.com/trinodb/trino/issues/10287
co-author : @samssh, he has worked on the similar change.
Release notes
(x) Release notes are required, with the following suggested text:
# Hive
* Add support for specifying catalog name in Thrift metastore. (`#10287`)
cc @samssh
@osscm now that you have a "ready to review" PR could you please update the description to contain the business case & concise notes including the implementation strategy contained in the code of this PR?
@electrum can you enable test with secrets in this PR?
@osscm now that you have a "ready to review" PR could you please update the description to contain the business case & concise notes including the implementation strategy contained in the code of this PR?
thanks @findinpath added details, please see if this make sense.
cc @anusudarsan
/test-with-secrets sha=e7556cb2cb2db99b03236409fcc1d1186badf430
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/8748163326
@osscm re: tests can we extend TestHiveMetastoreMetadataQueriesAccessOperations
and have the new test set a custom catalog name ? To create the custom catalogs/namespaces in hive, it doesnt look like Hive yet supports creating catalogs via beeline CLI or SQL. We could update the metastore directly
hiveHadoop.runOnMetastore(INSERT INTO CTLGS VALUES (2, 'test_custom_catalog', 'test catalog', 'hdfs://hadoop-master:9000/user/hive/warehouse/test_custom_catalog')");
I guess you will also need to create a default
schema in test_catalog
using either CREATE DATABASE
if hive supports it for custom catalog or directly modifying the metastore db like
hiveHadoop.runOnMetastore("INSERT INTO DBS VALUES (4, 'test default schema', 'hdfs://hadoop-master:9000/user/hive/warehouse/test_custom_catalog/default.db', 'default', 'hive', 'USER', 'test_custom_catalog')");
let me know if you hit any issues with this or need help.
@findinpath @electrum is extending TestHiveMetastoreMetadataQueriesAccessOperations
good enough for this functionality, or do we need BCT as well?
fyi, will work on the pending comments today.
@osscm re: tests can we extend
TestHiveMetastoreMetadataQueriesAccessOperations
and have the new test set a custom catalog name ? To create the custom catalogs/namespaces in hive, it doesnt look like Hive yet supports creating catalogs via beeline CLI or SQL. We could update the metastore directlyhiveHadoop.runOnMetastore(INSERT INTO CTLGS VALUES (2, 'test_custom_catalog', 'test catalog', 'hdfs://hadoop-master:9000/user/hive/warehouse/test_custom_catalog')");
I guess you will also need to create a
default
schema intest_catalog
using eitherCREATE DATABASE
if hive supports it for custom catalog or directly modifying the metastore db likehiveHadoop.runOnMetastore("INSERT INTO DBS VALUES (4, 'test default schema', 'hdfs://hadoop-master:9000/user/hive/warehouse/test_custom_catalog/default.db', 'default', 'hive', 'USER', 'test_custom_catalog')");
let me know if you hit any issues with this or need help.
@findinpath @electrum is extending
TestHiveMetastoreMetadataQueriesAccessOperations
good enough for this functionality, or do we need BCT as well?
Thanks @anusudarsan for the pointers!
yes, we need to run tests with the default ("hive") and custom catalog as well. will try these steps.
and as you have requested to @electrum @findinpath, will wait for their direction as well.
There seems to be a dependency issue on https://github.com/trinodb/trino/actions/runs/8812970406/job/24190468379?pr=21502
Error: ] Some problems were encountered while processing the POMs:
Error: 'dependencies.dependency.version' for com.google.code.findbugs:jsr305:jar must be a valid version but is '${dep.findbugs.version}'. @ line 58, column 22
There seems to be a dependency issue on https://github.com/trinodb/trino/actions/runs/8812970406/job/24190468379?pr=21502
Error: ] Some problems were encountered while processing the POMs: Error: 'dependencies.dependency.version' for com.google.code.findbugs:jsr305:jar must be a valid version but is '${dep.findbugs.version}'. @ line 58, column 22
this dependency I have removed in the last commit : ref: https://github.com/osscm/trino/blob/hive3thrift/plugin/trino-hive/pom.xml not sure if this is the old prb.
I see the latest prb has a failure in delta-lake tests, which seems to be
Error: TestDeltaLakeLocalConcurrentWritesTest.lambda$testConcurrentInsertsSelectingFromTheSameVersionedTable$9:230 » QueryFailed Failed to write Delta Lake transaction log entry
Also address #21502 (comment) to extend TestHiveMetastoreMetadataQueriesAccessOperations please
sure, will update for this one.
Also address #21502 (comment) to extend TestHiveMetastoreMetadataQueriesAccessOperations please
sure, will update for this one.
@anusudarsan
ref: https://github.com/trinodb/trino/pull/21502#discussion_r1575427996
we have now a Smoke Test cases and TestHiveMetastoreCatalogs
(thanks @findepi for adding the test-cases! )
do you think, these test-cases are good or we need to more for TestHiveMetastoreMetadataQueriesAccessOperations
btw Marius also found out, testing custom catalog with HDFS
had some issues, so eventually it worked with minio
FS.
we have now a Smoke Test cases and TestHiveMetastoreCatalogs (thanks @findepi for adding the test-cases! )
@osscm I only see one test testShowSchemas
in TestHiveMetastoreCatalogs, are you planning to add more? That said, Im ok with having the smoketest.
/test-with-secrets sha=357608db032cdfae5f67ca74a29ede01d6a2d1dd
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/8920100592
hi @dain if you get time, can you please review it.
took care of the feedback from @anusudarsan @findinpath
thanks to @anusudarsan @findinpath for continuous feedback and support!
/test-with-secrets sha=cf153475e20161039b1e99815c414b2bf2bfafb5
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/8977088500
sorry, got engaged in other work, will try to take care of remaining comments in next a few days.
@osscm I pushed some changes instead of you.
@osscm I pushed some changes instead of you.
Thanks a lot @ebyhr!
@osscm Why did you revert my change?
@osscm Why did you revert my change?
did I revert? Apologies! maybe by mistake while squashing in local.
@osscm I restored to the status before you reverted my change. Please add a fixup commit for additional changes.
@osscm I restored to the status you reverted my change. Please add a fixup commit for additional changes.
@osscm I restored to the status you reverted my change. Please add a fixup commit for additional changes.
Thanks @ebyhr!
I think, already it has all the changes in! so dont need to do any changes.
two test cases are failing, checking why
- test-jdbc failed --> seems to be transient?
com.github.dockerjava.api.exception.NotFoundException: Status 404: {"message":"manifest for trinodb/trino:448 not found: manifest unknown: manifest unknown"}
- pt (default, suite-7-non-generic, )
this one I am not sure, was not able to run in local, we can try to run build again?
2024-05-16 10:09:50 INFO: FAILURE / io.trino.tests.product.cli.TestTrinoCli.shouldPrintExplainAnalyzePlan (Groups: cli) took 3.7 seconds
2024-05-16T04:24:50.1411172Z tests | 2024-05-16 10:09:50 SEVERE: Failure cause:
2024-05-16T04:24:50.1412046Z tests | java.lang.AssertionError:
2024-05-16T04:24:50.1412723Z tests | Expecting ArrayList:
2024-05-16T04:24:50.1413290Z tests | ["",
2024-05-16T04:24:50.1414102Z tests | "Query 20240516_042448_00042_raqe3 [RUNNING] i[0 0B 0B] o[0 0B 0B] splits[0/0/0]",
2024-05-16T04:24:50.1415007Z tests | "",
2024-05-16T04:24:50.1415937Z tests | "Query 20240516_042448_00042_raqe3 [FAILED] i[35 3.31K 2.05K] o[35 3.31K 2.05K] splits[4/8/4]",
2024-05-16T04:24:50.1417109Z tests | "Query 20240516_042448_00042_raqe3, FAILED, 1 node",
2024-05-16T04:24:50.1417922Z tests | "Splits: 16 total, 4 done (25.00%)",
2024-05-16T04:24:50.1418669Z tests | "1.58 [35 rows, 3.31KB] [22 rows/s, 2.09KB/s]",
2024-05-16T04:24:50.1419308Z tests | "",
2024-05-16T04:24:50.1420095Z tests | "Query 20240516_042448_00042_raqe3 failed: Error committing write parquet to Hive"]
2024-05-16T04:24:50.1421074Z tests | to contain:
2024-05-16T04:24:50.1421690Z tests | ["CREATE TABLE", "Query Plan"]
2024-05-16T04:24:50.1422472Z tests | but could not find the following element(s):
2024-05-16T04:24:50.1423281Z tests | ["CREATE TABLE", "Query Plan"]
2024-05-16T04:24:50.1423913Z tests |
2024-05-16T04:24:50.1425259Z tests | at io.trino.tests.product.cli.TestTrinoCli.shouldPrintExplainAnalyzePlan(TestTrinoCli.java:416)
2024-05-16T04:24:50.1427234Z tests | at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
2024-05-16T04:24:50.1428777Z tests | at java.base/java.lang.reflect.Method.invoke(Method.java:580)
2024-05-16T04:24:50.1430220Z tests | at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
2024-05-16T04:24:50.1432046Z tests | at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:54)
2024-05-16T04:24:50.1433710Z tests | at org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:44)
2024-05-16T04:24:50.1435238Z tests | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
2024-05-16T04:24:50.1479952Z tests | at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
2024-05-16T04:24:50.1481452Z tests | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
2024-05-16T04:24:50.1483137Z tests | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
@osscm the CI looks good now
@osscm the CI looks good now
@anusudarsan, so good to merge now? :)
@ebyhr or @dain if you can please help to merge, thanks!