trino icon indicating copy to clipboard operation
trino copied to clipboard

Add support for the custom hive catalog to support Hive3

Open osscm opened this issue 10 months ago • 28 comments

Description

Currently Trino is constrained to utilize the default hive catalog (referred to as "hive") when interfacing with the hive metastore. Consequently, users are confined to employing only a two-level hierarchy (schema.table) within a Trino catalog/connector. In contrast, Hive thrift allows for custom hive catalog names as the parent of the schema, enabling the utilization of a three-level hierarchy such as hive-catalog.schema.table.

Hive Thrift API supports custom catalog since Hive3 Therefore, at high level, this PR has the following changes

  1. Customize the hive catalog name within Trino's catalog settings and transmit it via the factories to the Thrift client --> StaticMetastoreConfig , StaticTokenAwareHttpMetastoreClientFactory , StaticTokenAwareMetastoreClientFactory , ThriftMetastoreClientFactory, 'DefaultThriftMetastoreClientFactory', HttpThriftMetastoreClientFactory
  2. Thrift client to support custom catalog: ThriftHiveMetastoreClient
  3. There will be 1:1 mapping between Trino catalog and hive catalog.
  4. "hive" will be the default catalog name.

Thanks to @dain @electrum @hashhar @findinpath @anusudarsan @samssh @mosabua

Additional context and related issues

Above approach was discussed at a few places, so adopted in the PR as well. one two

@samssh please let us know, if anything is missing or need to take care of. Fixes https://github.com/trinodb/trino/issues/10287

co-author : @samssh, he has worked on the similar change.

Release notes

(x) Release notes are required, with the following suggested text:

# Hive
* Add support for specifying catalog name in Thrift metastore. (`#10287`)

osscm avatar Apr 11 '24 00:04 osscm

cc @samssh

osscm avatar Apr 17 '24 16:04 osscm

@osscm now that you have a "ready to review" PR could you please update the description to contain the business case & concise notes including the implementation strategy contained in the code of this PR?

findinpath avatar Apr 18 '24 11:04 findinpath

@electrum can you enable test with secrets in this PR?

anusudarsan avatar Apr 18 '24 21:04 anusudarsan

@osscm now that you have a "ready to review" PR could you please update the description to contain the business case & concise notes including the implementation strategy contained in the code of this PR?

thanks @findinpath added details, please see if this make sense.

cc @anusudarsan

osscm avatar Apr 19 '24 02:04 osscm

/test-with-secrets sha=e7556cb2cb2db99b03236409fcc1d1186badf430

ebyhr avatar Apr 19 '24 03:04 ebyhr

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/8748163326

github-actions[bot] avatar Apr 19 '24 03:04 github-actions[bot]

@osscm re: tests can we extend TestHiveMetastoreMetadataQueriesAccessOperations and have the new test set a custom catalog name ? To create the custom catalogs/namespaces in hive, it doesnt look like Hive yet supports creating catalogs via beeline CLI or SQL. We could update the metastore directly

hiveHadoop.runOnMetastore(INSERT INTO CTLGS VALUES (2, 'test_custom_catalog', 'test catalog', 'hdfs://hadoop-master:9000/user/hive/warehouse/test_custom_catalog')");

I guess you will also need to create a default schema in test_catalog using either CREATE DATABASE if hive supports it for custom catalog or directly modifying the metastore db like

hiveHadoop.runOnMetastore("INSERT INTO DBS VALUES (4, 'test default schema', 'hdfs://hadoop-master:9000/user/hive/warehouse/test_custom_catalog/default.db', 'default', 'hive', 'USER', 'test_custom_catalog')");

let me know if you hit any issues with this or need help.

@findinpath @electrum is extending TestHiveMetastoreMetadataQueriesAccessOperations good enough for this functionality, or do we need BCT as well?

anusudarsan avatar Apr 19 '24 17:04 anusudarsan

fyi, will work on the pending comments today.

osscm avatar Apr 19 '24 19:04 osscm

@osscm re: tests can we extend TestHiveMetastoreMetadataQueriesAccessOperations and have the new test set a custom catalog name ? To create the custom catalogs/namespaces in hive, it doesnt look like Hive yet supports creating catalogs via beeline CLI or SQL. We could update the metastore directly

hiveHadoop.runOnMetastore(INSERT INTO CTLGS VALUES (2, 'test_custom_catalog', 'test catalog', 'hdfs://hadoop-master:9000/user/hive/warehouse/test_custom_catalog')");

I guess you will also need to create a default schema in test_catalog using either CREATE DATABASE if hive supports it for custom catalog or directly modifying the metastore db like

hiveHadoop.runOnMetastore("INSERT INTO DBS VALUES (4, 'test default schema', 'hdfs://hadoop-master:9000/user/hive/warehouse/test_custom_catalog/default.db', 'default', 'hive', 'USER', 'test_custom_catalog')");

let me know if you hit any issues with this or need help.

@findinpath @electrum is extending TestHiveMetastoreMetadataQueriesAccessOperations good enough for this functionality, or do we need BCT as well?

Thanks @anusudarsan for the pointers!

yes, we need to run tests with the default ("hive") and custom catalog as well. will try these steps.

and as you have requested to @electrum @findinpath, will wait for their direction as well.

osscm avatar Apr 21 '24 19:04 osscm

There seems to be a dependency issue on https://github.com/trinodb/trino/actions/runs/8812970406/job/24190468379?pr=21502

Error: ] Some problems were encountered while processing the POMs:
Error:  'dependencies.dependency.version' for com.google.code.findbugs:jsr305:jar must be a valid version but is '${dep.findbugs.version}'. @ line 58, column 22

findinpath avatar Apr 25 '24 07:04 findinpath

There seems to be a dependency issue on https://github.com/trinodb/trino/actions/runs/8812970406/job/24190468379?pr=21502

Error: ] Some problems were encountered while processing the POMs:
Error:  'dependencies.dependency.version' for com.google.code.findbugs:jsr305:jar must be a valid version but is '${dep.findbugs.version}'. @ line 58, column 22

this dependency I have removed in the last commit : ref: https://github.com/osscm/trino/blob/hive3thrift/plugin/trino-hive/pom.xml not sure if this is the old prb.

I see the latest prb has a failure in delta-lake tests, which seems to be

Error:    TestDeltaLakeLocalConcurrentWritesTest.lambda$testConcurrentInsertsSelectingFromTheSameVersionedTable$9:230 » QueryFailed Failed to write Delta Lake transaction log entry

osscm avatar Apr 25 '24 17:04 osscm

Also address #21502 (comment) to extend TestHiveMetastoreMetadataQueriesAccessOperations please

sure, will update for this one.

osscm avatar Apr 26 '24 08:04 osscm

Also address #21502 (comment) to extend TestHiveMetastoreMetadataQueriesAccessOperations please

sure, will update for this one.

@anusudarsan ref: https://github.com/trinodb/trino/pull/21502#discussion_r1575427996 we have now a Smoke Test cases and TestHiveMetastoreCatalogs (thanks @findepi for adding the test-cases! )

do you think, these test-cases are good or we need to more for TestHiveMetastoreMetadataQueriesAccessOperations btw Marius also found out, testing custom catalog with HDFS had some issues, so eventually it worked with minio FS.

osscm avatar Apr 27 '24 20:04 osscm

we have now a Smoke Test cases and TestHiveMetastoreCatalogs (thanks @findepi for adding the test-cases! )

@osscm I only see one test testShowSchemas in TestHiveMetastoreCatalogs, are you planning to add more? That said, Im ok with having the smoketest.

anusudarsan avatar Apr 29 '24 21:04 anusudarsan

/test-with-secrets sha=357608db032cdfae5f67ca74a29ede01d6a2d1dd

ebyhr avatar May 02 '24 07:05 ebyhr

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/8920100592

github-actions[bot] avatar May 02 '24 07:05 github-actions[bot]

hi @dain if you get time, can you please review it.

took care of the feedback from @anusudarsan @findinpath

thanks to @anusudarsan @findinpath for continuous feedback and support!

osscm avatar May 03 '24 17:05 osscm

/test-with-secrets sha=cf153475e20161039b1e99815c414b2bf2bfafb5

ebyhr avatar May 06 '24 23:05 ebyhr

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/8977088500

github-actions[bot] avatar May 06 '24 23:05 github-actions[bot]

sorry, got engaged in other work, will try to take care of remaining comments in next a few days.

osscm avatar May 13 '24 08:05 osscm

@osscm I pushed some changes instead of you.

ebyhr avatar May 13 '24 09:05 ebyhr

@osscm I pushed some changes instead of you.

Thanks a lot @ebyhr!

osscm avatar May 14 '24 05:05 osscm

@osscm Why did you revert my change?

ebyhr avatar May 15 '24 02:05 ebyhr

@osscm Why did you revert my change?

did I revert? Apologies! maybe by mistake while squashing in local.

osscm avatar May 15 '24 04:05 osscm

@osscm I restored to the status before you reverted my change. Please add a fixup commit for additional changes.

ebyhr avatar May 16 '24 04:05 ebyhr

@osscm I restored to the status you reverted my change. Please add a fixup commit for additional changes.

@osscm I restored to the status you reverted my change. Please add a fixup commit for additional changes.

Thanks @ebyhr!

I think, already it has all the changes in! so dont need to do any changes.

two test cases are failing, checking why

  1. test-jdbc failed --> seems to be transient?
com.github.dockerjava.api.exception.NotFoundException: Status 404: {"message":"manifest for trinodb/trino:448 not found: manifest unknown: manifest unknown"}
  1. pt (default, suite-7-non-generic, )

this one I am not sure, was not able to run in local, we can try to run build again?

2024-05-16 10:09:50 INFO: FAILURE     /    io.trino.tests.product.cli.TestTrinoCli.shouldPrintExplainAnalyzePlan (Groups: cli) took 3.7 seconds
2024-05-16T04:24:50.1411172Z tests               | 2024-05-16 10:09:50 SEVERE: Failure cause:
2024-05-16T04:24:50.1412046Z tests               | java.lang.AssertionError: 
2024-05-16T04:24:50.1412723Z tests               | Expecting ArrayList:
2024-05-16T04:24:50.1413290Z tests               |   ["",
2024-05-16T04:24:50.1414102Z tests               |     "Query 20240516_042448_00042_raqe3 [RUNNING] i[0 0B 0B] o[0 0B 0B] splits[0/0/0]",
2024-05-16T04:24:50.1415007Z tests               |     "",
2024-05-16T04:24:50.1415937Z tests               |     "Query 20240516_042448_00042_raqe3 [FAILED] i[35 3.31K 2.05K] o[35 3.31K 2.05K] splits[4/8/4]",
2024-05-16T04:24:50.1417109Z tests               |     "Query 20240516_042448_00042_raqe3, FAILED, 1 node",
2024-05-16T04:24:50.1417922Z tests               |     "Splits: 16 total, 4 done (25.00%)",
2024-05-16T04:24:50.1418669Z tests               |     "1.58 [35 rows, 3.31KB] [22 rows/s, 2.09KB/s]",
2024-05-16T04:24:50.1419308Z tests               |     "",
2024-05-16T04:24:50.1420095Z tests               |     "Query 20240516_042448_00042_raqe3 failed: Error committing write parquet to Hive"]
2024-05-16T04:24:50.1421074Z tests               | to contain:
2024-05-16T04:24:50.1421690Z tests               |   ["CREATE TABLE", "Query Plan"]
2024-05-16T04:24:50.1422472Z tests               | but could not find the following element(s):
2024-05-16T04:24:50.1423281Z tests               |   ["CREATE TABLE", "Query Plan"]
2024-05-16T04:24:50.1423913Z tests               | 
2024-05-16T04:24:50.1425259Z tests               | 	at io.trino.tests.product.cli.TestTrinoCli.shouldPrintExplainAnalyzePlan(TestTrinoCli.java:416)
2024-05-16T04:24:50.1427234Z tests               | 	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
2024-05-16T04:24:50.1428777Z tests               | 	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
2024-05-16T04:24:50.1430220Z tests               | 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
2024-05-16T04:24:50.1432046Z tests               | 	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:54)
2024-05-16T04:24:50.1433710Z tests               | 	at org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:44)
2024-05-16T04:24:50.1435238Z tests               | 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
2024-05-16T04:24:50.1479952Z tests               | 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
2024-05-16T04:24:50.1481452Z tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
2024-05-16T04:24:50.1483137Z tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)

osscm avatar May 16 '24 16:05 osscm

@osscm the CI looks good now

anusudarsan avatar May 17 '24 15:05 anusudarsan

@osscm the CI looks good now

@anusudarsan, so good to merge now? :)

@ebyhr or @dain if you can please help to merge, thanks!

osscm avatar May 17 '24 15:05 osscm