trino
Add `ignore_broken_catalogs` system session property
Description
Provide a JDBC driver property `ignoreBrokenCatalogs` that prevents metadata queries from failing because of a broken catalog (e.g. a Hive metastore that is not working).
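A minimal sketch of how a client could opt in to the proposed property, assuming it is passed like any other JDBC connection property through `java.util.Properties`. The helper class `BrokenCatalogOption` and its `isEnabled` method are hypothetical. They only illustrate reading the flag with a default of `false`. The thread below ends with this PR being closed, so no released Trino driver should be assumed to recognize `ignoreBrokenCatalogs`.

```java
import java.util.Properties;

// Hypothetical helper: how a JDBC driver *might* read the proposed
// ignoreBrokenCatalogs connection property. This is not part of any
// released Trino driver.
final class BrokenCatalogOption {
    static final String PROPERTY = "ignoreBrokenCatalogs";

    private BrokenCatalogOption() {}

    // Defaults to false, so the existing fail-fast behavior is kept
    // unless the user explicitly opts in.
    static boolean isEnabled(Properties info) {
        return Boolean.parseBoolean(info.getProperty(PROPERTY, "false"));
    }
}
```

A client would set `props.setProperty("ignoreBrokenCatalogs", "true")` and then call `DriverManager.getConnection(url, props)`. Defaulting to `false` keeps the current behavior for every user who does not opt in.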
Is this change a fix, improvement, new feature, refactoring, or other?
Improvement
Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)
Change to the JDBC client
Related issues, pull requests, and links
Documentation
( ) No documentation is needed. (x) Sufficient documentation is included in this PR. ( ) Documentation PR is available with #prnumber. ( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required. ( ) Release notes entries required with the following suggested text:
# Section
* Fix some things. ({issue}`issuenumber`)
In my opinion, this should be implemented server-side.
Agreed, that would be better indeed. FYI the ODBC client already implements this on the clientside.
Can you explain what a "broken" catalog is?
Also, what would be the semantics when a catalog is broken? In particular, what happens for SHOW TABLES, SHOW SCHEMAS, any queries that reference a table in such a catalog, etc.?
Finally, I'd prefer to avoid session properties and configuration options unless absolutely necessary. The proliferation of options that never get removed makes it much harder to understand how to set up and run Trino over time. What's the problem we're trying to solve? Why would someone turn this on or off for a given query or cluster?
Can you explain what a "broken" catalog is?
A broken catalog is one that throws exceptions during metadata retrieval, e.g. because its Hive metastore is unavailable or requests time out. If just one catalog breaks, no metadata is returned at all, even for the healthy catalogs.
Also, what would be the semantics when a catalog is broken? In particular, what happens for SHOW TABLES, SHOW SCHEMAS, any queries that reference a table in such a catalog, etc.?
It is currently limited to the standard `system.jdbc` tables. As I originally implemented it, an exception is still thrown if the catalog is explicitly referenced. I have not yet implemented this on the server side.
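The semantics described above can be sketched as follows, under stated assumptions: `SchemaSource` and `SchemaLister` are hypothetical stand-ins, not Trino SPI types, for a catalog's metadata source and for a cross-catalog listing such as the one BI tools read from `system.jdbc`. When the option is on, a listing across all catalogs skips any catalog that throws. An explicit reference to that catalog still fails either way.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical metadata source for one catalog. In Trino this would be a
// connector talking to, e.g., a Hive metastore; here it is just a lambda.
interface SchemaSource {
    List<String> listSchemas() throws Exception;
}

// Hypothetical lister modelling the proposed ignore-broken-catalogs semantics.
final class SchemaLister {
    private final Map<String, SchemaSource> catalogs;
    private final boolean ignoreBrokenCatalogs;

    SchemaLister(Map<String, SchemaSource> catalogs, boolean ignoreBrokenCatalogs) {
        this.catalogs = new LinkedHashMap<>(catalogs); // keep catalog order stable
        this.ignoreBrokenCatalogs = ignoreBrokenCatalogs;
    }

    // Listing across all catalogs, as when a BI tool browses every schema.
    List<String> listAllSchemas() throws Exception {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, SchemaSource> entry : catalogs.entrySet()) {
            try {
                // listSchemas() either returns a full list or throws,
                // so a broken catalog never contributes partial results.
                for (String schema : entry.getValue().listSchemas()) {
                    result.add(entry.getKey() + "." + schema);
                }
            } catch (Exception e) {
                if (!ignoreBrokenCatalogs) {
                    throw e; // current behavior: one broken catalog fails the whole listing
                }
                // Opted in: silently skip the broken catalog.
            }
        }
        return result;
    }

    // Explicit reference to one catalog: errors always propagate.
    List<String> listSchemas(String catalog) throws Exception {
        SchemaSource source = catalogs.get(catalog);
        if (source == null) {
            throw new IllegalArgumentException("Catalog not found: " + catalog);
        }
        return source.listSchemas();
    }
}
```

With a failing `hive` source followed by a healthy `tpch` source that returns `sf1` and `tiny`, the lenient listing returns `[tpch.sf1, tpch.tiny]`, the strict listing throws, and `listSchemas("hive")` throws regardless of the option.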
Finally, I'd prefer to avoid session properties and configuration options unless absolutely necessary. The proliferation of options that never get removed makes it much harder to understand how to set up and run Trino over time. What's the problem we're trying to solve? Why would someone turn this on or off for a given query or cluster?
So the use case is BI tools like Tableau that query the `system.jdbc` tables to retrieve metadata. Customers would be able to browse the catalogs without errors, with any "broken" catalogs silently ignored. Timeouts in those broken connectors would still apply, so the user experience is not necessarily smooth or pleasant.
data:image/s3,"s3://crabby-images/31607/31607e6808b0948fe581332c84a64c3f23de66c2" alt="image"
cc: @leniartek
@martint, this PR is trying to address a real pain point for users. We have seen situations where a catalog can no longer be accessed, which makes some popular BI tools that eagerly list schemas and tables unusable. It's not a great user experience. I think it makes sense to provide a workaround that can be enabled per user, while keeping the current behavior as the default.
@findepi, would you have time to re-review this? We have learned about yet another tool (Dataiku) where a broken catalog prevents metadata for properly configured catalogs from being retrieved.
would you have time to re-review this?
Sure, will queue that up. I just noticed that my last comment (https://github.com/trinodb/trino/pull/13311#discussion_r978697695) hasn't been responded to yet. Should I wait a bit longer?
@findepi, can you please take another look? I have addressed all remarks. The test failure is unrelated.
In my opinion, this should be implemented server-side.
Agreed, that would be better indeed. FYI the ODBC client already implements this on the clientside.
Trino does not have an ODBC client
Can we start by creating an issue explaining the specific problems we’re solving? Once we’ve agreed on the problem, we can discuss potential solutions.
Fair enough, I have created https://github.com/trinodb/trino/issues/16361. Let's continue the discussion there.
Is this still in progress, @mdesmet, or has it been replaced by some other work?
It is no longer relevant, as we could not reach agreement on the implementation. Closing.
🍿