trino icon indicating copy to clipboard operation
trino copied to clipboard

Improve type mapping for Snowflake Connector

Open lxynov opened this issue 11 months ago • 11 comments

Description

For https://github.com/trinodb/trino/issues/20977

Please reference snowflake.md diff for the exact type conversion behavior.

References for source-of-truths:

  • https://docs.snowflake.com/en/sql-reference/intro-summary-data-types
  • https://docs.snowflake.com/en/sql-reference/sql/show-columns#output
  • https://github.com/snowflakedb/snowflake-jdbc/blob/d4ca9f0e99b142fa0c1a3dffe5cc7bde4fdfaf61/src/main/java/net/snowflake/client/jdbc/SnowflakeUtil.java

Additional context and related issues

  • [ ] TODO: Make it configurable whether SF session property JDBC_TREAT_DECIMAL_AS_INT is set to FALSE

Release notes

( ) This is not user-visible or is docs only, and no release notes are required. ( ) Release notes are required. Please propose a release note for me. (x) Release notes are required, with the following suggested text:

# Snowflake Connector
* Improve type mapping for Snowflake Connector. ({issue}`20977`)

lxynov avatar Mar 11 '24 20:03 lxynov

@mosabua Thanks for the review! Addressed the comment.

@ebyhr Could you also help review the code part?

lxynov avatar Mar 21 '24 20:03 lxynov

Could you rebase on master to resolve conflicts?

ebyhr avatar Mar 23 '24 00:03 ebyhr

So sorry @lxynov but it looks like there is another rebase necessary since there is a conflict. I hope we can get this reviewed and merged before we get another conflict. Maybe @oneonestar @dprophet or @yuuteng can chime in if necesssary

mosabua avatar Mar 27 '24 23:03 mosabua

@mosabua No worry at all. Happy to rebase it again. Conflicts should be minimal. Thanks for keeping an eye on this. @ebyhr Could you help look at this https://github.com/trinodb/trino/pull/21012#discussion_r1538356264? If it makes sense, I'll go ahead rebasing this PR.

lxynov avatar Mar 27 '24 23:03 lxynov

Improve type mapping for Snowflake Connector

The current commit and PR title is unclear. Please clarify what is improved.

ebyhr avatar Apr 01 '24 07:04 ebyhr

The current commit and PR title is unclear. Please clarify what is improved.

Sure

  1. Fixed https://github.com/trinodb/trino/pull/17909#discussion_r1512196152
  2. Fixed https://github.com/trinodb/trino/pull/17909#discussion_r1230249433
  3. Implemented the unsupported_type_handling session property
  4. Removed unused tinyintColumnMapping(), smallintColumnMapping(), etc. from SnowflakeClient::toColumnMapping as they will never be called.
  5. The original implementation builds a map inside SnowflakeClient::toColumnMapping. So the map building will happen every time SnowflakeClient::toColumnMapping is called. The new implementation is more efficient as it inlines the branching/conditioning.
  6. Improved type mapping doc.

cc @ebyhr @mosabua

lxynov avatar Apr 01 '24 16:04 lxynov

Thanks for your explanation. ~I'm not convinced to changing all numeric types to decimal. Please exclude the changes from this PR.~

Also, please split into some commits so that we can review easily.

TODO: Make it configurable whether SF session property JDBC_TREAT_DECIMAL_AS_INT is set to FALSE

Why do we want to make it configurable?

ebyhr avatar Apr 02 '24 10:04 ebyhr

Thanks for your explanation. I'm not convinced to changing all numeric types to decimal. Please exclude the changes from this PR.

It's not a design decision to "change all numeric types to decimal". It's just to make things correct. In Snowflake, fixed-point types INT, INTEGER , BIGINT, SMALLINT, TINYINT , BYTEINT are synonymous with NUMBER(38,0) (link). So even if you create a TINYINT column in Snowflake, it'd be NUMBER(38,0) under the hood, and users are able to insert values as large as 99999999999999999999999999999999999999. I gave an example here when the original implementation would fail.

Additionally, even if you create a TINYINT column in Snowflake, it would be read in as either a BIGINT type or a DECIMAL type, depending on the JDBC_TREAT_DECIMAL_AS_INT session property. So the method tinyintColumnMapping() won't be used in any situation. It must be removed.

Why do we want to make it configurable?

If the service admin is sure that all decimal columns don't have values larger than Long.MAX_VALUE, he/she can set JDBC_TREAT_DECIMAL_AS_INT to FALSE. In this case all decimal values will be read in as BIGINT, which is more efficient than being read in as DECIMAL.

Also, please split into some commits so that we can review easily.

IMOO I don't think we need to split it into multiple commits. Essentially, this PR is to improve the implementation of SnowflakeClient::toColumnMapping and SnowflakeClient::toColumnMapping::toWriteMapping. Honestly speaking, the previous implementation is kind of messy in the sense that 1. it's building a static map inside a frequently-called method, 2. it's not handling default JDBC Connector configs, 3. it has code that will never be called, etc. The whole purpose of this PR is to clean things up.

CC'ing @hashhar @findepi @martint if you have bandwidth to also take a look.

lxynov avatar Apr 02 '24 17:04 lxynov

The whole purpose of this PR is to clean things up.

keeping cleanup separate from type mapping change would be very very helpful to review. otherwise it's hard to see which changes impact logic vs which ones don't.

I'll try and take a look anyway but unlikely to merge in current shape.

hashhar avatar Apr 03 '24 11:04 hashhar

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Apr 25 '24 17:04 github-actions[bot]

Applied stale-ignore label .. we definitely want this improvement. Please rebase @lxynov and let us know if you need any help

mosabua avatar Apr 26 '24 21:04 mosabua

Superseded by #21755

ebyhr avatar May 20 '24 02:05 ebyhr