trino
trino copied to clipboard
Improve type mapping for Snowflake Connector
Description
For https://github.com/trinodb/trino/issues/20977
Please reference snowflake.md
diff for the exact type conversion behavior.
References for source-of-truths:
- https://docs.snowflake.com/en/sql-reference/intro-summary-data-types
- https://docs.snowflake.com/en/sql-reference/sql/show-columns#output
- https://github.com/snowflakedb/snowflake-jdbc/blob/d4ca9f0e99b142fa0c1a3dffe5cc7bde4fdfaf61/src/main/java/net/snowflake/client/jdbc/SnowflakeUtil.java
Additional context and related issues
- [ ] TODO: Make it configurable whether SF session property
JDBC_TREAT_DECIMAL_AS_INT
is set toFALSE
Release notes
( ) This is not user-visible or is docs only, and no release notes are required. ( ) Release notes are required. Please propose a release note for me. (x) Release notes are required, with the following suggested text:
# Snowflake Connector
* Improve type mapping for Snowflake Connector. ({issue}`20977`)
@mosabua Thanks for the review! Addressed the comment.
@ebyhr Could you also help review the code part?
Could you rebase on master to resolve conflicts?
So sorry @lxynov but it looks like there is another rebase necessary since there is a conflict. I hope we can get this reviewed and merged before we get another conflict. Maybe @oneonestar @dprophet or @yuuteng can chime in if necesssary
@mosabua No worry at all. Happy to rebase it again. Conflicts should be minimal. Thanks for keeping an eye on this. @ebyhr Could you help look at this https://github.com/trinodb/trino/pull/21012#discussion_r1538356264? If it makes sense, I'll go ahead rebasing this PR.
Improve type mapping for Snowflake Connector
The current commit and PR title is unclear. Please clarify what is improved.
The current commit and PR title is unclear. Please clarify what is improved.
Sure
- Fixed https://github.com/trinodb/trino/pull/17909#discussion_r1512196152
- Fixed https://github.com/trinodb/trino/pull/17909#discussion_r1230249433
- Implemented the
unsupported_type_handling
session property - Removed unused
tinyintColumnMapping()
,smallintColumnMapping()
, etc. fromSnowflakeClient::toColumnMapping
as they will never be called. - The original implementation builds a map inside
SnowflakeClient::toColumnMapping
. So the map building will happen every timeSnowflakeClient::toColumnMapping
is called. The new implementation is more efficient as it inlines the branching/conditioning. - Improved type mapping doc.
cc @ebyhr @mosabua
Thanks for your explanation. ~I'm not convinced to changing all numeric types to decimal. Please exclude the changes from this PR.~
Also, please split into some commits so that we can review easily.
TODO: Make it configurable whether SF session property JDBC_TREAT_DECIMAL_AS_INT is set to FALSE
Why do we want to make it configurable?
Thanks for your explanation. I'm not convinced to changing all numeric types to decimal. Please exclude the changes from this PR.
It's not a design decision to "change all numeric types to decimal". It's just to make things correct. In Snowflake, fixed-point types INT
, INTEGER
, BIGINT
, SMALLINT
, TINYINT
, BYTEINT
are synonymous with NUMBER(38,0)
(link). So even if you create a TINYINT
column in Snowflake, it'd be NUMBER(38,0)
under the hood, and users are able to insert values as large as 99999999999999999999999999999999999999
. I gave an example here when the original implementation would fail.
Additionally, even if you create a TINYINT
column in Snowflake, it would be read in as either a BIGINT
type or a DECIMAL
type, depending on the JDBC_TREAT_DECIMAL_AS_INT
session property. So the method tinyintColumnMapping()
won't be used in any situation. It must be removed.
Why do we want to make it configurable?
If the service admin is sure that all decimal columns don't have values larger than Long.MAX_VALUE
, he/she can set JDBC_TREAT_DECIMAL_AS_INT
to FALSE
. In this case all decimal values will be read in as BIGINT
, which is more efficient than being read in as DECIMAL
.
Also, please split into some commits so that we can review easily.
IMOO I don't think we need to split it into multiple commits. Essentially, this PR is to improve the implementation of SnowflakeClient::toColumnMapping
and SnowflakeClient::toColumnMapping::toWriteMapping
. Honestly speaking, the previous implementation is kind of messy in the sense that 1. it's building a static map inside a frequently-called method, 2. it's not handling default JDBC Connector configs, 3. it has code that will never be called, etc. The whole purpose of this PR is to clean things up.
CC'ing @hashhar @findepi @martint if you have bandwidth to also take a look.
The whole purpose of this PR is to clean things up.
keeping cleanup separate from type mapping change would be very very helpful to review. otherwise it's hard to see which changes impact logic vs which ones don't.
I'll try and take a look anyway but unlikely to merge in current shape.
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Applied stale-ignore label .. we definitely want this improvement. Please rebase @lxynov and let us know if you need any help
Superseded by #21755