snowflake-ingest-java
SNOW-647377 Quoted columns support
This PR revives @sfc-gh-azagrebin's PR https://github.com/snowflakedb/snowflake-ingest-java/pull/214, adapts it for Parquet, and adds some integration tests.
The new code uses the column display name everywhere except in Arrow/Parquet field names.
Originally, the idea was to pass the unquoted name from the server side, but it is not enough because we still need to translate between the name entered by the user and the display name of the column. For example, client-side quoting is required when ingesting into the column named "CREATE".
Later, we could move the quoting-related code to GSCommon to avoid duplication.
Originally, the idea was to pass the unquoted name from the server side, but it is not enough because we still need to translate between the name entered by the user and the display name of the column. For example, client-side quoting is required when ingesting into the column named "CREATE".
Thanks Lukas. I might be missing something, but I think if we pass both the display name and the unquoted name we should be OK? Like you said, the unquoted name is for creating the Field, and the display name is for everything else. I'm not sure I understand the example you mentioned above: if the column name is "CREATE", then the input should be ""CREATE"", and anything else shouldn't match. Could you elaborate?
In general, I don't like the idea of exposing more and more server-side logic to the client, especially the reservedKeywords in this case, so this option should really be a last resort if nothing else works.
@sfc-gh-tzhang Yeah, we can do what you suggest. The user would have to provide a valid column displayName, so that the SDK can match it to the display name from the server without any additional quoting logic. There are some cases, though, where the display name is not obvious. For example, a column created as ab\ c (unquoted) has the display name "AB C", so the user would have to pass it like this: row.put("\"AB C\"", X). Are we OK with that? I guess it is fine as long as we have really good error messages that point out what the column display names are and what the input format should be. WDYT?
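To make the trade-off concrete, here is a rough sketch of what exact display-name matching with a descriptive error could look like. `DisplayNameLookup` and `resolve` are hypothetical names invented for illustration, not SDK APIs, and the sketch assumes the server returns display names exactly in the quoted form shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the SDK: matches row keys against
// server-provided display names with no client-side quoting logic.
final class DisplayNameLookup {
    // Display name (assumed to come from the server as-is) -> column ordinal.
    private final Map<String, Integer> ordinalsByDisplayName = new LinkedHashMap<>();

    DisplayNameLookup(String... displayNames) {
        for (int i = 0; i < displayNames.length; i++) {
            ordinalsByDisplayName.put(displayNames[i], i);
        }
    }

    // Returns the ordinal of the column whose display name equals the key exactly.
    int resolve(String userKey) {
        Integer ordinal = ordinalsByDisplayName.get(userKey);
        if (ordinal == null) {
            // The error lists every valid display name so the user can see
            // the exact form (including quotes) that a key must take.
            throw new IllegalArgumentException(
                "Column " + userKey + " not found. Keys must match a column display name"
                    + " exactly. Valid display names: "
                    + String.join(", ", ordinalsByDisplayName.keySet()));
        }
        return ordinal;
    }
}
```

With this approach, `new DisplayNameLookup("\"AB C\"", "ID").resolve("\"AB C\"")` succeeds, while `resolve("ab c")` fails with a message that lists `"AB C"` and `ID` as the accepted keys.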
@sfc-gh-tzhang We discussed this issue with @sfc-gh-azagrebin again today, and the proposal is now the following: we keep the unquote method in the SDK, start sending the internal name from the server, and the SDK will try to match the unquoted input passed by the user against the internal name. This way the user can pass either CreATE or "CreATE", for example, and the SDK would still be able to match it with the internal name CreATE. The quote method, and therefore also the list of keywords, would go away.
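A minimal sketch of the matching described in this proposal, under stated assumptions: `ColumnNameMatcher`, `unquote`, and `match` are illustrative names only, the real unquote method may behave differently, and the comparison here is exact after stripping quotes. Whether unquoted input should also be case-folded is not settled in this thread.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not the SDK implementation.
final class ColumnNameMatcher {
    // Strips one pair of enclosing double quotes and collapses doubled
    // quotes inside them; any other input is returned unchanged.
    static String unquote(String name) {
        int n = name.length();
        if (n >= 2 && name.charAt(0) == '"' && name.charAt(n - 1) == '"') {
            return name.substring(1, n - 1).replace("\"\"", "\"");
        }
        return name;
    }

    // Compares the unquoted user input against the internal names that the
    // server is assumed to send. No keyword list is needed on the client.
    static Optional<String> match(String userInput, List<String> internalNames) {
        String candidate = unquote(userInput);
        return internalNames.stream().filter(candidate::equals).findFirst();
    }
}
```

Here both `match("CreATE", names)` and `match("\"CreATE\"", names)` resolve to the internal name `CreATE` when `names` contains it, which is the behavior the proposal is after.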
Closing in favor of https://github.com/snowflakedb/snowflake-ingest-java/pull/293