bug: Incorrect table name resolved if stream name is hyphenated
Singer SDK Version
0.44.3
Is this a regression?
- [ ] Yes
Python Version
NA
Bug scope
Targets (data type handling, batching, SQL object generation, etc.)
Operating System
No response
Description
For SQL targets, if a stream name is hyphenated it is assumed to be fully qualified, and the part after the last hyphen is taken as the table name. In our case, we have a tap stream name of tap_bigquery-weekly_fuel_prices and expect a table with a similarly-shaped name to be created in Snowflake. Instead, the table name resolves to weekly_fuel_prices, which is not desirable in our case. Is there a reason for this behaviour?
https://github.com/meltano/sdk/blob/881ffd2041c9eca9c50a5e94ffe61a121579ef61/singer_sdk/sinks/sql.py#L76-L85
Link to Slack/Linen
No response
https://github.com/Matatika/target-snowflake--meltanolabs/commit/101290a3f26d3fa4c0479abc98cbef7a7cf7bc09
this is also affecting me with tap and target-mysql. Is there a fix?
https://github.com/meltano/sdk/blob/868b5667cb1620e88efc561a74bc696d64e7f5b3/singer_sdk/sinks/sql.py#L87-L105
@leoperegrino have you tried using the default_target_schema setting?
> we have a tap stream name of tap_bigquery-weekly_fuel_prices and expect a table of name in similar shape to be created in Snowflake
<schema_name>-<table_name> is an informal standard within the Singer community to map tables to schemas in SQL targets, so I would say that the tap is misbehaving in this case. I'd also be happy to hear suggestions on how we could make targets more flexible in this regard.
@edgarrmondragon it does seem to pass through the connection phase but it fails later in the modify statement. The table name should be enclosed in backticks but the generated SQL does not do this:
Altering with SQL: ALTER TABLE table-name.column ADD COLUMN x_sdc_table_version INTEGER
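The failure above comes from the identifier not being quoted: an unquoted hyphen is invalid in a MySQL identifier. A minimal sketch of the quoting that would make the statement valid (the helper name is hypothetical; real targets would typically rely on their dialect's quoting, e.g. via SQLAlchemy):

```python
def quote_mysql_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"

table_name = "table-name"
sql = f"ALTER TABLE {quote_mysql_identifier(table_name)} ADD COLUMN x_sdc_table_version INTEGER"
print(sql)
# → ALTER TABLE `table-name` ADD COLUMN x_sdc_table_version INTEGER
```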
@edgarrmondragon I was trying to setup binary logging replication between two mariadb instances using tap-mysql and target-mysql.
After a few hours of debugging, it seemed that the target was causing the issue. I tweaked the virtual env code to allow column alter here and here. Also uncommented the schema_name property here.
Now it works the first time, then fails due to a current_type_length comparison with None, which requires the code to be changed to:
```python
elif hasattr(current_type, 'length'):
    # Add this check to handle None values
    if current_type.length is not None:
        current_type_length = current_type.length
```
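To illustrate why that guard matters (hypothetical stand-in class, not the target's actual types): SQLAlchemy-style string types can carry `length = None`, and comparing `None` with an `int` raises `TypeError` in Python 3:

```python
class FakeStringType:
    """Stand-in for a SQL type whose length is unset."""
    length = None

current_type = FakeStringType()

# Without a guard, a length comparison blows up on None.
try:
    needs_widening = current_type.length < 255  # TypeError: None < int
except TypeError:
    needs_widening = False

# With the guard, the comparison is only attempted for a real length.
if getattr(current_type, "length", None) is not None:
    needs_widening = current_type.length < 255

print(needs_widening)
# → False
```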
The target has not been updated for 2 years, unfortunately.
> <schema_name>-<table_name> is an informal standard within the Singer community to map tables to schemas in SQL targets, so I would say that the tap is misbehaving in this case. I'd also be happy to hear suggestions on how we could make targets more flexible in this regard.
I guess my expectation is that with schema or default_target_schema specified (cc https://github.com/meltano/sdk/issues/2766, https://github.com/MeltanoLabs/target-snowflake/discussions/92#discussioncomment-11259580), the stream name should always be taken at face value. If those settings are not configured, then I can totally understand trying to parse out the appropriate schema and table name from the stream name.
From the code @leoperegrino linked to in https://github.com/meltano/sdk/issues/2909#issuecomment-2776707446, it is clear this is already done for the schema name, but default_target_schema is not respected in a similar manner when resolving the table name.
Should this be reopened since #3066 was merged? I think there should be a way to opt-out of this behaviour, or better docs at a minimum.
@ReubenFrankel is there a strong reason for you to be using a hyphen in the stream name? i.e. would it be OK for you to use a different delimiter?
@edgarrmondragon In our own taps, I don't think there are any. I was thinking more about various other legacy taps we use (not yet adopted/maintained by us) that do contain hyphens in stream names, which started loading differently into Snowflake with target-snowflake when we switched over from an old transferwise variant fork to meltanolabs. Perhaps the answer was always just to fix our test assertions about where data was being loaded, but I didn't know if it was indicative of a problem other people were also having (so I sought to patch the behaviour in our own fork).