sdk icon indicating copy to clipboard operation
sdk copied to clipboard

bug: Incorrect table name resolved if stream name is hyphenated

Open ReubenFrankel opened this issue 8 months ago • 10 comments

Singer SDK Version

0.44.3

Is this a regression?

  • [ ] Yes

Python Version

NA

Bug scope

Targets (data type handling, batching, SQL object generation, etc.)

Operating System

No response

Description

For SQL targets, if a stream name is hyphenated it is assumed to be fully-qualified and the part after the last hyphen is taken as the table name. In our case, we have a tap stream name of tap_bigquery-weekly_fuel_prices and expect a table of name in similar shape to be created in Snowflake - instead, weekly_fuel_prices is resolved which is not desirable is our case. Is there a reason for this behaviour?

https://github.com/meltano/sdk/blob/881ffd2041c9eca9c50a5e94ffe61a121579ef61/singer_sdk/sinks/sql.py#L76-L85

Link to Slack/Linen

No response

ReubenFrankel avatar Mar 20 '25 16:03 ReubenFrankel

https://github.com/Matatika/target-snowflake--meltanolabs/commit/101290a3f26d3fa4c0479abc98cbef7a7cf7bc09

ReubenFrankel avatar Mar 20 '25 17:03 ReubenFrankel

this is also affecting me with tap and target-mysql. Is there a fix?

https://github.com/meltano/sdk/blob/868b5667cb1620e88efc561a74bc696d64e7f5b3/singer_sdk/sinks/sql.py#L87-L105

leoperegrino avatar Apr 03 '25 19:04 leoperegrino

this is also affecting me with tap and target-mysql. Is there a fix?

https://github.com/meltano/sdk/blob/main/singer_sdk/sinks/sql.py#L87-L105

@leoperegrino have you tried using the default_target_schema setting?

edgarrmondragon avatar Apr 03 '25 19:04 edgarrmondragon

we have a tap stream name of tap_bigquery-weekly_fuel_prices and expect a table of name in similar shape to be created in Snowflake

<schema_name>-<table_name> is an informal standard within the singer community to map tables to schemas in SQL targets, so I would that the tap is misbehaving in this case. I'd also be happy to read suggestions on how we make targets more flexible in this regard.

edgarrmondragon avatar Apr 03 '25 19:04 edgarrmondragon

@edgarrmondragon it does seem to pass through the connection phase but it fails later in the modify statement. The table name should be enclosed in backticks but the generated SQL does not do this:

Altering with SQL: ALTER TABLE table-name.column ADD COLUMN x_sdc_table_version INTEGER

leoperegrino avatar Apr 03 '25 19:04 leoperegrino

@edgarrmondragon I was trying to setup binary logging replication between two mariadb instances using tap-mysql and target-mysql.

After a few hours of debugging, it seemed that the target was causing the issue. I tweaked the virtual env code to allow column alter here and here. Also uncommented the schema_name property here.

Now, it works for the first time, then fails due to a current_type_length comparisson with None. Which requires this to be this:

  elif hasattr(current_type, 'length'):
      # Add this check to handle None values
      if current_type.length is not None:
          current_type_length = current_type.length

The target has not been updated for 2 years, unfortunatelly.

leoperegrino avatar Apr 03 '25 21:04 leoperegrino

<schema_name>-<table_name> is an informal standard within the singer community to map tables to schemas in SQL targets, so I would that the tap is misbehaving in this case. I'd also be happy to read suggestions on how we make targets more flexible in this regard.

I guess my expectation is that with schema or default_target_schema specified (cc https://github.com/meltano/sdk/issues/2766, https://github.com/MeltanoLabs/target-snowflake/discussions/92#discussioncomment-11259580), the stream name should always be taken at face value. If those settings are not configured, then I can totally understand trying to parse out the appropriate schema and table name from the stream name.

From the code @leoperegrino linked to in https://github.com/meltano/sdk/issues/2909#issuecomment-2776707446, it is clear this is already done for the schema name, but default_target_schema is not respected in a similar manner when when resolving the table name.

ReubenFrankel avatar Apr 04 '25 20:04 ReubenFrankel

Should this be reopened since #3066 was merged? I think there should be a way to opt-out of this behaviour, or better docs at a minimum.

ReubenFrankel avatar May 28 '25 18:05 ReubenFrankel

@ReubenFrankel is there a strong reason for you to be using a hyphen in the stream name? i.e. would it be OK for you to use a different delimiter?

edgarrmondragon avatar May 28 '25 22:05 edgarrmondragon

@edgarrmondragon In our own taps, I don't think there are any. I was thinking more about various other legacy taps we use (not yet adopted/maintained by us) that do contain hyphens in stream names, which started loading differently into Snowflake with target-snowflake when we switched over from an old transferwise variant fork to meltanolabs. Perhaps the answer was always just to fix our test assertions on where data was loading into, but I didn't know if it was indicative of a problem other people were also having (so sought to patch the behaviour in our own fork).

ReubenFrankel avatar May 28 '25 23:05 ReubenFrankel