dbt-sqlserver icon indicating copy to clipboard operation
dbt-sqlserver copied to clipboard

clean up relation usage and default behavior

Open dataders opened this issue 4 years ago • 8 comments

changes

  • implements a relation.py which sets the relation:
    • quoting policy to both:
      • use " to quote relation parts, and
    • inclusion policy to never include the database name when referencing a relation, just schema.identifier
  • fixes: #91 cleaner relation name gen ( {{ relation }} instead of {{ relation.schema }}.{{ relation.identifier }})
  • use adapter.drop_relation which deprecates sqlserver__drop_relation_script
  • re-enables the snapshot_strategy_check_cols test for Azure SQL (should have been re-enabled with #74)

dataders avatar Jan 26 '21 19:01 dataders

@jtcohen6 the import is failing but I have no idea why, but I'm willing to bet it's just a rookie python typo.

dataders avatar Jan 26 '21 19:01 dataders

issue

something weird is happening with drop_relation(). For all 8 steps of the empty sequence and the first 12 steps of the base test sequence, the drop_relation's work as they should in that

this jinja-sql

if object_id ('{{ relation }}','{{ object_id_type }}') is not null
    begin
    drop {{ relation.type }} {{ relation }}
    end

compiles to this

 if object_id ('dbt_test_azure_sql_210211000737938430161027.view_model__dbt_backup','V') is not null
      begin
      drop view dbt_test_azure_sql_210211000737938430161027.view_model__dbt_backup
      end

However, it fails starting with step 13 of the base test sequence (running the swappable.sql as incremental). Now the compiled drop_relation SQL looks like this:

if object_id (''dbt_test_azure_sql_210211000737938430161027'.'swappable'','V') is not null
      begin
      drop view 'dbt_test_azure_sql_210211000737938430161027'.'swappable'
      end

@jtcohen6, does the global_project's incremental materialization (and others) change the QuotePolicy somehow? wild guess -- does the { this } relation object inherit the adapter's quoting policy?

SQLServerQuotePolicy is currently set to False for parts, so I don't get why the swappable view is getting quoted here.

@dataclass
class SQLServerQuotePolicy(Policy):
    database: bool = False
    schema: bool = False
    identifier: bool = False

desired behavior

TL;DR I'd like '{{ relation }}' to be rendered as one of these three things:

  • 'database.schema.identifier' per SQLServerQuotePolicy, and
  • '''database''.''schema''.''identifier''' (inexplicably, ' is the default escape character in TSQL, instead of \)

side note, square brackets are the defacto TSQL way to quote relation parts and column names, but not sure if that's possible to set SQLServerRelation's quote_character to be two characters: [ or ].

dataders avatar Feb 11 '21 00:02 dataders

it's also worth checking to see if the default incremental materialization still needs overriding

@jtcohen6 I've pulled in changes from #102 which drops the overriding of the incremental materialization. Unfortunately, the same issue is still happening, so I'm going to start locally logging/debugging the global project's incremental materialization.

dataders avatar Feb 11 '21 17:02 dataders

Have you run it on a real dbt-project or just in the CI?

Great question. I actually just tried running a project that was previously run with (I believe) dbt-sqlserver==0.19.0. And we get an error. So we should def table this PR until 0.19.1, instead of cramming it prematurely into 0.19.0.1.

I haven't seen this error message before, maybe @jtcohen6 could explain. It seems it has to do with partial matches due to quoting differences? this might constitute a breaking change for users. What I don't understand is why:

  • it searches for an unquoted version, but
  • finds a quoted version, when
  • the quote policy is none, however
  • in all my debugging the relation was always showing up double-quoted (despite the False quote policy)
Compilation Error in model SAPHRPeopleDim (models/stage/SAPHRPeopleDim.sql)
  When searching for a relation, dbt found an approximate match. Instead of guessing 
  which relation to use, dbt will move on. Please delete "ajs_stg"."SAPHRPeopleDim", or rename it to be less ambiguous.
  Searched for: ajs_stg.saphrpeopledim
  Found: "ajs_stg"."SAPHRPeopleDim"
  
  > in macro materialization_view_default (macros/materializations/view/view.sql)
  > called by model SAPHRPeopleDim (models/stage/SAPHRPeopleDim.sql)

dataders avatar Feb 15 '21 10:02 dataders

I haven't seen this error message before, maybe @jtcohen6 could explain. It seems it has to do with partial matches due to quoting differences? this might constitute a breaking change for users.

Yes, that's right. If dbt finds an existing relation in the database that is similar to a model's relation-to-e, but not exactly the same (different quoting/casing), dbt will raise a compilation error to avoid accidentally overwriting a view/table it doesn't actually own. Here's the source for that behavior.

It sounds like you ran this same project previously, using dbt-sqlserver==0.19.0, when dbt was quoting relations. So, now that quoting has been turned off, dbt has identified the discrepancy. I agree this is a breaking change and should probably wait until v0.20.0.

in all my debugging the relation was always showing up double-quoted (despite the False quote poli

This is the part I'm not sure about! If the quote_policy is now False across the board, dbt should not be quoting relation components. Which brings me to ask: What's the actual reason to turn off quoting on SQLServer? I think I missed this part of the puzzle. The reason we turn it off for Snowflake is because, if an identifier name is quoted and contains non-uppercase characters, selecting it is a nightmare.

never quote any part by default (but I see zero evidence that changing this makes a difference?)

The reason to keep quoting is that, on several databases (e.g. Postgres), identifier names that start with certain characters must be quoted: dev_jerco."123Table". On other databases (e.g. Snowflake), quoting adds so much hassle that it just isn't worth the support for special characters. Some databases (e.g. BigQuery) don't ever support those kind of special-character identifier names, even when quoted.

jtcohen6 avatar Feb 15 '21 10:02 jtcohen6

What's the actual reason to turn off quoting on SQLServer?

great question @jtcohen6. Looks like there wasn't a good reason and I just did it to try and solve a bug that I stirred up along the way. It isn't necessary. It is however necessary to change our relation part quote character from a single quote to double.

@mikaelene I tested this on a local project and it works without issue. ready for for your further review/approval.

dataders avatar Feb 17 '21 09:02 dataders

@swanderz I've spent several hours today trying to reimplement database switching (see PR #111) in this branch. However, I've run into a wall with the function adapter.drop_relation. The macro create_table_as does the following steps:

  1. Drop the temporary view if it exists
  2. Create the temporary view
  3. Drop the temporary table if it exists
  4. Insert temporary view contents into the temporary table
  5. Drop the temporary view again

Because the temporary view is dropped twice, something screwy happens with the relation cache and the second drop view statement doesn't happen, leaving the temporary view in the database. I'm not certain what exactly is happening because honestly the relation cache is difficult to understand.

My thought is to just drop all uses of adapter.drop_relation and to do it with straightforward SQL rather than trying to understand dbt's relation cache. I don't see the point of the cache anyway, it shaves off milliseconds at most... But I want to check in with you before I do that and ask whether that's a direction you're willing to go.

Is it OK to replace all uses of adapter.drop_relation with some straightforward macro?

panasenco avatar Mar 01 '21 21:03 panasenco

@panasenco -- I very much appreciate you taking the time to try and make your PR with my refactor. TIL, I had no idea about the relation cache.

Totally open to the "straightforward SQL" approach, but any chance you could open up a different PR with the code you were stuck on so we can look at the error you were seeing? If anything, for curiosity's sake.

dataders avatar Mar 03 '21 09:03 dataders