schemachange icon indicating copy to clipboard operation
schemachange copied to clipboard

Flag for error on ignored versioned migration and versioned script regex

Open zanebclark opened this issue 1 year ago • 4 comments

This pull request started out as a feature addition. It ended up being a bit of a repo refactor.

  • Format with Black. There's already an open PR on this: https://github.com/Snowflake-Labs/schemachange/pull/217
  • Package with Poetry. There's already an open PR on this: https://github.com/Snowflake-Labs/schemachange/pull/218
  • If the "change_history_table" string is a schema and a table, use the "snowflake_database" instead of "METADATA"
  • Deprecate verbse cli argument. Opt for log-level instead
  • Replace print statements and if verbose checks with a structlog logger and log levels.
  • Removed SecretManager in favor of a structlog processor that implements very similar logic.
  • Remove pandas dependency in favor of a dictionary. This should address one of the complaints in https://github.com/Snowflake-Labs/schemachange/issues/205.
  • Centralize config-related logic and input checking into a Config object. For example, there was logic for checking the validity of a directory and the presence of environment variables. Centralizing this logic is conceptually easier for me to wrap my head around and makes testing much easier.
  • Generally, moved a bunch of class definitions out of cli.py and into separate files
  • Stored script details in a Script class to clarify the API
  • Removed a bunch of credential-related logic from SnowflakeClient and created a Credential factory.
  • Generally expanded test case coverage.
  • Added two CLI arguments:
    • raise-exception-on-ignored-version-script: At my shop, we're using epoch time as our version number. If you create a script with an epoch time and your colleague merges their changes in before you get a chance to, the max_published_version will cause schemachange will skip your script without complaint. To address this, I handle versioned scripts almost identically to repeatable scripts. I store the metadata on the script's execution in a dictionary and compare that to the scripts in the script directory. If a script is "stale", hasn't been applied, and the flag is true: I raise an exception.
    • version-number-validation-regex: Again, we're using epoch time as the version number in my shop. If someone deploys a script prefixed with Vspam__, future scripts versioned with the current epoch time will be skipped. The max_published_version is now spam and cannot be trumped by a number. To address this, I've added a version_number_validation_regex CLI argument that will reject spam as a version number if I've specificed \d{10}

Not everything is peachy:

  • I think I might have broken some stuff around the render command. Specifically, the bit where we clear config values for the render subcommand. I'd appreciate some feedback on how to handle this.
  • I ran out of steam when I got to the render.py and SnowflakeSession.py modules. There's work to do there in terms of dependency inversion and testing. I'm putting this out there now for feedback, not because I think it's "done". Don't be surprised if this lack is fixed in the near future on this PR.
  • I saw the dependency checker on the pipeline. If that's a serious sensitivity, this PR might be dead in the water.
  • Performance likely took a hit, though I haven't done any profiling. I'm not too concerned about the performance of this utility, but someone here might be.

Sorry for the monstrous pull request. Consider it a reflection of my inability to understand code that I didn't write :blush:

zanebclark avatar Jan 22 '24 01:01 zanebclark

@zanebclark Amazing work here. Will need to review and see if we can carve out the changes into multiple pull requests to be released without breaking previous patterns. For breaking changes, we will have to place them in a 4.0 release and plan for the future with some communications related to it.

Give us some time to review but if you can carve out your PRs into categories that will be incremental that would be really helpful.

sfc-gh-tmathew avatar Mar 06 '24 14:03 sfc-gh-tmathew

@sfc-gh-tmathew , apologies for the delay in my response. I'm embarrassed to have created such a behemoth. I've been eating my own dogfood by using this branch to create and maintain Snowflake environments. I've encountered two bugs:

  1. A dry run without a change_history table will result in a failure. I didn't properly mimic the previous logic to handle this case.
  2. When I configured the regex pattern as \d{9}, it failed to raise an exception when a microsecond epoch time was used as the version number.

I included the conversion to Poetry for my own convenience. If there's no interest in simplifying package management, that bit can be stripped out. Let me know what you think.

zanebclark avatar Mar 18 '24 14:03 zanebclark

@sfc-gh-tmathew , apologies for the delay in my response. I'm embarrassed to have created such a behemoth. I've been eating my own dogfood by using this branch to create and maintain Snowflake environments. I've encountered two bugs:

  1. A dry run without a change_history table will result in a failure. I didn't properly mimic the previous logic to handle this case.
  2. When I configured the regex pattern as \d{9}, it failed to raise an exception when a microsecond epoch time was used as the version number.

I included the conversion to Poetry for my own convenience. If there's no interest in simplifying package management, that bit can be stripped out. Let me know what you think.

Please strip out the poetry part for package management. We will still need to review the rest.

sfc-gh-tmathew avatar Apr 02 '24 15:04 sfc-gh-tmathew

If the "change_history_table" string is a schema and a table, use the "snowflake_database" instead of "METADATA"

This will be a breaking change to existing end users for those users who created METADATA database OR uses fully qualified change history table.

If you are going with a change history table per database, I can see why you need this option to default to this option.

Please consider calling schemachange for each database and fully qualify the change history table with the database to override the metadata database reference.

sfc-gh-tmathew avatar Apr 02 '24 15:04 sfc-gh-tmathew