refinery icon indicating copy to clipboard operation
refinery copied to clipboard

Normalize migrations before computing checksum

Open tmpfs opened this issue 9 months ago • 3 comments

Hi,

Thanks for the library!

I've come across something of an esoteric edge case and I have a hunch it's to do with Windows \r\n line endings but first some context.

I have a cross-platform app that allows users to create backup ZIP archives that contain an SQLite database (along with a bunch of other files). When these backups are restored into the app we run migrations to make sure the database is up to date with the latest schema.

To test this logic I have fixtures in the test suite of valid ZIP archives. If I generate the test fixtures on a *NIX machine and then run the test suite on Windows the migrations fail to apply (it says the filesystem one is different to the applied one). However, if i generate the backup ZIP archives on Windows all is well.

I wonder if you could have some insight into where the inconsistency might be? Would it be reasonable to always remove \r in migration source files?

Thanks 🙏

tmpfs avatar Mar 11 '25 02:03 tmpfs

Huh, of course it's the filesystem version's that differ forcing git's config core.eol to lf and disabling core.autocrlf fixes the issue.

I noticed too that changing comments in the migrations also invalidates the migration checksum which is a bit misleading as the schema hasn't actually changed.

I wonder if stripping comments and removing \n from migrations is worth doing to make the comparison a bit more robust?

tmpfs avatar Mar 11 '25 03:03 tmpfs

Hi, and thanks for your interest! That is interesting and that change probably would make sense, but changing that at this point would make it a breaking change to all the previous migrations right? I.e how would that new behavior affect previous migrations?

jxs avatar Mar 28 '25 11:03 jxs

Hey @jxs,

I think this is fairly straightforward to do in a backwards compatible manner, we add a new migrate_with_options() function and then the current behavior would call the new function with MigrateOptions::default(), for example:

#[derive(Default, Debug)]
pub struct MigrateOptions {
  pub normalize_newlines: bool,
  pub strip_comments:bool,
}

I think the trickier part will be modify the input as we stream the bytes to the hasher without allocating, skipping \r should be trivial, for comments we would need a flag to indicate whether we are in a comment.

I presume that instead of calling hash() on the sql: &str here we would iterate the sql string bytes and instead call write() on the hasher.

I don't know much about SQL comments except the -- delimiter, are there other ways to define comments in SQL that refinery supports?

tmpfs avatar Apr 23 '25 01:04 tmpfs

I've run into the same problem, and ended up ultimately forking refinery, and updating embed_migrations to allow specifying whether to normalize line endings or not. The code is here: https://github.com/raycast/refinery/commit/9ade7094ea80471e4eaffbe18398d162128c05bb It should be backwards compatible, by specifying the line endings that the project first used, i.e. either refinery::LineEndingNormalization::AsLF or refinery::LineEndingNormalization::AsCRLF – assuming the codebase doesn't have a mix of both line endings.

I'm happy to open a PR, if it's something that you would consider.

sxn avatar Nov 07 '25 16:11 sxn