sqlx icon indicating copy to clipboard operation
sqlx copied to clipboard

migration checksums different on unix/non-unix systems due to CRLF/LF line endings

Open DawidPietrykowski opened this issue 1 year ago • 8 comments

Bug Description

When a migration checksum is generated, it seems like it's putting the whole file as bytes through SHA384 generator as stated by this code:

impl Migration {
    pub fn new(
        version: i64,
        description: Cow<'static, str>,
        migration_type: MigrationType,
        sql: Cow<'static, str>,
    ) -> Self {
        let checksum = Cow::Owned(Vec::from(Sha384::digest(sql.as_bytes()).as_slice()));
        Migration {
            version,
            description,
            migration_type,
            sql,
            checksum,
        }
    }
}

What that means is when a migration is checked with version control (git in my case) it might manipulate the CRLF/LF line endings based on system. You could also run into this problem when creating/modifying the file on 2 different system even without version control I believe.

This was causing version mismatch error in my program, as I run migrate! in my main function to ensure that the db is set up correctly.

I work on 2 different systems and as of now it seems like I have to comment out my migrate! line if I'm sure it was run, don't know how to handle it in production though.

I'm not sure how to best handle this issue, but maybe conversion CRLF->LF before passing the text to sha would make sense?

Minimal Reproduction

  1. create a repository with a migration file on windows
  2. run migration
  3. check out with git while having core.autocrlf=true
  4. clone repository on mac/linux
  5. run migration

Info

  • SQLx version: 0.7.1
  • SQLx features enabled: "postgres", "runtime-tokio"
  • Database server and version: Postgres
  • Operating system: 2 different
  • rustc --version: rustc 1.70.0 (90c541806 2023-05-31)

DawidPietrykowski avatar Jul 26 '23 21:07 DawidPietrykowski

Maybe if it detects a mismatch, it could change the line endings and try again, so the change wouldn't affect anything that is currently working properly.

For example the following method could be added in this impl (I haven't tested this):

    pub fn matches(&self, checksum: &Cow<'static, [u8]>) -> bool {
        if self.checksum == *checksum {
            return true;
        }
        let alternate_line_ending = if self.sql.contains("\r\n") {
            "\n"
        } else {
            "\r\n"
        };
        let lines: Vec<_> = self.sql.lines().collect();
        let alternate_checksum = Cow::<'static, [u8]>::Owned(Vec::from(
            Sha384::digest(lines.join(alternate_line_ending).as_bytes()).as_slice(),
        ));
        alternate_checksum == *checksum
    }

YgorSouza avatar Jul 27 '23 05:07 YgorSouza

I have the same issue.

Maybe if it detects a mismatch, it could change the line endings and try again, so the change wouldn't affect anything that is currently working properly.

Exactly the best practice.

https://github.com/rustdesk-org/sqlx/commit/50392d5a29edfeb2ba6c03b980547447736f8bee works for me.

rustdesk avatar Aug 02 '23 15:08 rustdesk

Perhaps we could have some quote-aware whitespace removal before hashing the SQL?

Like:

create
         table `two  spaces` (id int);

Becoming:

create table `two  spaces` (id int);

No \r or \n at all and replacing any sequence of continuous any-space character with one single space.

Andrepuel avatar Nov 24 '23 13:11 Andrepuel

Hi, any update for this bug? Do you know if it will be fixed in the next version?

gilgabo avatar Feb 02 '24 14:02 gilgabo

I believe replacing all CRLF with LF before the hashing will solve most issues. We're not editing the original SQL statement, just a clone that will be hashed.

Single issue that I can still see is that CRLF and LF cannot be distinguished in a hardcoded string, but this seems like an extreme edge case.

Nahuel-M avatar Mar 11 '24 12:03 Nahuel-M

It is not extreme edge case for sqlite. Because we may copy it here and there. We may run it on Linux server, but back up on a Windows server.

For the time being, I have to use my fork, https://github.com/rustdesk-org/sqlx/commit/50392d5a29edfeb2ba6c03b980547447736f8bee

rustdesk avatar Mar 11 '24 13:03 rustdesk

Perhaps I'm misunderstanding, but could you give me an example of a bug that might arise from replacing CRLF by LF before hashing? The functionality of the queries will not change depending on the newlines in the hash.

Nahuel-M avatar Mar 12 '24 12:03 Nahuel-M

Because it will change the checksums for anyone deploying from/to Windows and break all their migrations.

I'm not against adding this, but it needs to be opt-in.

This is also really straightforward to fix yourself with a .gitattributes file.

abonander avatar Mar 12 '24 17:03 abonander