sqlx
sqlx copied to clipboard
migration checksums different on unix/non-unix systems due to CRLF/LF line endings
Bug Description
When a migration checksum is generated, it seems like it's putting the whole file as bytes through SHA384 generator as stated by this code:
impl Migration {
pub fn new(
version: i64,
description: Cow<'static, str>,
migration_type: MigrationType,
sql: Cow<'static, str>,
) -> Self {
let checksum = Cow::Owned(Vec::from(Sha384::digest(sql.as_bytes()).as_slice()));
Migration {
version,
description,
migration_type,
sql,
checksum,
}
}
}
What that means is when a migration is checked with version control (git in my case) it might manipulate the CRLF/LF line endings based on system. You could also run into this problem when creating/modifying the file on 2 different system even without version control I believe.
This was causing version mismatch error in my program, as I run migrate! in my main function to ensure that the db is set up correctly.
I work on 2 different systems and as of now it seems like I have to comment out my migrate! line if I'm sure it was run, don't know how to handle it in production though.
I'm not sure how to best handle this issue, but maybe conversion CRLF->LF before passing the text to sha would make sense?
Minimal Reproduction
- create a repository with a migration file on windows
- run migration
- check out with git while having core.autocrlf=true
- clone repository on mac/linux
- run migration
Info
- SQLx version: 0.7.1
- SQLx features enabled: "postgres", "runtime-tokio"
- Database server and version: Postgres
- Operating system: 2 different
-
rustc --version
: rustc 1.70.0 (90c541806 2023-05-31)
Maybe if it detects a mismatch, it could change the line endings and try again, so the change wouldn't affect anything that is currently working properly.
For example the following method could be added in this impl (I haven't tested this):
pub fn matches(&self, checksum: &Cow<'static, [u8]>) -> bool {
if self.checksum == *checksum {
return true;
}
let alternate_line_ending = if self.sql.contains("\r\n") {
"\n"
} else {
"\r\n"
};
let lines: Vec<_> = self.sql.lines().collect();
let alternate_checksum = Cow::<'static, [u8]>::Owned(Vec::from(
Sha384::digest(lines.join(alternate_line_ending).as_bytes()).as_slice(),
));
alternate_checksum == *checksum
}
I have the same issue.
Maybe if it detects a mismatch, it could change the line endings and try again, so the change wouldn't affect anything that is currently working properly.
Exactly the best practice.
https://github.com/rustdesk-org/sqlx/commit/50392d5a29edfeb2ba6c03b980547447736f8bee works for me.
Perhaps we could have some quote-aware whitespace removal before hashing the SQL?
Like:
create
table `two spaces` (id int);
Becoming:
create table `two spaces` (id int);
No \r or \n at all and replacing any sequence of continuous any-space character with one single space.
Hi, any update for this bug? Do you know if it will be fixed in the next version?
I believe replacing all CRLF
with LF
before the hashing will solve most issues. We're not editing the original SQL statement, just a clone that will be hashed.
Single issue that I can still see is that CRLF
and LF
cannot be distinguished in a hardcoded string, but this seems like an extreme edge case.
It is not extreme edge case for sqlite. Because we may copy it here and there. We may run it on Linux server, but back up on a Windows server.
For the time being, I have to use my fork, https://github.com/rustdesk-org/sqlx/commit/50392d5a29edfeb2ba6c03b980547447736f8bee
Perhaps I'm misunderstanding, but could you give me an example of a bug that might arise from replacing CRLF
by LF
before hashing? The functionality of the queries will not change depending on the newlines in the hash.
Because it will change the checksums for anyone deploying from/to Windows and break all their migrations.
I'm not against adding this, but it needs to be opt-in.
This is also really straightforward to fix yourself with a .gitattributes
file.