fleet
fleet copied to clipboard
Validate column collation for new migrations in CI
Goal
| User story |
|---|
| As a Fleet contributor, |
| I want to have CI validate that proper collations were added on new migrations |
| so that I can avoid creating issues for customers when they perform database migrations. |
Key result
Preventing customer database/collation related bugs from happening.
Context
- Product Designer: @iansltx
We've done one-off collation fix migrations (most recently in #25353), and the most recent migration (as of creation of this issue) sets default collation database-wide, but if a DB is restored from a backup the collation default won't get added in, so implicitly defined columns will get collated wrong.
Changes
Engineering
- [x] Test plan is finalized
- [ ] Contributor API changes: No changes.
- [ ] CI changes: Add to the database check CI action a check that simulates migrating into a database without a DB-level collation default, and fail the check if the migration results include wrong-collation/wrong-charset string columns.
ℹ️ Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".
QA
Risk assessment
- Risk level: Low
Test plan
- On a branch, add a database migration that creates a table with a VARCHAR column that doesn't specify a collation at either the table or the column level.
- Create a pull request so CI runs. CI should fail.
- Fix the migration to include collation on the table. CI should pass.
- Revise the migration to include collation on the column rather than the table. CI should pass.
Testing notes
Confirmation
- [ ] Engineer: Added comment to user story confirming successful completion of test plan.
- [ ] QA: Added comment to user story confirming successful completion of test plan.
Needs to be updated per create an engineering-initiated story process.
@iansltx Thanks for updating. This has bitten us several times, resulting in critical bugs and unplanned work. I'm prioritizing to the drafting board and assigning back to you.
@jmwatts I'm guessing this test plan is too short for your confort; let me know what you need from me to fill this out. FWIW #26670 includes a bunch of examples (basically any file other than server/datastore/mysql/migrations/tables/20250127162751_AddUnifiedQueueTable.go or server/datastore/mysql/migrations/tables/20250127162751_AddUnifiedQueueTable_test.go) of migrations that, prior to that PR, didn't correctly specify collations where they should have, so they can be used as inspiration for the "this should break the build" migration.
@iansltx if the fix is to make CI fail when there's a new migration that adds a column where collation isn't specified, and CI passes if the collation is set appropriately, then this test plan is probably sufficient.
Is this a change in CI, not the product? Would this cause any issues if a person needed to create a new table as part of a patch on an older release (hypothetical but not impossible)?
I'm assuming there are no circumstances for which we wouldn't want to set collation when we're adding a new table/column in a migration. Is it correct to assume that the output of the failure will be easy enough to understand and identify what the problem is? Thanks for pointing out the examples too!
@jmwatts
Is this a change in CI, not the product?
Correct.
Would this cause any issues if a person needed to create a new table as part of a patch on an older release (hypothetical but not impossible)?
It would not.
- Backports wouldn't include this CI code.
- Backports should enforce proper collation anyway, the preferred collation hasn't changed in years, and migrations to correct existing collations wouldn't fail this test.
- As part of #26670 I retconned existing migrations so new installs would be in the proper state from the get-go, on top of the "where collation is broken, fix it" addition to the unified queue migration. So for anything that we don't have to backport we're just dealing with net-new issues.
I'm assuming there are no circumstances for which we wouldn't want to set collation when we're adding a new table/column in a migration.
There are exceptions, but they (search server/datasroe/mysql/shcema.sql for COLLATE utf8mb4_bin) are rare enough that it's reasonable to ask the dev to add an entry to an exception list on the CI job for those exceptions. The exceptions are rare because collation only affects string columns; numbers don't have the collation problem.
Is it correct to assume that the output of the failure will be easy enough to understand and identify what the problem is?
Yes.
If the above answers your questions, sufficiently, net of any test plan tweaks you want to make (feel free to make 'em), feel free to bump this to User Story Review :)
it's reasonable to ask the dev to add an entry to an exception list on the CI job for those exceptions.
@iansltx Will this be documented somewhere such that if a dev runs into this failure they'll know what to do?
Other than that one question, looks good, so I'll move it over.
@jmwatts Yep, "or add an exception to X file" should be part of the error message that's shown in CI.
Hey team! Please add your planning poker estimate with Zenhub @jahzielv @ksykulev
I've made progress on this, but moving to Waiting to work on https://github.com/fleetdm/fleet/issues/28815 as that needs to go out in 4.68.0. I'll be back to this one afterwards.
cc @mostlikelee
QA Notes (in progress)
- On a branch, add a database migration that creates a table with a VARCHAR column that doesn't specify a collation at either the table or the column level.
- Create a (draft) pull request so CI runs. CI should fail.
https://github.com/fleetdm/fleet/actions/runs/15354969968/job/43211904370?pr=29637
Result:
2025/05/30 20:26:54 [2025-05-30] Create Fake Table
migration_test.go:185:
Error Trace: /home/runner/work/fleet/fleet/server/datastore/mysql/migrations/tables/migration_test.go:185
/home/runner/work/fleet/fleet/server/datastore/mysql/migrations/tables/collation_test.go:30
Error: elements differ
extra elements in list B:
([]interface {}) (len=2) {
(tables.collationData) {
CollationName: (string) (len=18) "utf8mb4_0900_ai_ci",
TableName: (string) (len=10) "fake_table",
ColumnName: (string) (len=9) "host_uuid",
CharacterSetName: (string) (len=7) "utf8mb4"
},
(tables.collationData) {
CollationName: (string) (len=18) "utf8mb4_0900_ai_ci",
TableName: (string) (len=10) "fake_table",
ColumnName: (string) (len=12) "command_uuid",
CharacterSetName: (string) (len=7) "utf8mb4"
}
}
listA:
([]tables.collationData) (len=4) {
(tables.collationData) {
CollationName: (string) (len=11) "utf8mb4_bin",
TableName: (string) (len=14) "enroll_secrets",
ColumnName: (string) (len=6) "secret",
CharacterSetName: (string) (len=7) "utf8mb4"
},
(tables.collationData) {
CollationName: (string) (len=11) "utf8mb4_bin",
TableName: (string) (len=5) "hosts",
ColumnName: (string) (len=8) "node_key",
CharacterSetName: (string) (len=7) "utf8mb4"
},
(tables.collationData) {
CollationName: (string) (len=11) "utf8mb4_bin",
TableName: (string) (len=5) "hosts",
ColumnName: (string) (len=14) "orbit_node_key",
CharacterSetName: (string) (len=7) "utf8mb4"
},
(tables.collationData) {
CollationName: (string) (len=11) "utf8mb4_bin",
TableName: (string) (len=5) "teams",
ColumnName: (string) (len=8) "name_bin",
CharacterSetName: (string) (len=7) "utf8mb4"
}
}
listB:
([]tables.collationData) (len=6) {
(tables.collationData) {
CollationName: (string) (len=11) "utf8mb4_bin",
TableName: (string) (len=14) "enroll_secrets",
ColumnName: (string) (len=6) "secret",
CharacterSetName: (string) (len=7) "utf8mb4"
},
(tables.collationData) {
CollationName: (string) (len=18) "utf8mb4_0900_ai_ci",
TableName: (string) (len=10) "fake_table",
ColumnName: (string) (len=9) "host_uuid",
CharacterSetName: (string) (len=7) "utf8mb4"
},
(tables.collationData) {
CollationName: (string) (len=18) "utf8mb4_0900_ai_ci",
TableName: (string) (len=10) "fake_table",
ColumnName: (string) (len=12) "command_uuid",
CharacterSetName: (string) (len=7) "utf8mb4"
},
(tables.collationData) {
CollationName: (string) (len=11) "utf8mb4_bin",
TableName: (string) (len=5) "hosts",
ColumnName: (string) (len=8) "node_key",
CharacterSetName: (string) (len=7) "utf8mb4"
},
(tables.collationData) {
CollationName: (string) (len=11) "utf8mb4_bin",
TableName: (string) (len=5) "hosts",
ColumnName: (string) (len=14) "orbit_node_key",
CharacterSetName: (string) (len=7) "utf8mb4"
},
(tables.collationData) {
CollationName: (string) (len=11) "utf8mb4_bin",
TableName: (string) (len=5) "teams",
ColumnName: (string) (len=8) "name_bin",
CharacterSetName: (string) (len=7) "utf8mb4"
}
}
Test: TestCollation
--- FAIL: TestCollation (6.51s)
FAIL
coverage: 10.8% of statements in github.com/fleetdm/fleet/v4/...
FAIL github.com/fleetdm/fleet/v4/server/datastore/mysql/migrations/tables 279.380s
- Fix the migration to include collation on the table. CI should pass. https://github.com/fleetdm/fleet/actions/runs/15355351038/job/43213052584?pr=29637
2025/05/30 20:54:55 [2025-05-30] Create Fake Table
--- PASS: TestCollation (7.17s)
PASS
- Revise the migration to include collation on the column rather than the table. CI should pass. https://github.com/fleetdm/fleet/actions/runs/15355836597/job/43214576465?pr=29637
2025/05/30 21:29:25 [2025-05-30] Create Fake Table
--- PASS: TestCollation (7.03s)
PASS
- TODO: Regression testing for tables where default schema dump changed
FYI @jmwatts The test plan doesn't, but should, include regression/smoke testing for tables where our default schema dump changed from previously. @jahzielv should be able to show you which ones those are, and which functions they correspond to.
@iansltx is totally right. @jmwatts might have to wait till next week if that's ok?
@jahzielv 100% I'll need to wrap up 4.69.0 before I can pick those up anyhow. Thanks!
FYI @jmwatts The test plan doesn't, but should, include regression/smoke testing for tables where our default schema dump changed from previously. @jahzielv should be able to show you which ones those are, and which functions they correspond to.
@jahzielv is there an easy way to retrieve those tables so I can work on some regression testing around the features?
@jmwatts I'll get you the list asap
tables changed in the dump that should be regression-tested:
host_mdm_apple_bootstrap_packagesmdm_idp_accountsnano_devicesnano_enrollmentsqueriessoftware_categoriesusershost_mdm_apple_declarations
cc @jmwatts
Performed regression tests for workflows affecting these tables:
host_mdm_apple_bootstrap_packagesmdm_idp_accountsnano_devicesnano_enrollmentsqueriessoftware_categoriesusershost_mdm_apple_declarations
No new issues observed
Collation bugs depart, Migration paths made clear, bright, Fleet's course now charts right.