fleet icon indicating copy to clipboard operation
fleet copied to clipboard

Validate column collation for new migrations in CI

Open iansltx opened this issue 9 months ago • 7 comments

Goal

User story
As a Fleet contributor,
I want to have CI validate that proper collations were added on new migrations
so that I can avoid creating issues for customers when they perform database migrations.

Key result

Preventing customer database/collation related bugs from happening.

Context

  • Product Designer: @iansltx

We've done one-off collation fix migrations (most recently in #25353), and the most recent migration (as of creation of this issue) sets default collation database-wide, but if a DB is restored from a backup the collation default won't get added in, so implicitly defined columns will get collated wrong.

Changes

Engineering

  • [x] Test plan is finalized
  • [ ] Contributor API changes: No changes.
  • [ ] CI changes: Add to the database check CI action a check that simulates migrating into a database without a DB-level collation default, and fail the check if the migration results include wrong-collation/wrong-charset string columns.

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

  • Risk level: Low

Test plan

  1. On a branch, add a database migration that creates a table with a VARCHAR column that doesn't specify a collation at either the table or the column level.
  2. Create a pull request so CI runs. CI should fail.
  3. Fix the migration to include collation on the table. CI should pass.
  4. Revise the migration to include collation on the column rather than the table. CI should pass.

Testing notes

Confirmation

  1. [ ] Engineer: Added comment to user story confirming successful completion of test plan.
  2. [ ] QA: Added comment to user story confirming successful completion of test plan.

iansltx avatar Feb 18 '25 02:02 iansltx

Needs to be updated per create an engineering-initiated story process.

lukeheath avatar Mar 31 '25 19:03 lukeheath

@iansltx Thanks for updating. This has bitten us several times, resulting in critical bugs and unplanned work. I'm prioritizing to the drafting board and assigning back to you.

lukeheath avatar Apr 08 '25 19:04 lukeheath

@jmwatts I'm guessing this test plan is too short for your confort; let me know what you need from me to fill this out. FWIW #26670 includes a bunch of examples (basically any file other than server/datastore/mysql/migrations/tables/20250127162751_AddUnifiedQueueTable.go or server/datastore/mysql/migrations/tables/20250127162751_AddUnifiedQueueTable_test.go) of migrations that, prior to that PR, didn't correctly specify collations where they should have, so they can be used as inspiration for the "this should break the build" migration.

iansltx avatar Apr 09 '25 05:04 iansltx

@iansltx if the fix is to make CI fail when there's a new migration that adds a column where collation isn't specified, and CI passes if the collation is set appropriately, then this test plan is probably sufficient.

Is this a change in CI, not the product? Would this cause any issues if a person needed to create a new table as part of a patch on an older release (hypothetical but not impossible)?

I'm assuming there are no circumstances for which we wouldn't want to set collation when we're adding a new table/column in a migration. Is it correct to assume that the output of the failure will be easy enough to understand and identify what the problem is? Thanks for pointing out the examples too!

jmwatts avatar Apr 09 '25 13:04 jmwatts

@jmwatts

Is this a change in CI, not the product?

Correct.

Would this cause any issues if a person needed to create a new table as part of a patch on an older release (hypothetical but not impossible)?

It would not.

  1. Backports wouldn't include this CI code.
  2. Backports should enforce proper collation anyway, the preferred collation hasn't changed in years, and migrations to correct existing collations wouldn't fail this test.
  3. As part of #26670 I retconned existing migrations so new installs would be in the proper state from the get-go, on top of the "where collation is broken, fix it" addition to the unified queue migration. So for anything that we don't have to backport we're just dealing with net-new issues.

I'm assuming there are no circumstances for which we wouldn't want to set collation when we're adding a new table/column in a migration.

There are exceptions, but they (search server/datasroe/mysql/shcema.sql for COLLATE utf8mb4_bin) are rare enough that it's reasonable to ask the dev to add an entry to an exception list on the CI job for those exceptions. The exceptions are rare because collation only affects string columns; numbers don't have the collation problem.

Is it correct to assume that the output of the failure will be easy enough to understand and identify what the problem is?

Yes.


If the above answers your questions, sufficiently, net of any test plan tweaks you want to make (feel free to make 'em), feel free to bump this to User Story Review :)

iansltx avatar Apr 09 '25 17:04 iansltx

it's reasonable to ask the dev to add an entry to an exception list on the CI job for those exceptions.

@iansltx Will this be documented somewhere such that if a dev runs into this failure they'll know what to do?

Other than that one question, looks good, so I'll move it over.

jmwatts avatar Apr 09 '25 17:04 jmwatts

@jmwatts Yep, "or add an exception to X file" should be part of the error message that's shown in CI.

iansltx avatar Apr 09 '25 17:04 iansltx

Hey team! Please add your planning poker estimate with Zenhub @jahzielv @ksykulev

iansltx avatar Apr 15 '25 20:04 iansltx

I've made progress on this, but moving to Waiting to work on https://github.com/fleetdm/fleet/issues/28815 as that needs to go out in 4.68.0. I'll be back to this one afterwards.

cc @mostlikelee

jahzielv avatar May 15 '25 15:05 jahzielv

QA Notes (in progress)

  1. On a branch, add a database migration that creates a table with a VARCHAR column that doesn't specify a collation at either the table or the column level.
  2. Create a (draft) pull request so CI runs. CI should fail.

https://github.com/fleetdm/fleet/actions/runs/15354969968/job/43211904370?pr=29637

Result:

2025/05/30 20:26:54 [2025-05-30] Create Fake Table
    migration_test.go:185: 
        	Error Trace:	/home/runner/work/fleet/fleet/server/datastore/mysql/migrations/tables/migration_test.go:185
        	            				/home/runner/work/fleet/fleet/server/datastore/mysql/migrations/tables/collation_test.go:30
        	Error:      	elements differ
        	            	
        	            	extra elements in list B:
        	            	([]interface {}) (len=2) {
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=18) "utf8mb4_0900_ai_ci",
        	            	  TableName: (string) (len=10) "fake_table",
        	            	  ColumnName: (string) (len=9) "host_uuid",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 },
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=18) "utf8mb4_0900_ai_ci",
        	            	  TableName: (string) (len=10) "fake_table",
        	            	  ColumnName: (string) (len=12) "command_uuid",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 }
        	            	}
        	            	
        	            	
        	            	listA:
        	            	([]tables.collationData) (len=4) {
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=11) "utf8mb4_bin",
        	            	  TableName: (string) (len=14) "enroll_secrets",
        	            	  ColumnName: (string) (len=6) "secret",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 },
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=11) "utf8mb4_bin",
        	            	  TableName: (string) (len=5) "hosts",
        	            	  ColumnName: (string) (len=8) "node_key",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 },
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=11) "utf8mb4_bin",
        	            	  TableName: (string) (len=5) "hosts",
        	            	  ColumnName: (string) (len=14) "orbit_node_key",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 },
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=11) "utf8mb4_bin",
        	            	  TableName: (string) (len=5) "teams",
        	            	  ColumnName: (string) (len=8) "name_bin",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 }
        	            	}
        	            	
        	            	
        	            	listB:
        	            	([]tables.collationData) (len=6) {
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=11) "utf8mb4_bin",
        	            	  TableName: (string) (len=14) "enroll_secrets",
        	            	  ColumnName: (string) (len=6) "secret",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 },
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=18) "utf8mb4_0900_ai_ci",
        	            	  TableName: (string) (len=10) "fake_table",
        	            	  ColumnName: (string) (len=9) "host_uuid",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 },
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=18) "utf8mb4_0900_ai_ci",
        	            	  TableName: (string) (len=10) "fake_table",
        	            	  ColumnName: (string) (len=12) "command_uuid",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 },
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=11) "utf8mb4_bin",
        	            	  TableName: (string) (len=5) "hosts",
        	            	  ColumnName: (string) (len=8) "node_key",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 },
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=11) "utf8mb4_bin",
        	            	  TableName: (string) (len=5) "hosts",
        	            	  ColumnName: (string) (len=14) "orbit_node_key",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 },
        	            	 (tables.collationData) {
        	            	  CollationName: (string) (len=11) "utf8mb4_bin",
        	            	  TableName: (string) (len=5) "teams",
        	            	  ColumnName: (string) (len=8) "name_bin",
        	            	  CharacterSetName: (string) (len=7) "utf8mb4"
        	            	 }
        	            	}
        	Test:       	TestCollation
--- FAIL: TestCollation (6.51s)
FAIL
coverage: 10.8% of statements in github.com/fleetdm/fleet/v4/...
FAIL	github.com/fleetdm/fleet/v4/server/datastore/mysql/migrations/tables	279.380s
  1. Fix the migration to include collation on the table. CI should pass. https://github.com/fleetdm/fleet/actions/runs/15355351038/job/43213052584?pr=29637
2025/05/30 20:54:55 [2025-05-30] Create Fake Table
--- PASS: TestCollation (7.17s)
PASS
  1. Revise the migration to include collation on the column rather than the table. CI should pass. https://github.com/fleetdm/fleet/actions/runs/15355836597/job/43214576465?pr=29637
2025/05/30 21:29:25 [2025-05-30] Create Fake Table
--- PASS: TestCollation (7.03s)
PASS
  1. TODO: Regression testing for tables where default schema dump changed

jmwatts avatar May 30 '25 21:05 jmwatts

FYI @jmwatts The test plan doesn't, but should, include regression/smoke testing for tables where our default schema dump changed from previously. @jahzielv should be able to show you which ones those are, and which functions they correspond to.

iansltx avatar May 30 '25 21:05 iansltx

@iansltx is totally right. @jmwatts might have to wait till next week if that's ok?

jahzielv avatar May 30 '25 21:05 jahzielv

@jahzielv 100% I'll need to wrap up 4.69.0 before I can pick those up anyhow. Thanks!

jmwatts avatar May 30 '25 21:05 jmwatts

FYI @jmwatts The test plan doesn't, but should, include regression/smoke testing for tables where our default schema dump changed from previously. @jahzielv should be able to show you which ones those are, and which functions they correspond to.

@jahzielv is there an easy way to retrieve those tables so I can work on some regression testing around the features?

jmwatts avatar Jun 13 '25 16:06 jmwatts

@jmwatts I'll get you the list asap

jahzielv avatar Jun 13 '25 16:06 jahzielv

tables changed in the dump that should be regression-tested:

  • host_mdm_apple_bootstrap_packages
  • mdm_idp_accounts
  • nano_devices
  • nano_enrollments
  • queries
  • software_categories
  • users
  • host_mdm_apple_declarations

cc @jmwatts

jahzielv avatar Jun 13 '25 20:06 jahzielv

Performed regression tests for workflows affecting these tables:

  • host_mdm_apple_bootstrap_packages
  • mdm_idp_accounts
  • nano_devices
  • nano_enrollments
  • queries
  • software_categories
  • users
  • host_mdm_apple_declarations

No new issues observed

jmwatts avatar Jun 18 '25 17:06 jmwatts

Collation bugs depart, Migration paths made clear, bright, Fleet's course now charts right.

fleet-release avatar Jun 30 '25 23:06 fleet-release