fleet icon indicating copy to clipboard operation
fleet copied to clipboard

Inconsistent handling of all/no/specific team specification in database schema

Open iansltx opened this issue 10 months ago • 13 comments

Currently from what I can tell we have two ways of differentiating between resources that:

  • Are associated with a specific team
  • Are associated with "No team"
  • Are global (or "fallback")

nullable no-fkey team_id

  • Non-null nonzero for a specific team

  • Zero for No team

  • Null for all teams

  • Via a stored generated column, can enforce uniqueness (by mapping null to a placeholder value) or foreign key constraints including on-delete cascades (by mapping 0 to null)

  • Used with "null is equivalent to all teams" semantics in policies

  • Used without "no team" support in queries

  • Used with "null is equivalent to no team" semantics in enroll_secrets and hosts

null_team_type enum + nullable team_id

  • Enum has "none", "allteams", "noteam" options
  • team_id has both uniqueness constraint and on-delete cascade
  • Used in vpp_token_teams

We should decide which mechanism to use for new global/no-team/specific-team resources going forward, and potentially revise existing schemas as we touch them for consistency since there are a limited number of resources that behave this way (three by my count, as hosts and enroll secrets actually mean "no team" when they say "null team")


A note on global_or_team_id

This convention is only used for resources that belong to either a specific team or "No team", so it's actually inapplicable to this ticket, but I'm including it as I incorrectly brought it up in this Slack thread.

  • global_or_team_id column is not-null int, can be used in a uniqueness constraint
  • team_id column is a nullable int, can have a foreign key relation

Used in the following places:

  • vpp_apps_teams
  • mdm_apple_default_setup_assistants
  • mdm_apple_default_setup_assistants
  • scripts
  • setup_experience_scripts
  • software_installers

iansltx avatar Jan 07 '25 05:01 iansltx

@iansltx Great write-up of the various approaches, thanks! IIUC there is no table that currently uses the nullable team_id with 0 for "No team"? Or maybe just policies, as the other tables you mention have a real FK on teams and can't use 0?

If so, I think there's even less reasons to use this approach as it's confusing as to which special value means all/no team, and it would require (unused today) a generated column for the FK to work and the FK has significant limitations (those all make sense, but it means ~~e.g. you can't delete the row if the team got deleted, which is usually the main reason for the FK on teams~~ oops read that too fast, seems like you can delete cascade, but not e.g. SET NULL):

A foreign key constraint on a stored generated column cannot use CASCADE, SET NULL, or SET DEFAULT as ON UPDATE referential actions, nor can it use SET NULL or SET DEFAULT as ON DELETE referential actions. A foreign key constraint on the base column of a stored generated column cannot use CASCADE, SET NULL, or SET DEFAULT as ON UPDATE or ON DELETE referential actions. A foreign key constraint cannot reference a virtual generated column.

I'm not sure if I'll be able to make it to backend sync today, but my 0.02$ would be to use the ENUM + nullable team_id as we did in vpp_token_teams, but with some improvements mentioned below. Benefits in my opinion are:

  • Standard FK on teams that can delete the row (or SET NULL) if the team is deleted
  • Clearer as to what special values are, thanks to the enum - and team_id is always either NULL or a valid id
  • Now that we use mysql 8+, we could probably (untested) add CHECK constraints that are actually enforced to ensure that the team_id/enum pairs are valid (i.e. that all/no team enum can only be set if team_id is null) .
  • The con of this approach is that we can't use a (correct) unique index because team_id is NULLable, but we could use the generated column approach to have such a unique index by converting the team_id to 0 when NULL (along with the ENUM, this constitues a valid unique index that can't have duplicates due to NULLs).

With all this in place, I don't think there are many downsides to this approach?

mna avatar Jan 07 '25 13:01 mna

@mna

Re: existing table usage, as noted in the original ticket description, policies and queries both use the "null for all teams, zero for no team, nonzero for another team" convention.

Re: generated columns and cascades, per what you pasted we can cascade-delete rows based on the generated column for the cases that matter (no-team never gets deleted, nor does all-teams).

Re: downsides to the enum method, it's possible to have redundant/invalid combinations of team ID and the enum value: either "none" with a null team ID or a value other than "none" with a non-null team ID. We can throw a check constraint around this to prevent invalid combinations...or we could pick a schema that doesn't have invalid combinations of options built in.

iansltx avatar Jan 07 '25 14:01 iansltx

@iansltx

Re: existing table usage, as noted in the original ticket description, policies and queries both use the "null for all teams, zero for no team, nonzero for another team" convention.

AFAICS that can't be the case for queries as it has an FK on teams so it cannot use invalid value 0. Maybe it uses yet another approach, as I see it has a team_id_char column, not sure what that is for.

Re: downsides to the enum method, it's possible to have redundant/invalid combinations of team ID and the enum value

As I mentioned, we can use check constraints for that.

we could pick a schema that doesn't have invalid combinations of options built in

Without the generated column and FK on it, the nullable team_id has ~infinite invalid values :D It requires the generated column to prevent this, and at that point I'm not sure adding CHECK vs adding a generated column is more work. Personally I like the explicit approach of all/no team with the ENUM vs remembering if 0 or null must be used.

mna avatar Jan 07 '25 14:01 mna

Re: queries, sorry, you're correct; we can't currently scope queries to only "No team", so this winds up being "do we use the implementation in policies or do we use the implementation in the VPP token team association table".

To be clear, I haven't built/modified either implementation so this isn't me advocating for the pattern I built.

iansltx avatar Jan 07 '25 14:01 iansltx

I haven't built/modified either implementation so this isn't me advocating for the pattern I built.

Neither did I FWIW (although I did create the initial global_or_team_id pattern). I don't think it would matter anyway.

this winds up being "do we use the implementation in policies or do we use the implementation in the VPP token team association table"

Kinda, but I think we both want to improve a bit on the existing patterns. Since it looks like both can handle cascading deletes and uniqueness for a team/no/all teams, some reasons why I'd still prefer the enum approach:

  • Explicit, discoverable pattern vs needs to know what value means what
  • No restrictions on FK behaviour if we ever need eg. SET NULL
  • Expandable if, god forbid, we should add another no-team-id semantic to "All teams" and "No team"

That being said, it's not like one's a terrible approach and the other is obviously better, so just to reiterate, that's just my 0.02$ and I'm fine with whatever the team decides to go with.

mna avatar Jan 07 '25 15:01 mna

Per backend sync, action item for me is to build a POC for the team_id + generated columns approach, which incidentally could be applied to cleaning up (among other things) global_or_team_id back into one manually edited column plus one generated column, with the manually edited column borrowing the semantics of the global_or_team_id column (because we don't need null there) and the nullable fkey-able column being generated + stored.

To avoid this work causing features/bugs to drop out of the upcoming release, I'm planning on having the proof of concept for not the next backend sync (which is on the day of the RC cut) but the one after (2nd day of the 4.64 sprint), so I can work on this at the end of the sprint after the release has been cut.

iansltx avatar Jan 07 '25 18:01 iansltx

Estimate for this is for the proof of concept. New plan is to work on this as the eng initiated issue during on-call in two weeks.

iansltx avatar Jan 21 '25 14:01 iansltx

moving through eng initiated pipeline

mostlikelee avatar Jan 22 '25 21:01 mostlikelee

Thanks, since this is estimated I'm moving straight to the release board. I know the inconsistency here has been confusing and resulted in some bugs.

The important thing with a pattern change like this is to make sure all backend contributors are on board so the pattern is adopted across engineering.

lukeheath avatar Jan 22 '25 23:01 lukeheath

Yep. Intent for this rev is to confirm my approach is solid once the proof of concept proves the concept, and then go from there once folks across teams give a thumbs up :)

iansltx avatar Jan 22 '25 23:01 iansltx

@mostlikelee I see this has been in estimated for 2 months just wanted to see if this is still a priority.

lukeheath avatar Mar 31 '25 17:03 lukeheath

I'm not Tim but since this isn't on a roadmap and isn't a bug it's not going to get prioritized unless space is carved out in my schedule for eng-init work. Given that my next on-call is during the off-site it won't get worked on then either :/

iansltx avatar Mar 31 '25 17:03 iansltx

since this isn't on a roadmap and isn't a bug it's not going to get prioritized

This has been the case as we've been racing towards OKRs at the end of the quarter. Can we assign this to the next/current on-call backend engineer?

mostlikelee avatar Apr 02 '25 22:04 mostlikelee

@mostlikelee or @iansltx is this something that will need a test plan (i.e. will there be code changes that could result in a change of functionality or potential regressions)? Just want to make sure even if this is picked up by an on-call backend engineer that whoever QA's it has a decent test plan to work with. Thanks!

jmwatts avatar Apr 08 '25 19:04 jmwatts

@jmwatts Short answer: yes. Functionality should be the same, but would need to regression test adding items to No team, a specific team, or (for queries) all teams, running whatever migration we have as a result of this work, then confirming resources are still on the same team as before. Likewise, should smoke test adding resources to a specific team/no team/all teams and confirming they land where they're supposed to.

This should all be well-covered by automated tests, but the above is the short version of a relevant test plan.

iansltx avatar Apr 08 '25 21:04 iansltx

@mostlikelee Let's push this to next sprint. We want to focus on stability in 4.69.0 vs optimization (that may cause instability).

lukeheath avatar May 01 '25 21:05 lukeheath

@ksykulev a possible on-call task

mostlikelee avatar Jul 28 '25 15:07 mostlikelee

Hey @lukeheath, is this a bug, story, or timebox? We want to label this one.

marko-lisica avatar Nov 06 '25 15:11 marko-lisica

@marko-lisica Sorry, accidental add to drafting. Removed.

lukeheath avatar Nov 06 '25 15:11 lukeheath