flyte icon indicating copy to clipboard operation
flyte copied to clipboard

[Feat] add user annotations to k8s objects

Open ttitsworth-lila opened this issue 1 month ago • 32 comments

Why are the changes needed?

Presently when submitting workflows via Flyte, it's hard to know who submitted what. This PR addresses one of two concerns:

  1. I can't filter by users in the Flyte Console
  2. I can't see a user identifier in k8s

Why do I/we care about 2? Let's say you have two teams (A & B), if team A has access to 10 resources and team B has access to 1000 resources and we know who is in each team (exclusively). One can create further admission controls for any user in team A that requests more resources than are allotted to that team. Furthermore, this can be tracked properly with Kueue.

Today when I submit a Flyte workflow I have no way to actually know who submitted what from a k8s perspective because principal information is not passed to the downstream k8s flyteworkflow object.

What changes were proposed in this pull request?

This PR adds a parameter that enables a new annotation to be injected into each flyteworkflow object that contains the user principal, which usually carries oauth information. Additionally, it allows customization of the annotation prefix.

How was this patch tested?

I added some unit tests, and I also tested in production with the flyte-core chart on EKS and using Keycloak:

$ kubectl describe flyteworkflow -n flyte-workspace ax5hwpvbmwnkhdkmq67m | grep flyte.org
Annotations:  flyte.org/user: [email protected]

Labels

Please add one or more of the following labels to categorize your PR:

  • added: For new features.
  • changed: For changes in existing functionality.
  • deprecated: For soon-to-be-removed features.
  • removed: For features being removed.
  • fixed: For any bug fixed.
  • security: In case of vulnerabilities

This is important to improve the readability of release notes.

Setup process

Screenshots

Check all the applicable boxes

  • [x] I updated the documentation accordingly.
  • [x] All new and existing tests passed.
  • [x] All commits are signed-off.

Related PRs

Docs link

Summary by Bito

  • This pull request introduces a feature for injecting user annotations into Flyte workflows, enhancing visibility and accountability in multi-team environments, while also addressing potential risks related to data integrity and security in Kubernetes environments.
  • It updates the identity annotation prefix to 'flyte.org' and includes customization options for better tracking of user submissions.
  • The changes improve error handling in the execution manager, ensuring better resource management and security.
  • Configuration updates include new identity annotation fields and modifications to deployment configurations, aiding in resource management.
  • Unit tests have been added to ensure the functionality works as intended, validating the new features.
  • Overall, this update introduces new user annotation features, customization options, and enhancements in resource management and security.

ttitsworth-lila avatar Oct 28 '25 22:10 ttitsworth-lila

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

welcome[bot] avatar Oct 28 '25 22:10 welcome[bot]

Codecov Report

:x: Patch coverage is 87.87879% with 8 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 56.93%. Comparing base (99f2ddd) to head (2860a30).

Files with missing lines Patch % Lines
...kg/runtime/interfaces/application_configuration.go 0.00% 8 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6710      +/-   ##
==========================================
+ Coverage   56.91%   56.93%   +0.02%     
==========================================
  Files         929      929              
  Lines       58077    58139      +62     
==========================================
+ Hits        33052    33100      +48     
- Misses      21984    21998      +14     
  Partials     3041     3041              
Flag Coverage Δ
unittests-datacatalog 53.51% <ø> (ø)
unittests-flyteadmin 53.10% <87.87%> (+0.08%) :arrow_up:
unittests-flytecopilot 43.06% <ø> (ø)
unittests-flytectl 64.02% <ø> (ø)
unittests-flyteidl 75.71% <ø> (ø)
unittests-flyteplugins 60.13% <ø> (ø)
unittests-flytepropeller 53.54% <ø> (ø)
unittests-flytestdlib 63.29% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Oct 29 '25 17:10 codecov[bot]

Hey @davidmirror-ops it looks like there's an issue with pulling helm on a bunch of the tests, is this expected?

I'm working on updating the code coverage to get things green.

ttitsworth-lila avatar Oct 29 '25 18:10 ttitsworth-lila

@ttitsworth-lila yes, that's still a limitation on the current version of CI. Could you run make helm locally and push the changes?

davidmirror-ops avatar Oct 30 '25 17:10 davidmirror-ops

@davidmirror-ops no problem, done.

ttitsworth-lila avatar Oct 30 '25 18:10 ttitsworth-lila

We do something similar internally since the principal is not passed from the control plane to the data plane, and thus can't be rendered into the k8s manifests in propeller. I will review this more deeply later.

Sovietaced avatar Nov 03 '25 18:11 Sovietaced

We do something similar internally since the principal is not passed from the control plane to the data plane, and thus can't be rendered into the k8s manifests in propeller. I will review this more deeply later.

@Sovietaced Thanks for the comment. I'd love to understand more about what you mean. Presently I'm testing this in a multi-cluster setup where my above method is able to pass that information. What IDP provider do you use internally?

ttitsworth-lila avatar Nov 03 '25 19:11 ttitsworth-lila

@Sovietaced Thanks for the comment. I'd love to understand more about what you mean. Presently I'm testing this in a multi-cluster setup where my above method is able to pass that information. What IDP provider do you use internally?

Not sure what you mean with regards to IDP. What I'm trying to say is that flyte admin (the control plane) has the identity information when the user calls the API to create an execution. None of that information is passed down into the workflow resource and as a result is unusable by the flyte propeller (the data plane). So like you've done here we inject some of that information into the workflow annotations which isn't super elegant, but it works.

Arguably the workflow spec should embed some information about the user identity in a more structured way.

Sovietaced avatar Nov 06 '25 03:11 Sovietaced

So like you've done here we inject some of that information into the workflow annotations which isn't super elegant, but it works. Arguably the workflow spec should embed some information about the user identity in a more structured way.

Like your suggestion @Sovietaced , but would require a lot more changes across the stack to include in the spec but the current PR can be considered incremental and adds a benefit for having identity info in the dataplane.

pmahindrakar-oss avatar Nov 07 '25 04:11 pmahindrakar-oss

Thanks for the review @pmahindrakar-oss, I just pushed a change to improve the coverage. I'm having some issues running helm make:

Save error occurred:  could not find : chart docker-registry not found in https://helm.twun.io/: looks like "https://helm.twun.io/" is not a valid chart repository or cannot be reached: Get "https://helm.twun.io/index.yaml": dial tcp: lookup helm.twun.io on 10.255.255.254:53: no such host

Hope this is acceptable, and would appreciate some help with landing this PR :)

ttitsworth-lila avatar Nov 07 '25 17:11 ttitsworth-lila

Thanks for the review @pmahindrakar-oss, I just pushed a change to improve the coverage. I'm having some issues running helm make:


Save error occurred:  could not find : chart docker-registry not found in https://helm.twun.io/: looks like "https://helm.twun.io/" is not a valid chart repository or cannot be reached: Get "https://helm.twun.io/index.yaml": dial tcp: lookup helm.twun.io on 10.255.255.254:53: no such host

Hope this is acceptable, and would appreciate some help with landing this PR :)

The DNS issue looks like possible company VPN / firewall not resolving approved domains

Sovietaced avatar Nov 07 '25 17:11 Sovietaced

The DNS issue looks like possible company VPN / firewall not resolving approved domains

fixed it here https://github.com/flyteorg/flyte/pull/6726

pingsutw avatar Nov 12 '25 06:11 pingsutw

Got some conflicts

Sovietaced avatar Nov 12 '25 18:11 Sovietaced

Bito Review Skipped - No Changes Detected

Bito didn't review this pull request because we did not detect any changes in the pull request to review.

flyte-bot avatar Nov 12 '25 22:11 flyte-bot

@machichima Thanks for the review! I think I've addressed all of your comments.

ttitsworth-lila avatar Nov 17 '25 16:11 ttitsworth-lila

cc @EngHabu @fg91 could you please also take a look at this? Thanks!

machichima avatar Nov 24 '25 09:11 machichima

There already is a feature to inject the user identity as a label :)

fg91 avatar Nov 24 '25 09:11 fg91

There already is a feature to inject the user identity as a label :)

@fg91 can you share the docs for that? Or just point me to where that is?

ttitsworth-lila avatar Nov 24 '25 16:11 ttitsworth-lila

There already is a feature to inject the user identity as a label :)

@fg91 can you share the docs for that? Or just point me to where that is?

https://github.com/flyteorg/flyte/pull/4637

This applies the label on the pod object though and I just realized you would like to have it on the flyteworkflow CR. Maybe we could reuse the same label though instead of adding new annotation with a different name?

fg91 avatar Nov 24 '25 17:11 fg91

There already is a feature to inject the user identity as a label :)

@fg91 can you share the docs for that? Or just point me to where that is?

#4637

This applies the label on the pod object though and I just realized you would like to have it on the flyteworkflow CR. Maybe we could reuse the same label though instead of adding new annotation with a different name?

Happy to consolidate to prevent duplication here, I'm just trying to understand how your PR works because I don't currently see anything related to labels.execution_identity, does this label only appear when you set the execution identity in the podtemplate?

ttitsworth-lila avatar Nov 24 '25 18:11 ttitsworth-lila

There already is a feature to inject the user identity as a label :)

@fg91 can you share the docs for that? Or just point me to where that is?

#4637 This applies the label on the pod object though and I just realized you would like to have it on the flyteworkflow CR. Maybe we could reuse the same label though instead of adding new annotation with a different name?

Happy to consolidate to prevent duplication here, I'm just trying to understand how your PR works because I don't currently see anything related to labels.execution_identity, does this label only appear when you set the execution identity in the podtemplate?

No, you don't need to set anything in the pod template 🤔 If I'm not completely mistaken, the only requirement is that flyteadmin needs to have auth enabled but I assume you have that?

fg91 avatar Nov 24 '25 18:11 fg91

There already is a feature to inject the user identity as a label :)

@fg91 can you share the docs for that? Or just point me to where that is?

#4637 This applies the label on the pod object though and I just realized you would like to have it on the flyteworkflow CR. Maybe we could reuse the same label though instead of adding new annotation with a different name?

Happy to consolidate to prevent duplication here, I'm just trying to understand how your PR works because I don't currently see anything related to labels.execution_identity, does this label only appear when you set the execution identity in the podtemplate?

No, you don't need to set anything in the pod template 🤔 If I'm not completely mistaken, the only requirement is that flyteadmin needs to have auth enabled but I assume you have that?

I went and verified that exact key in our values file, and yes we have auth enabled. Obviously we need the auth-flow to work for my changes to function as well. Chart flyte-core-v1.15.3. I'm wondering if we could get a third party like @machichima or @pmahindrakar-oss to help verify if this feature @fg91 introduced is functioning? I can only assume it's working for your team @fg91!

ttitsworth-lila avatar Nov 24 '25 19:11 ttitsworth-lila

There already is a feature to inject the user identity as a label :)

@fg91 can you share the docs for that? Or just point me to where that is?

#4637

This applies the label on the pod object though and I just realized you would like to have it on the flyteworkflow CR. Maybe we could reuse the same label though instead of adding new annotation with a different name?

On the topic of consolidation, it's not obvious to me the best way to consolidate since the scope of this PR grew to include more than just the identity email. If we do anything, we could just keep both for backwards compatibility and let the community decide how to proceed. Thoughts?

ttitsworth-lila avatar Nov 24 '25 19:11 ttitsworth-lila

I think this is mostly fine btw, but agreed with @Sovietaced - I feel like there should be a more cohesive story around how identity is injected. And also how it can be used by the various plugins after it's been injected into the wf CRD. Are email/sub the only things that will ever need to be injected? Will anyone ever want to inject an entire token (for further downstream auth by plugins themselves)?

wild-endeavor avatar Nov 26 '25 07:11 wild-endeavor

I feel like there should be a more cohesive story around how identity is injected.

Not all pieces of an identity should be injected IMO. The user deploying Flyte should have to explicitly choose what IDP labels they will expose to their propeller env(s).

Are email/sub the only things that will ever need to be injected?

From my perspective, yes. This is now giving contributors a pathway to add new labels for their IDP provider if they have their own use-case.

Will anyone ever want to inject an entire token (for further downstream auth by plugins themselves)?

I had not considered this as a valid method to auth with plugins, it seems like there are security implications with that approach. I am also unaware as to what plugins would benefit from this approach.

ttitsworth-lila avatar Nov 26 '25 16:11 ttitsworth-lila

@davidmirror-ops It looks like some of the tests are still failing with errors I'm unsure about, how do we want to proceed with this PR?

ttitsworth-lila avatar Nov 26 '25 16:11 ttitsworth-lila

Bito Review Skipped - No Changes Detected

Bito didn't review this pull request because we did not detect any changes in the pull request to review.

flyte-bot avatar Nov 26 '25 17:11 flyte-bot

@davidmirror-ops It looks like some of the tests are still failing with errors I'm unsure about, how do we want to proceed with this PR?

I think you need to make helm again and push. Also it seems like there's some lint error, you can run make lint locally to fix. Seems like tests are passed. Feel free to ping me after you finished, and I can re-trigger the workflow for you.

And I'll look into the PR that @fg91 mentioned, and see if we need to modify this PR to reuse the label. Thank you!

machichima avatar Dec 06 '25 01:12 machichima

I've tested out the PR that @fg91 mentioned (https://github.com/flyteorg/flyte/pull/4637). I think it's a bit different from what we want to do here.

Update: I can see the execution-identity label on my side, which only shows the subject. I think for showing different info other than subject (e.g. email), the current approach looks better for me.

image

What #4637 did is to add the Secret set in the task to annotations. I tested with following flytekit script:

from flytekit import Secret, task, workflow

@task(
    secret_requests=[
        Secret(group="testing-secret-1", key="username", mount_requirement=Secret.MountType.ENV_VAR),
        Secret(group="testing-secret-2", key="api-token"),
    ]
)
def my_task_using_secrets():
    pass

@workflow
def wf():
    my_task_using_secrets()

And the pod created will have the label and annotation. The flyte.secrets/s0 value is the encoded string for the Secret setting we provided in the task

❯ k describe pod a7qswm5zm79mp6t25jz2-n0-0 -n flytesnacks-development
Name:             a7qswm5zm79mp6t25jz2-n0-0
Namespace:        flytesnacks-development
Priority:         0
Service Account:  default
Node:             cdb78fa419be/192.168.215.3
Start Time:       Mon, 08 Dec 2025 15:54:30 +0800
Labels:           domain=development
                  execution-id=a7qswm5zm79mp6t25jz2
                  inject-flyte-secrets=true   # <- here
                  interruptible=false
                  node-id=n0
                  project=flytesnacks
                  shard-key=27
                  task-name=test-secret-my-task-using-secrets
                  workflow-name=test-secret-wf
Annotations:      cluster-autoscaler.kubernetes.io/safe-to-evict: false
                  # and following two flyte.secrets annotations              
                  flyte.secrets/s0: m4zg54lqhiqce4dfon1gs2thfvzwky2smv1c1mjcbjvwk5j1earhk32fojxgc2lfeifg122vnz1f53tfof1ws3tfnvsw34b1ebcu3vs6kzavecq
                  flyte.secrets/s1: m4zg54lqhiqce4dfon1gs2thfvzwky2smv1c1mrcbjvwk5j1eargc3djfv1g512fnyrau
                  primary_container_name: a7qswm5zm79mp6t25jz2-n0-0

cc @fg91 to confirm if it's correct? In this case I think it's ok to keep this PR as is. WDYT?

machichima avatar Dec 08 '25 07:12 machichima

@machichima I ran make helm just fine, but make lint didn't work. I did a bit of looking in the docs and files to see if I'm missing anything, but I just did a fresh clone to confirm that for whatever reason make lint fails on the current default branch:

$ make lint
Makefile:130: warning: overriding recipe for target 'go-tidy'
boilerplate/flyte/golang_test_targets/Makefile:59: warning: ignoring old recipe for target 'go-tidy'
make[1]: Entering directory '/home/user/flyte'
make[1]: *** /flytestdlib: No such file or directory.  Stop.
make[1]: Leaving directory '/home/user/flyte'
make: *** [boilerplate/flyte/golang_test_targets/Makefile:10: download_tooling] Error 2

Editing the path unraveled the error a little bit more, but I'm just curious if I missed an obvious step. So I instead just ran golangci-lint directly and found some linting errors in the unrelated cmd directory.

INFO [runner] linters took 29.799047289s with stages: goanalysis_metalinter: 29.796747133s 
cmd/single/console.go:29:6: type consoleFS is unused (unused)
type consoleFS struct {
     ^
cmd/single/console.go:35:20: func consoleFS.Open is unused (unused)
func (f consoleFS) Open(name string) (http.File, error) {
                   ^
cmd/single/without_console_dist.go:11:5: var console is unused (unused)
var console fs.FS
    ^
3 issues:
* unused: 3

ttitsworth-lila avatar Dec 08 '25 18:12 ttitsworth-lila