[Core feature] Upgrade Sagemaker integration to use sagemaker-controller

Open hajapy opened this issue 2 years ago • 4 comments

Motivation: Why do you think this is important?

Flyte's Sagemaker integration uses the sagemaker-operator-for-k8s, which has seen little activity or feature work since May 25, 2021. The newer sagemaker-controller is more feature-rich and appears to be more actively maintained. Recent comments indicate it is the intended path forward. Upgrading to this newer controller would enable the sagemaker plugin to perform more tasks and provide a more complete API for the existing tasks (e.g. associating a TrainingJob with a Sagemaker Experiment).

Goal: What should the final outcome look like, ideally?

Ideally, for an end user, existing workflows that leverage flytekitplugins.awssagemaker should continue to work. For administrators, it should be possible to switch from the existing operator to the new controller. New functionality would require the new controller, so it should be reasonably clear to a workflow author whether the workflow they are creating is supported by their flyte installation in AWS.

Describe alternatives you've considered

It might be possible to write a lower-level plugin that doesn't rely on an AWS-provided k8s controller/operator, but instead uses the AWS SDK for Go directly. As that would be a more extensive rewrite and potentially harder to keep up to date, it seems preferable to pick a solution that is less disruptive to the current state. The new sagemaker-controller repo is part of the AWS Controllers for Kubernetes (ACK) project and is heavily automated in terms of keeping up to date with AWS API changes.

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

hajapy avatar Jul 26 '22 21:07 hajapy

Thank you for opening your first issue here! 🛠

welcome[bot] avatar Jul 26 '22 21:07 welcome[bot]

@hajapy thank you for the great find; I think this would be the right path forward. Do you have a sense of the priority of this? How much of a blocker is it?

kumare3 avatar Jul 26 '22 21:07 kumare3

It seems like it would open doors in terms of the types of workflows users could author that leverage more of Sagemaker while still using flyte to orchestrate and run parts of the workflows elsewhere. They could more easily create/update endpoints, ensure training jobs are tracked as part of Experiments, etc.

Have a look at all the CRDs it supports: https://github.com/aws-controllers-k8s/sagemaker-controller/tree/main/config/crd/bases
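
As a rough illustration of the Experiments point, here is a minimal Python sketch that builds a TrainingJob manifest with an experiment association. The field names are assumed to mirror the SageMaker CreateTrainingJob API in camelCase and should be checked against the CRDs linked above; the helper name, job name, role ARN, image URI, and bucket are hypothetical placeholders.

```python
import json


def training_job_manifest(job_name, role_arn, image, output_s3, experiment_name):
    """Build a dict shaped like a sagemaker-controller TrainingJob resource.

    Assumption: spec field names follow the CreateTrainingJob API in
    camelCase; verify against the TrainingJob CRD before relying on this.
    """
    return {
        "apiVersion": "sagemaker.services.k8s.aws/v1alpha1",
        "kind": "TrainingJob",
        "metadata": {"name": job_name},
        "spec": {
            "trainingJobName": job_name,
            "roleARN": role_arn,
            "algorithmSpecification": {
                "trainingImage": image,
                "trainingInputMode": "File",
            },
            "outputDataConfig": {"s3OutputPath": output_s3},
            "resourceConfig": {
                "instanceCount": 1,
                "instanceType": "ml.m5.xlarge",
                "volumeSizeInGB": 10,
            },
            "stoppingCondition": {"maxRuntimeInSeconds": 3600},
            # The association with a Sagemaker Experiment mentioned above.
            "experimentConfig": {"experimentName": experiment_name},
        },
    }


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    manifest = training_job_manifest(
        job_name="demo-training-job",
        role_arn="arn:aws:iam::123456789012:role/hypothetical-sagemaker-role",
        image="123456789012.dkr.ecr.us-east-1.amazonaws.com/hypothetical:latest",
        output_s3="s3://hypothetical-bucket/output",
        experiment_name="demo-experiment",
    )
    # JSON is valid YAML, so the output could be piped to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

A backend plugin would presumably construct the equivalent resource in Go; the sketch only shows the shape of the request.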

hajapy avatar Jul 26 '22 22:07 hajapy

@hajapy this indeed looks really good. How should we proceed? The community prefers that there be volunteer organizations that will use the plugin, so that we can ensure we are developing for a real user and that it will remain maintained. Also, this is in case you want to develop a backend plugin. Remember that it is always possible to write this simply in Python and make it a flytekit-only plugin, but I do agree a backend plugin for sagemaker makes the most sense.

kumare3 avatar Aug 01 '22 05:08 kumare3

By the way, AWS has made the deprecation of the original sagemaker-operator official: https://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-operators-eos-announcement.html

End of technical support for sagemaker-operator is slated for Feb 15, 2023

hajapy avatar Jan 17 '23 14:01 hajapy

@hajapy thank you for the heads up. I think the new RFC might be a better approach. Cc @pingsutw

kumare3 avatar Jan 23 '23 15:01 kumare3

Is there any news on this? The SageMaker operator for k8s has been deprecated for almost half a year now.

j-adamczyk avatar Jul 05 '23 13:07 j-adamczyk

We've officially removed the deprecated sagemaker integration; in its place, we're working on a Sagemaker agent.

eapolinario avatar Dec 07 '23 18:12 eapolinario