metaflow icon indicating copy to clipboard operation
metaflow copied to clipboard

address name contraction issues in argo-workflows create

Open savingoyal opened this issue 1 year ago • 1 comments

remaining items -

  1. enable --name for long names for CLI args except create
  2. check if sanitize_for_argo needs fixing
  3. moar testing
  4. comms

savingoyal avatar Oct 06 '24 20:10 savingoyal

closes #1818 , #1608

savingoyal avatar Oct 06 '24 20:10 savingoyal

rebased.

tested that suspend/unsuspend/terminate work as expected even with both old and new deployments running side-by-side (which would only be the case if someone deploys and triggers an old version after a new one)

tested that old buggy sensors with too long names in them properly get replaced with the new deployment, and start working as intended.

saikonen avatar Jul 21 '25 10:07 saikonen

regarding using --name as part of create, do we want to raise an exception if the name is too long? or should we instead apply the same truncation as with project based names? this will only affect the argo-workflows side of things, and not the actual flow names.

As I see it, there are two approaches to specifying long names via --name

  1. truncate the deployment name same as with long project names. flows would deploy no matter what.
  2. fail the argo-workflows --name very-long-name create and expect the user to adhere to argo-workflows naming standards

approach 1 has the UX trade-off that flows will deploy no matter the name length, but if the user is expecting their very-long-but-descriptive names to be present on the argo side, this will not be the case.

approach 2 is very disruptive, blocking the usage due to platform limitations that may not be of concern to the user.

decided to go with approach 1 for now. see https://github.com/Netflix/metaflow/pull/2082/commits/563a8fd6f881a476f95b296070fb5d16cde6786d for details


Edit: after discussions, we decided that approach 2 is the preferred direction due to users most likely being intentional in their naming, and truncating this would unexpectedly end up losing information intended to be encoded in the deployment.

see https://github.com/Netflix/metaflow/pull/2082/commits/d06ca78cdac0e1f1fdba98f3dafe431d4bad2b04 for implementation.

saikonen avatar Jul 21 '25 12:07 saikonen

regarding possible changes to sanitize_for_argo and its surjectivity in order to not have collisions between deployed flow names, this seems somewhat of a tricky issue.

  1. if we stick to simply replacing offending characters with accepted ones, then the output of sanitize_for_argo will be an example of a distinct (in case any replacing was done) flow name that would map to the same output
  2. appending a hash from the original name is also a possible solution. this would restrict the problem to hash collisions, but increasing the names length due to sanitizing can end up in the same name-length issues, so we would have to truncate it after the fact in some cases

saikonen avatar Jul 23 '25 14:07 saikonen