metaflow icon indicating copy to clipboard operation
metaflow copied to clipboard

Long argo-events sensor names can cause sensor deployments to fail

Open shrinandj opened this issue 2 years ago • 2 comments

Currently, sensor names are created by replacing . with - in the argo-workflow names.

            # Register sensor. Unfortunately, Argo Events Sensor names don't allow for
            # dots (sensors run into an error) which rules out self.name :(
            # Metaflow will overwrite any existing sensor.
            sensor_name = self.name.replace(".", "-")
            if self._sensor:
                argo_client.register_sensor(sensor_name, self._sensor.to_json())

Subsequently, argo-events creates a deployment for each sensor. This deployment's name is created by adding a suffix to the sensor name.

If the sensor name is <= 63 characters, things seem to work fine. However, if the sensor name > 63 characters, the deployment object fails to get created:

{"level":"error","ts":1697213357.2393486,"logger":"argo-events.eventbus-controller","caller":"sensor/controller.go:74","msg":"reconcile error","namespace":"jobs-default","sensor":"abcreditscorev4d4mx-test-pqrsmnbv44testrun-candidatemodelevaluation","error":"Deployment.apps \"abcreditscorev4d4mx-test-pqrsmnbv44testrun-candidatemodelemmc5m\" is invalid: [metadata.labels: Invalid value: \"nbcreditscorev4d4mx-test-annamnbv44testrun-candidatemodelevaluation\": must be no more than 63 characters, spec.selector.matchLabels: Invalid value: \"abcreditscorev4d4mx-test-pqrsmnbv44testrun-candidatemodelevaluation\": must be no more than 63 characters, spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"controller\":\"sensor-controller\", \"owner-name\":\"abcreditscorev4d4mx-test-pqrsmnbv44testrun-candidatemodelevaluation\", \"sensor-name\":\"abcreditscorev4d4mx-test-pqrsmnbv44testrun-candidatemodelevaluation\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: invalid label selector]","stacktrace":"github.com/argoproj/argo-events/controllers/sensor.(*reconciler).Reconcile\n\t/home/runner/work/argo-events/argo-events/controllers/sensor/controller.go:74\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}

This error shows up in the argo-events controller. Metaflow itself doesn't see this error and therefore the user feels like everything is working just fine. So, it's hard to find.

shrinandj avatar Oct 16 '23 17:10 shrinandj

Is there a specific reason for the sensor names to match argo-workflow names? If not, we could simply name the sensor as s-[hash-of-argo-workflow-name] and add the workflow name as a label on the sensor object.

shrinandj avatar Oct 16 '23 17:10 shrinandj

I think this is an efficient way to do this, since adding the workflowname as a label on to the object maintains the naming conventions for the sensor names and adds an additional layer of metadata for organization. Created a PR #1597 for this if you would like to check it out and provide some feedback--I'd appreciate it a lot!

mikesteroonie avatar Oct 18 '23 03:10 mikesteroonie