Seldon-controller-manager doesn't delete SVC when renaming SeldonDeployment

Open. stephen37 opened this issue 3 years ago • 1 comment

Describe the bug

When renaming the predictor in an already deployed SeldonDeployment, the Seldon controller doesn't delete the previous SVC, even though the new pods are running and healthy and the old pods have been deleted.

When renaming the predictor, the seldon-controller-manager correctly deletes the old pods once the new ones have rolled out, but doesn't delete either of the services. This results in two different routes, and 50% of the requests end up failing.

The current workaround is to manually delete the route in Ambassador.

To reproduce

  1. Use Ambassador
  2. Deploy a Seldon model
  3. Rename the predictor in the SeldonDeployment and keep everything else the same
  4. Deploy the SeldonDeployment again
  5. Run multiple requests (>100) and you will see that 50% of the requests are failing.
  6. You can check ambassador and you'll find two Envoy routes even though only one should be present.
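Step 5 can be scripted. The following is a minimal sketch, not something taken from the original report: the URL (host, port, namespace `default`, deployment name `my-model`) and the request payload are placeholders to replace with your own values, and `send_prediction` and `failure_rate` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from typing import Callable

# Placeholder endpoint (assumption): adjust host, port, namespace and
# deployment name to match your own Ambassador setup.
PREDICT_URL = "http://localhost:8003/seldon/default/my-model/api/v1.0/predictions"


def send_prediction(url: str = PREDICT_URL) -> bool:
    """Send one prediction request; return True on an HTTP 2xx response."""
    # Placeholder payload (assumption): use whatever input your model expects.
    body = b'{"data": {"ndarray": [[1.0, 2.0]]}}'
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP errors (4xx/5xx), connection failures and timeouts count as failures.
        return False


def failure_rate(send: Callable[[], bool], attempts: int = 200) -> float:
    """Call `send` `attempts` times and return the fraction of failed calls."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if not send())
    return failures / attempts


# Example usage against a live cluster:
#   print(f"{failure_rate(send_prediction):.0%} of requests failed")
```

If the stale route is still present, the reported rate should sit near 50%; after the old route is removed it should drop to zero.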

Expected behaviour

Once the new pods are running and healthy, we should delete the old SVC.
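The expected clean-up can be illustrated with a small sketch. This is not the controller's actual logic: `find_stale_services` is a hypothetical helper, and it assumes you already have a mapping from each service name to the predictor it was created for (how that mapping is obtained, for example from labels or naming, is left out here).

```python
from typing import Dict, Iterable, List


def find_stale_services(
    service_to_predictor: Dict[str, str],
    current_predictors: Iterable[str],
) -> List[str]:
    """Return the services whose predictor is no longer in the deployment spec.

    `service_to_predictor` maps a service name to the predictor it was
    created for (assumed input; this sketch does not query the cluster).
    """
    current = set(current_predictors)
    return sorted(
        name
        for name, predictor in service_to_predictor.items()
        if predictor not in current
    )


# Illustrative names only: the predictor was renamed from "old" to "new",
# so the service created for "old" is the one that should be deleted.
services = {"my-model-old": "old", "my-model-new": "new"}
stale = find_stale_services(services, ["new"])
```

In this example `stale` contains only `my-model-old`, which is the service the report expects to be removed once the new pods are healthy.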

Environment

  • Seldon-Core deployed on EKS running with Ambassador
  • Kubernetes version: v1.21.11, also happened on other versions.
  • Seldon system images: 1.13.0

Also reproduced by @edshee locally https://seldondev.slack.com/archives/C03DQFTFXMX/p1662022400574279

stephen37, Sep 01 '22 14:09

There is code here to remove old SVCs, so we need to see why this is not working, with some examples.

ukclivecox, Sep 05 '22 05:09