seldon-core
Seldon-controller-manager doesn't delete SVC when renaming SeldonDeployment
Describe the bug
When the predictor in an already deployed SeldonDeployment is renamed, the Seldon controller doesn't delete the previous SVC, even once the new pods are running and healthy and the old pods have been deleted.
The seldon-controller-manager correctly deletes the old pods once the new ones have rolled out, but it doesn't delete the old services. This leaves two different routes in place, and 50% of the requests end up failing.
The current workaround is to manually delete the stale route in Ambassador.
To reproduce
- Use Ambassador
- Deploy a Seldon model
- Rename the predictor in the SeldonDeployment and keep everything else the same
- Deploy the SeldonDeployment again
- Run multiple requests (>100); you will see that 50% of them fail.
- Check Ambassador: you'll find two Envoy routes even though only one should be present.
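The 50% figure in the last two steps can be sanity-checked with a small sketch. It assumes (this is not confirmed in the report) that Ambassador spreads traffic evenly across the two routes and that the stale route always fails because no pods are left behind it. `failure_rate` and `make_round_robin_sender` are hypothetical helpers written for this illustration, not part of Seldon Core or Ambassador.

```python
from itertools import cycle
from typing import Callable, Iterable


def failure_rate(send: Callable[[], bool], n: int = 200) -> float:
    """Call `send` n times and return the fraction of calls that failed."""
    failures = sum(1 for _ in range(n) if not send())
    return failures / n


def make_round_robin_sender(route_healthy: Iterable[bool]) -> Callable[[], bool]:
    """Stand-in for the gateway (assumption: even round-robin across routes).

    Each call is sent to the next route in turn and returns True when that
    route is healthy, False when it is not.
    """
    routes = cycle(list(route_healthy))
    return lambda: next(routes)


# After the rename: one route to the new predictor (healthy) and one stale
# route whose pods have already been deleted (always fails).
send = make_round_robin_sender([True, False])
print(failure_rate(send, n=200))
```

Against a real cluster, `send` would be replaced by a function that posts to the deployment's prediction endpoint and returns whether the response was successful; the counting logic stays the same.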
Expected behaviour
Once the new pods are running and healthy, the controller should delete the old SVC.
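As a rough illustration of the expected behaviour, the sketch below picks out services that belong to a deployment but point at a predictor name that no longer exists in its spec. It is not the controller's actual cleanup code: services are modelled as plain dictionaries, and the label keys `example/deployment` and `example/predictor` are placeholders chosen for this example, not the labels Seldon Core really sets.

```python
from typing import Dict, List, Set


def find_stale_services(
    services: List[Dict],
    deployment_name: str,
    current_predictors: Set[str],
    deployment_label: str = "example/deployment",  # hypothetical label key
    predictor_label: str = "example/predictor",    # hypothetical label key
) -> List[str]:
    """Return names of services owned by the deployment whose predictor
    label no longer matches any predictor in the current spec."""
    stale = []
    for svc in services:
        labels = svc.get("labels", {})
        if labels.get(deployment_label) != deployment_name:
            continue  # owned by another deployment or unlabelled: leave alone
        predictor = labels.get(predictor_label)
        if predictor is not None and predictor not in current_predictors:
            stale.append(svc["name"])
    return stale


# Hypothetical state after renaming the predictor from "default" to "main".
services = [
    {"name": "mymodel-default", "labels": {"example/deployment": "mymodel", "example/predictor": "default"}},
    {"name": "mymodel-main", "labels": {"example/deployment": "mymodel", "example/predictor": "main"}},
    {"name": "other-default", "labels": {"example/deployment": "other", "example/predictor": "default"}},
]
print(find_stale_services(services, "mymodel", {"main"}))
```

In this example only `mymodel-default` is reported; the service of the unrelated deployment is left untouched even though it shares the old predictor name.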
Environment
- Seldon-Core deployed on EKS running with Ambassador
- Kubernetes version: v1.21.11, also happened on other versions.
- Seldon system images: 1.13.0
Also reproduced by @edshee locally https://seldondev.slack.com/archives/C03DQFTFXMX/p1662022400574279
There is code here to remove old SVCs, so we need to work through some examples to see why it is not working in this case.