seldon-core
seldon-core copied to clipboard
Operator reconciliation fails when setting fractional milli resource requests
Describe the bug
When setting a resource request of 1.5m or any partial millicpu request, the operator manager fails to reconcile the deployment.
To reproduce
- Create a deployment with a custom model container image
- Set the resource request to a fractional millicpu
- Deploy
The operator manager cannot reconcile, and generates millions of logs per day. I know we are not supposed to set fractional millicpu, but k8s does not fail if you do so. It rounds the value up to nearest integer. I suspect this is where the reconciliation error comes from and the operator sees a difference between the desired state: 1.5m and the actual state 2m. And tries to reconcile, without success, generating many logs.
Expected behaviour
Although we were not supposed to set fractional millicpus, it took a while to find the root cause of the many logs. I expect: either the operator to return an error to disallow fractional millicpu values, or also silently round the value up to the nearest integer, like k8s does by default.
Environment
AWS