seldon-core

Operator reconciliation fails when setting fractional milli resource requests

Open bcvanmeurs opened this issue 1 year ago • 0 comments
Describe the bug

When a container's CPU resource request is set to 1.5m, or any other fractional millicpu value, the operator manager fails to reconcile the deployment.

To reproduce

  1. Create a deployment with a custom model container image
  2. Set the resource request to a fractional millicpu
  3. Deploy

The operator manager cannot reconcile and generates millions of log lines per day. I know we are not supposed to set fractional millicpu values, but k8s does not fail if you do so; it rounds the value up to the nearest integer. I suspect this is where the reconciliation error comes from: the operator sees a difference between the desired state (1.5m) and the actual state (2m), and keeps trying to reconcile without success, generating many logs.
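To make the suspected mechanism concrete, here is a minimal Python sketch. It is not the operator's actual code (the operator is not written in Python); `to_millicpu`, `stored_millicpu`, and `needs_update` are hypothetical helpers, and the round-up rule is assumed from the behaviour described above.

```python
from decimal import Decimal, ROUND_CEILING


def to_millicpu(quantity: str) -> Decimal:
    """Parse a CPU quantity such as "1.5m" or "0.25" into millicores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1])
    return Decimal(quantity) * 1000


def stored_millicpu(quantity: str) -> int:
    """Assumed cluster behaviour: round up to a whole number of millicores."""
    return int(to_millicpu(quantity).to_integral_value(rounding=ROUND_CEILING))


def needs_update(desired: str, actual: str) -> bool:
    """Hypothetical naive diff that compares the raw quantity strings."""
    return desired != actual


desired = "1.5m"
for attempt in range(3):
    # Each "apply" stores the rounded value, so the diff never clears.
    actual = f"{stored_millicpu(desired)}m"
    print(attempt, desired, actual, needs_update(desired, actual))
```

Under these assumptions every pass still reports a difference, which would match the endless reconcile attempts and log volume.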

Expected behaviour

Although we were not supposed to set fractional millicpu values, it took a while to find the root cause of the excessive logging. I expect the operator to either return an error that disallows fractional millicpu values, or silently round the value up to the nearest integer, as k8s does by default.
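The two options could look roughly like the following Python sketch. The function names are hypothetical and this is only an illustration of the idea, not a proposed patch. It also always emits the `m` suffix, which is a simplification.

```python
from decimal import Decimal, ROUND_CEILING


def parse_millicpu(quantity: str) -> Decimal:
    """Parse "1.5m" or "0.25" style CPU quantities into millicores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1])
    return Decimal(quantity) * 1000


def validate_cpu_request(quantity: str) -> str:
    """Option 1: reject fractional millicpu values with a clear error."""
    milli = parse_millicpu(quantity)
    if milli != milli.to_integral_value():
        raise ValueError(f"fractional millicpu request not allowed: {quantity}")
    return f"{int(milli)}m"


def normalize_cpu_request(quantity: str) -> str:
    """Option 2: round up to whole millicores before comparing states."""
    milli = parse_millicpu(quantity)
    return f"{int(milli.to_integral_value(rounding=ROUND_CEILING))}m"
```

With the second option, the desired value would be normalized before it is compared with the actual state, so both sides would read 2m.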

Environment

AWS

bcvanmeurs avatar Mar 14 '24 15:03 bcvanmeurs