onyxia
No error message when exceeding quota
On our platform, with the following quota rule in place :
"default": {
  "requests.storage": "50Gi",
  "count/pods": "5",
  "requests.nvidia.com/gpu": 0,
  "requests.cpu": 20,
  "requests.memory": "50Gi"
},
When the quota is exceeded, no error message is shown. The service appears to be alive, but clicking on its link returns a 503 error.
For example, when launching 6 jupyter-python services, there is no feedback in Onyxia that the quota has been exceeded.
With kubectl we can see that the statefulset is not ready :
Launching a service exceeding quota should display an informative message to the user.
Hi !
I agree on this but I'm not sure what the best way of implementing this is.
Can you check whether there are Kubernetes events (kubectl get events / kubectl events) that could help us with this?
In a broader scope, I think onyxia should display current quotas (and usage) to the user. The Onyxia API already exposes those values.
There is this one :
54s Warning FailedCreate statefulset/jupyter-pyspark-345764 create Pod jupyter-pyspark-345764-0 in StatefulSet jupyter-pyspark-345764 failed error: pods "jupyter-pyspark-345764-0" is forbidden: exceeded quota: onyxia-quota, requested: count/pods=1, used: count/pods=5, limited: count/pods=5
Don't know if you can access it
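To illustrate how the event above could be turned into something displayable, here is a minimal sketch that extracts the quota name and the requested/used/limited values from the message. The function name `parse_quota_message` is hypothetical, and the pattern assumes the message always follows the layout of the event quoted above, with multiple resources separated by commas; this is not how Onyxia does it.

```python
import re

# Pattern modelled on the event message quoted above (assumed layout).
QUOTA_RE = re.compile(
    r"exceeded quota: (?P<quota>[^,]+), "
    r"requested: (?P<requested>.+?), "
    r"used: (?P<used>.+?), "
    r"limited: (?P<limited>.+)$"
)


def _pairs(text):
    """Turn 'a=1,b=2' into {'a': '1', 'b': '2'}."""
    return dict(item.strip().split("=", 1) for item in text.split(","))


def parse_quota_message(message):
    """Return the quota details found in an event message, or None."""
    match = QUOTA_RE.search(message)
    if match is None:
        return None
    return {
        "quota": match["quota"],
        "requested": _pairs(match["requested"]),
        "used": _pairs(match["used"]),
        "limited": _pairs(match["limited"]),
    }


message = (
    'create Pod jupyter-pyspark-345764-0 in StatefulSet jupyter-pyspark-345764 '
    'failed error: pods "jupyter-pyspark-345764-0" is forbidden: '
    'exceeded quota: onyxia-quota, requested: count/pods=1, '
    'used: count/pods=5, limited: count/pods=5'
)
info = parse_quota_message(message)
print(info)
```

With the values split out like this, the UI could say which resource hit its limit instead of showing the raw message.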
In a broader scope, I think onyxia should display current quotas (and usage) to the user. The Onyxia API already exposes those values.
Yes !
For reference, GET /api/my-lab/quota returns both the current quotas and usage :
e.g : {"spec":{"requests.storage":"2Ti","count/pods":100},"usage":{"requests.storage":"47Gi","count/pods":4}}
I will create an issue for displaying that in the UI
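As a rough idea of what the UI could compute from that payload, the sketch below converts the quantities to plain numbers and derives a used/limit ratio per resource. The helper names `to_number` and `usage_ratios` are hypothetical, and only the binary suffixes (Ki, Mi, Gi, Ti) are handled; a real implementation would need the full Kubernetes quantity syntax.

```python
import json

# Binary suffixes only; decimal suffixes (k, M, G, m) are left out of this sketch.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_number(quantity):
    """Convert a quantity such as '47Gi' or 100 to a plain number."""
    if isinstance(quantity, (int, float)):
        return quantity
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)


def usage_ratios(payload):
    """Return {resource: used / limit} for every resource with a non-zero limit."""
    ratios = {}
    for resource, limit in payload["spec"].items():
        limit_n = to_number(limit)
        if limit_n == 0:
            continue  # a zero limit (e.g. no GPU allowed) has no meaningful ratio
        used_n = to_number(payload["usage"].get(resource, 0))
        ratios[resource] = used_n / limit_n
    return ratios


payload = json.loads(
    '{"spec":{"requests.storage":"2Ti","count/pods":100},'
    '"usage":{"requests.storage":"47Gi","count/pods":4}}'
)
ratios = usage_ratios(payload)
for resource, ratio in ratios.items():
    print(f"{resource}: {ratio:.1%} used")
```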
Now that quotas are displayed in the UI, the API could check whether a new chart can be installed and react accordingly.
I don't think it would be that easy to check whether a helm install would exceed any quota, as most quotas relate to objects (mainly pods) that are created after the install is done. There may be a way to implement logic that analyses the objects (statefulsets, deployments ...) in the helm manifest, but that would be neither easy nor error-proof, so it may not be desirable to implement. Am I missing something?
Our plan is to install the helm chart and then get the events that report an exceeded quota.
I'm actively working on this
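A minimal sketch of that "install, then look at the events" idea, assuming the events are available in the shape produced by `kubectl get events -o json` (a list with an `items` array). The function name `quota_events` and the matching of objects by release-name prefix are hypothetical choices made for illustration, not the actual implementation being worked on.

```python
def quota_events(event_list, release_name):
    """Return the messages of quota-related warnings for one release."""
    messages = []
    for event in event_list.get("items", []):
        involved = event.get("involvedObject", {})
        if (
            event.get("type") == "Warning"
            and event.get("reason") == "FailedCreate"
            and "exceeded quota" in event.get("message", "")
            # Assumption: objects of a release are named after the release.
            and involved.get("name", "").startswith(release_name)
        ):
            messages.append(event["message"])
    return messages


# Hand-written sample data; the first message is abridged from the event quoted earlier.
events = {
    "items": [
        {
            "type": "Warning",
            "reason": "FailedCreate",
            "involvedObject": {"kind": "StatefulSet", "name": "jupyter-pyspark-345764"},
            "message": "exceeded quota: onyxia-quota, requested: count/pods=1, "
                       "used: count/pods=5, limited: count/pods=5",
        },
        {
            "type": "Normal",
            "reason": "Scheduled",
            "involvedObject": {"kind": "Pod", "name": "other-service-0"},
            "message": "Successfully assigned pod",
        },
    ]
}
found = quota_events(events, "jupyter-pyspark-345764")
print(found)
```

If the returned list is non-empty shortly after the install, the service could be flagged as blocked by the quota rather than shown as running.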