
No error message when exceeding quota

Open micedre opened this issue 1 year ago • 8 comments

On our platform, we have the following quota rule in place:

"default": {
    "requests.storage": "50Gi",
    "count/pods": "5",
    "requests.nvidia.com/gpu": 0,
    "requests.cpu": 20,
    "requests.memory": "50Gi"
},

When the quota is exceeded, no error message is shown: the service appears to be running, but clicking its link returns a 503 error.

For example, after launching 6 jupyter-python services, Onyxia gives no feedback that the quota has been exceeded (screenshot).

With kubectl, we can see that the StatefulSet is not ready (screenshot).
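The same readiness check can be scripted. A minimal sketch, assuming the JSON printed by kubectl get statefulset NAME -o json; statefulset_is_ready and load_statefulset are hypothetical helpers, not part of Onyxia:

```python
import json
import subprocess


def statefulset_is_ready(sts: dict) -> bool:
    """True when all desired replicas of a StatefulSet report ready.

    `sts` is one object as printed by `kubectl get statefulset NAME -o json`.
    """
    desired = sts.get("spec", {}).get("replicas", 1)  # spec.replicas defaults to 1
    # status.readyReplicas is omitted from the JSON while it is 0
    ready = sts.get("status", {}).get("readyReplicas", 0)
    return ready >= desired


def load_statefulset(name: str, namespace: str) -> dict:
    """Fetch a StatefulSet via kubectl (needs a configured kubeconfig)."""
    out = subprocess.run(
        ["kubectl", "get", "statefulset", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# When the quota blocks pod creation, no replica ever becomes ready:
print(statefulset_is_ready({"spec": {"replicas": 1}, "status": {"replicas": 0}}))  # False
```

Readiness alone does not say why the pod is missing, which is why the events discussed below are needed to produce a useful message.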

Launching a service that exceeds the quota should display an informative message to the user.

micedre, Dec 13 '23 09:12

Hi!

I agree, but I'm not sure what the best way to implement this is.
Can you check whether there are Kubernetes events (kubectl get events or kubectl events) that could help with this?
More broadly, I think Onyxia should display the current quotas (and usage) to the user. The Onyxia API already exposes those values.

olevitt, Dec 13 '23 09:12

There is this one:

54s         Warning   FailedCreate             statefulset/jupyter-pyspark-345764                create Pod jupyter-pyspark-345764-0 in StatefulSet jupyter-pyspark-345764 failed error: pods "jupyter-pyspark-345764-0" is forbidden: exceeded quota: onyxia-quota, requested: count/pods=1, used: count/pods=5, limited: count/pods=5

I don't know whether you can access it.
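The message of such an event is structured enough to be turned into something the UI could display. A minimal sketch that parses it with a regular expression, assuming the wording stays as in the event above and that several resources would be joined with commas (e.g. cpu=1,memory=1Gi); parse_quota_error is a hypothetical helper, not part of Onyxia:

```python
import re
from typing import Optional

# Matches the tail of the API server's quota rejection, e.g.
# "exceeded quota: onyxia-quota, requested: count/pods=1, used: count/pods=5, limited: count/pods=5"
QUOTA_RE = re.compile(
    r"exceeded quota: (?P<quota>[^,]+), "
    r"requested: (?P<requested>.+?), "
    r"used: (?P<used>.+?), "
    r"limited: (?P<limited>.+)$"
)


def _pairs(text: str) -> dict:
    """Turn "cpu=1,memory=1Gi" into {"cpu": "1", "memory": "1Gi"}."""
    return dict(item.split("=", 1) for item in text.split(","))


def parse_quota_error(message: str) -> Optional[dict]:
    """Return the quota name and requested/used/limited amounts, or None if no match."""
    m = QUOTA_RE.search(message)
    if m is None:
        return None
    return {
        "quota": m["quota"],
        "requested": _pairs(m["requested"]),
        "used": _pairs(m["used"]),
        "limited": _pairs(m["limited"]),
    }


msg = ('pods "jupyter-pyspark-345764-0" is forbidden: exceeded quota: onyxia-quota, '
       "requested: count/pods=1, used: count/pods=5, limited: count/pods=5")
info = parse_quota_error(msg)
print(f"Quota {info['quota']} reached: used {info['used']}, limit {info['limited']}")
```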

micedre, Dec 13 '23 09:12

> In a broader scope, I think onyxia should display current quotas (and usage) to the user. The Onyxia API already exposes those values.

Yes!

micedre, Dec 13 '23 09:12

For reference, GET /api/my-lab/quota returns both the current quotas and their usage, e.g.:
{"spec":{"requests.storage":"2Ti","count/pods":100},"usage":{"requests.storage":"47Gi","count/pods":4}}
I will create an issue for displaying that in the UI.
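With that response, the remaining headroom per resource is spec minus usage, once quantities such as 2Ti or 47Gi are converted to numbers. A minimal sketch, assuming the response shape shown above; parse_quantity is a simplified, hypothetical converter for the common Kubernetes suffixes (binary Ki to Ei, decimal k to E, milli m) and does not handle exponent forms such as 1e3:

```python
import re
from fractions import Fraction

# Multipliers for Kubernetes quantity suffixes (simplified subset).
_SUFFIX = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "m": Fraction(1, 1000), "": 1,
}
_QTY_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E|m)?")


def parse_quantity(q) -> Fraction:
    """Convert "47Gi", "500m", "5" or 100 to an exact number."""
    if isinstance(q, (int, float)):
        return Fraction(q)
    m = _QTY_RE.fullmatch(str(q).strip())
    if m is None:
        raise ValueError(f"unsupported quantity: {q!r}")
    return Fraction(m.group(1)) * _SUFFIX[m.group(2) or ""]


def remaining(quota: dict) -> dict:
    """Headroom per resource from a GET /api/my-lab/quota response body."""
    spec, usage = quota.get("spec", {}), quota.get("usage", {})
    return {name: parse_quantity(limit) - parse_quantity(usage.get(name, 0))
            for name, limit in spec.items()}


body = {"spec": {"requests.storage": "2Ti", "count/pods": 100},
        "usage": {"requests.storage": "47Gi", "count/pods": 4}}
left = remaining(body)
print(int(left["count/pods"]))  # 96
```

Exact fractions are used so that milli-units such as 500m do not introduce rounding errors when compared with the limit.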

olevitt, Dec 13 '23 09:12

Now that quotas are shown in the UI, there could be a check in the API that determines whether a new chart can be installed or not.
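For a quota that counts objects, such a check reduces to comparing usage plus the new release's needs against the limit. A minimal sketch with a hypothetical would_exceed helper; it assumes the number of objects the chart will create is already known, which is the hard part:

```python
def would_exceed(spec: dict, usage: dict, extra: dict) -> list:
    """Resources whose usage plus `extra` would go over the quota.

    Only handles plain counts such as count/pods; values may be ints or
    numeric strings ("5"), as in the quota rule above.
    """
    over = []
    for resource, needed in extra.items():
        if resource not in spec:
            continue  # no quota is set on this resource
        if float(usage.get(resource, 0)) + needed > float(spec[resource]):
            over.append(resource)
    return over


# The situation from the event above: 5 of 5 pods used, one more requested.
print(would_exceed({"count/pods": "5"}, {"count/pods": 5}, {"count/pods": 1}))  # ['count/pods']
```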

odysseu, Mar 28 '24 08:03

> Now that there are quotas in the UI, we know there could be a check on the api to react if it's possible to install a new chart or not

I don't think it would be that easy to check whether a Helm install would exceed any quota, as most quotas relate to objects (mainly pods) that are created after the install is done. There may be a way to implement logic that analyses the objects (StatefulSets, Deployments, ...) in the Helm manifest, but that would be neither easy nor error-proof, so it may not be desirable to implement. Am I missing something?
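To illustrate why such an analysis is only approximate, here is a rough sketch that counts the pods a rendered manifest would create. It assumes the manifest (for example the output of helm template) has already been parsed into dictionaries, e.g. with PyYAML's safe_load_all; estimate_pods is hypothetical, and anything created indirectly (operators, CronJobs, autoscaled replicas) is missed:

```python
def estimate_pods(manifest: list) -> int:
    """Rough count of pods created by the objects of a rendered Helm manifest."""
    total = 0
    for obj in manifest:
        if not obj:  # empty YAML documents parse to None
            continue
        kind, spec = obj.get("kind"), obj.get("spec") or {}
        if kind in ("Deployment", "StatefulSet", "ReplicaSet"):
            total += spec.get("replicas", 1)  # replicas defaults to 1
        elif kind == "Job":
            total += spec.get("parallelism", 1)  # pods running at once
        elif kind == "Pod":
            total += 1
        # DaemonSets, CronJobs, HPAs and operator-managed workloads are ignored.
    return total


rendered = [{"kind": "Service", "spec": {}},
            {"kind": "StatefulSet", "spec": {"replicas": 1}},
            None]
print(estimate_pods(rendered))  # 1
```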

olevitt, Mar 28 '24 09:03

Our plan is to install the Helm chart and then read the events for exceeded quotas.
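That approach could look roughly like this: after helm install returns, poll the namespace events for a while and surface the first quota rejection that concerns one of the release's objects. A minimal sketch; wait_for_quota_error is hypothetical, fetch_events stands for any function returning the JSON of kubectl get events -o json, and the timeout and polling interval are arbitrary assumptions:

```python
import time
from typing import Callable, Optional


def wait_for_quota_error(fetch_events: Callable[[], dict], release_objects: set,
                         timeout_s: float = 30.0, poll_s: float = 2.0) -> Optional[str]:
    """Return the first "exceeded quota" message about one of `release_objects`,
    or None if none shows up before the timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        for ev in fetch_events().get("items", []):
            name = ev.get("involvedObject", {}).get("name", "")
            message = ev.get("message", "")
            if (ev.get("reason") == "FailedCreate" and "exceeded quota" in message
                    and name in release_objects):
                return message
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)


quota_msg = ('create Pod jupyter-pyspark-345764-0 in StatefulSet jupyter-pyspark-345764 '
             'failed error: pods "jupyter-pyspark-345764-0" is forbidden: exceeded quota: '
             "onyxia-quota, requested: count/pods=1, used: count/pods=5, limited: count/pods=5")
events = {"items": [{"type": "Warning", "reason": "FailedCreate",
                     "involvedObject": {"kind": "StatefulSet", "name": "jupyter-pyspark-345764"},
                     "message": quota_msg}]}
print(wait_for_quota_error(lambda: events, {"jupyter-pyspark-345764"}))
```

Since events outlive the objects they describe, a real implementation would probably also compare event timestamps with the install time, so that an old rejection for a release with the same name is not reported again.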

fcomte, Mar 28 '24 09:03

I'm actively working on this.

garronej, Mar 28 '24 10:03