
Helm deployment throws 500 errors after AKS update

dhcode opened this issue 1 year ago · 3 comments

We are using the Helm chart v3.5.4 of eclipse/ditto for our deployment on an Azure Kubernetes Service (AKS) cluster. Each Ditto service we use (policies, things, thingsSearch, gateway) runs with 2 instances and a pod disruption budget of 1, so no service is ever completely unavailable.

Almost every time there is an AKS update and the nodes are recreated one after another, Ditto no longer answers requests correctly and returns status 500.

In the logs we see errors like this:

Received DittoRuntimeException during enforcement or forwarding to target actor, telling sender: DittoInternalErrorException [message='There was a rare case of an unexpected internal error.', errorCode=internalerror, httpStatus=HttpStatus [code=500, category=SERVER_ERROR], description='Please contact the service team or your administrator.'

To fix it, we scale all deployments of the Ditto services down and back up again; then it works again.

But I would expect Ditto to heal itself when pods are removed and added.

Is there a setting to improve this behavior or do others have that issue, too?
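For reference, our per-service disruption budget looks roughly like the following sketch (the names and label selectors here are placeholders, not the actual values generated by the eclipse/ditto Helm chart):

```yaml
# Sketch of the per-service disruption budget.
# Names and labels are placeholders; the real selectors come from the Helm chart.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ditto-gateway-pdb
spec:
  maxUnavailable: 1          # at most one of the two replicas may be evicted at a time
  selector:
    matchLabels:
      app.kubernetes.io/name: ditto-gateway
```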

dhcode avatar Jun 04 '24 09:06 dhcode

We thought it would get better after this issue was solved: https://github.com/eclipse-ditto/ditto/issues/1839 But it still occurs.

dhcode avatar Jun 04 '24 09:06 dhcode

@dhcode Hi, long time not seen :)

Are you sure that the k8s pods are shut down gracefully, so that the JVM receives a SIGTERM and can perform a proper cluster shutdown?

E.g. the things service logs (on service update when the "old" version is stopped):

INFO Initiated coordinated shutdown; gracefully shutting down ...
...
(30-60 seconds later)
INFO Graceful shutdown completed.
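Since the coordinated shutdown can take 30-60 seconds, the pod's termination grace period must be at least that long, and the SIGTERM must actually reach the JVM (e.g. not get swallowed by a wrapping shell). A hedged sketch of the relevant pod spec fields — the values and paths below are illustrative, not the Helm chart's defaults:

```yaml
# Illustrative pod spec fields for a graceful Ditto shutdown.
# Values and paths are examples only; check your chart's actual settings.
spec:
  terminationGracePeriodSeconds: 120   # comfortably above the 30-60 s coordinated shutdown
  containers:
    - name: things
      # Using `exec` (or running the JVM directly as PID 1) ensures SIGTERM
      # is delivered to the JVM instead of to an intermediate shell:
      command: ["/bin/sh", "-c", "exec java -jar /opt/ditto/service.jar"]
```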

On EKS I once had the issue that a k8s network policy was configured which immediately cut all traffic to/from pods once they were stopped - so the leaving cluster node never got the chance to tell the other cluster nodes that it was "gracefully leaving" the cluster. That led to situations similar to what you describe, and often to an inconsistent cluster state.
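If a NetworkPolicy is in play, it must keep allowing the cluster-internal remoting traffic between the Ditto pods, including while they terminate. A minimal sketch — the port 2551 and the labels are assumptions, so verify them against your actual deployment:

```yaml
# Sketch: allow cluster remoting traffic between Ditto pods.
# Port 2551 and the label selector are assumptions; verify against your setup.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ditto-remoting
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/part-of: ditto
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/part-of: ditto
      ports:
        - port: 2551
```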

I have no recent experience with AKS - but on AWS EKS our Ditto updates currently run completely smoothly (using the Helm chart), even with load on the cluster.

thjaeckle avatar Jun 04 '24 11:06 thjaeckle

Thanks for the hint. I checked the logs and I do find Initiated coordinated shutdown; gracefully shutting down ...

Sometimes, though, it does not show Graceful shutdown completed.

But the last recorded message, seen about 5 seconds after the shutdown was started, is: thing: [68] of the entities in shard [0] not stopped after [5 seconds]. Maybe the handOffStopMessage [org.eclipse.ditto.internal.utils.cluster.StopShardedActor] is not handled?

So it seems like the shutdown is not always graceful.

dhcode avatar Jun 04 '24 14:06 dhcode