Helm deployment throws 500 errors after AKS update
We are using Helm chart v3.5.4 of eclipse/ditto for our deployment on an Azure Kubernetes Service (AKS) cluster. Each Ditto service we use (policies, things, thingsSearch, gateway) runs 2 instances and has a pod disruption budget of 1, so no service is ever completely unavailable.
Almost every time AKS is updated and the nodes are recreated one after another, Ditto stops answering requests correctly and returns status 500.
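For reference, a pod disruption budget of this kind can be sketched as follows. This is a minimal illustration, not our exact manifest: the resource name, the label selector, and the use of `minAvailable` rather than `maxUnavailable` are assumptions (with 2 replicas, both settings allow at most one pod to be evicted at a time).

```yaml
# Illustrative PodDisruptionBudget for one Ditto service (gateway shown).
# The name and labels below are hypothetical placeholders, not taken from the chart.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ditto-gateway-pdb            # hypothetical name
spec:
  minAvailable: 1                    # assumption: keep at least 1 of the 2 pods running
  selector:
    matchLabels:
      app.kubernetes.io/name: ditto-gateway   # hypothetical label
```

One such budget would exist per service (policies, things, thingsSearch, gateway), each selecting only that service's pods.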
In the logs we see errors like this:
Received DittoRuntimeException during enforcement or forwarding to target actor, telling sender: DittoInternalErrorException [message='There was a rare case of an unexpected internal error.', errorCode=internalerror, httpStatus=HttpStatus [code=500, category=SERVER_ERROR], description='Please contact the service team or your administrator.'
To fix it, we scale all deployments of the Ditto services down and back up again. After that, everything works again.
However, I would expect Ditto to heal itself when pods are removed and added.
Is there a setting to improve this behavior, or do others have this issue too?