avro: don't cache temporary errors

Open rogpeppe opened this issue 5 years ago • 2 comments

Currently if we get an error from the registry, we cache it for all time. We could instead inspect the error and cache the result only if it's not marked as temporary.
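
As a rough illustration of that idea, here is a minimal Go sketch. It is not the heetch/avro implementation: `schemaCache`, `isTemporary`, `tempError`, and the `fetch` callback are hypothetical names invented for this example, and it assumes the registry client marks transient failures with the conventional `Temporary() bool` method.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// temporary is the conventional interface for marking an error as transient.
type temporary interface {
	Temporary() bool
}

// isTemporary reports whether any error in err's chain marks itself as temporary.
func isTemporary(err error) bool {
	var t temporary
	return errors.As(err, &t) && t.Temporary()
}

// tempError is a hypothetical transient error used only for this demo.
type tempError struct{ msg string }

func (e tempError) Error() string   { return e.msg }
func (e tempError) Temporary() bool { return true }

// entry is a cached lookup result: either a schema or a permanent error.
type entry struct {
	schema string
	err    error
}

// schemaCache caches lookups by schema ID but never stores temporary errors.
type schemaCache struct {
	mu      sync.Mutex
	entries map[int64]entry
	fetch   func(id int64) (string, error) // assumed registry lookup
}

func newSchemaCache(fetch func(int64) (string, error)) *schemaCache {
	return &schemaCache{entries: make(map[int64]entry), fetch: fetch}
}

// get returns the cached result for id, fetching it on a cache miss.
// The lock is held during the fetch to keep the sketch short.
func (c *schemaCache) get(id int64) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[id]; ok {
		return e.schema, e.err
	}
	schema, err := c.fetch(id)
	if err != nil && isTemporary(err) {
		return "", err // not cached, so the next call retries
	}
	c.entries[id] = entry{schema, err}
	return schema, err
}

func main() {
	calls := 0
	c := newSchemaCache(func(id int64) (string, error) {
		calls++
		if calls == 1 {
			return "", tempError{"registry unavailable"}
		}
		return `"string"`, nil
	})
	_, err := c.get(1)
	fmt.Println("first lookup error:", err)
	schema, err := c.get(1)
	fmt.Println("second lookup:", schema, err)
}
```

Successful results and permanent errors are still cached forever here; only errors that report themselves as temporary bypass the cache.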

rogpeppe avatar Jan 24 '20 12:01 rogpeppe

We are using Avro with the registry in several Go microservices in production. We have been hit by this bug several times; the result is a repeated error message in our logs each time a pod tries to consume a message from a topic. The pod never recovers, since the error is cached forever. I've tried to fix the problem with this PR:

https://github.com/heetch/avro/pull/127

The idea is to cache errors from fetching the schema for at most one minute. This keeps the registry from being overloaded (which could be the cause of the problem in the first place) while letting the pod recover once the problem has been fixed (network issues, a temporarily missing schema, etc.).

I have kept the permanent caching of schema decoding errors, since I can't think of any case where they could be fixed without either upgrading the Avro decoder or creating a new schema.
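
The policy described above could look roughly like the following Go sketch. It is only an illustration and may differ from the code in the PR: `schemaCache`, `decodeError`, `errorTTL`, `fakeClock`, and `errUnavailable` are hypothetical names, and the clock is injected so that expiry can be demonstrated without sleeping.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errorTTL is how long a schema fetch error stays cached.
const errorTTL = time.Minute

var errUnavailable = errors.New("registry unavailable")

// decodeError marks a schema that was fetched but could not be decoded.
type decodeError struct{ err error }

func (e *decodeError) Error() string { return "cannot decode schema: " + e.err.Error() }
func (e *decodeError) Unwrap() error { return e.err }

// fakeClock is a manually advanced clock for the demo.
type fakeClock struct{ t time.Time }

func (f *fakeClock) now() time.Time          { return f.t }
func (f *fakeClock) advance(d time.Duration) { f.t = f.t.Add(d) }

type entry struct {
	schema  string
	err     error
	expires time.Time // zero value means the entry never expires
}

type schemaCache struct {
	mu      sync.Mutex
	entries map[int64]entry
	fetch   func(id int64) (string, error) // assumed registry lookup
	now     func() time.Time
}

func newSchemaCache(fetch func(int64) (string, error), now func() time.Time) *schemaCache {
	return &schemaCache{entries: make(map[int64]entry), fetch: fetch, now: now}
}

// get returns the cached result for id, refetching if the entry has expired.
func (c *schemaCache) get(id int64) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[id]; ok {
		if e.expires.IsZero() || c.now().Before(e.expires) {
			return e.schema, e.err
		}
		delete(c.entries, id) // expired fetch error: try again
	}
	schema, err := c.fetch(id)
	e := entry{schema: schema, err: err}
	var de *decodeError
	if err != nil && !errors.As(err, &de) {
		// Fetch errors expire; successes and decode errors are kept forever.
		e.expires = c.now().Add(errorTTL)
	}
	c.entries[id] = e
	return schema, err
}

func main() {
	clk := &fakeClock{}
	calls := 0
	c := newSchemaCache(func(id int64) (string, error) {
		calls++
		if calls == 1 {
			return "", errUnavailable
		}
		return `"string"`, nil
	}, clk.now)

	_, err := c.get(1)
	fmt.Println("first lookup error:", err)
	_, err = c.get(1)
	fmt.Println("cached error:", err, "fetches so far:", calls)
	clk.advance(2 * errorTTL)
	schema, err := c.get(1)
	fmt.Println("after expiry:", schema, err, "fetches so far:", calls)
}
```

While the error is cached, repeated lookups do not touch the registry, which bounds the retry rate to roughly one fetch per schema ID per minute per process.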

hennikul avatar Nov 30 '23 10:11 hennikul

We are currently running this PR in all our Avro Go services.

hennikul avatar Nov 30 '23 10:11 hennikul