Event delivery retry stop working after some time

Open · achiarenza opened this issue 1 year ago · 4 comments

I'm seeing strange behaviour when a delivery attempt fails.

Convoy correctly handles the retry mechanism (exponential backoff) configured for the endpoint, but after a number of retries the job seems to stop working and the last scheduled attempt is never picked up. The result is a retry event whose "next attempt" timestamp is in the past.


I'm currently using Convoy v23.06.1, but the same happened with the previous version.

achiarenza · Jun 16 '23 10:06

Hey @achiarenza 👋🏿

Hmm, this might be a bug with the exponential backoff. I'll take a look at it.

Can you please help me with the steps to reproduce this?

jirevwe · Jun 16 '23 10:06

Hello @jirevwe, for sure!

I'm using the Docker Compose file the repo provides to spin up Convoy, on an Ubuntu 20.04.6 box.

Docker is version 23.0.5, build bc4487a.

Project settings are configured as shown in the attached screenshot.

All the other configuration values are left as default.

While trying to fix the issue, I scaled the worker service to more than one instance with `docker compose up --scale worker=2 -d`, but the problem persisted.

Let me know if you need some other info.

achiarenza · Jun 16 '23 10:06

Thanks for the info,

The exponential back-off strategy uses the values in the table below, which range from 10 seconds to 15 minutes. Every retry after the 7th will be about 15 minutes apart.

| Delay (ms) | Duration   |
|------------|------------|
| 10000      | 10 seconds |
| 30000      | 30 seconds |
| 60000      | 1 minute   |
| 180000     | 3 minutes  |
| 300000     | 5 minutes  |
| 600000     | 10 minutes |
| 900000     | 15 minutes |

This means a 20-retry limit could take close to 4 hours to reach the failure state. Can you please share the worker logs so I can debug further?

In the meantime, can you re-test with a smaller retry limit (about 5 to 10)? I can't seem to reproduce this.

jirevwe · Jun 16 '23 10:06

Here is the full Docker log, with some info redacted: worker.log

achiarenza · Jun 16 '23 11:06