jitsu
Improve and clarify error retry logic
Problem
- Currently Jitsu retries failed events indefinitely. That doesn't make much sense, because many kinds of errors cannot be resolved by retrying.
- For streaming storages, errors cause the Redis queue to grow.
- The retry and fallback logic is unclear and undocumented.
Solution
- Introduce server.error_retry_period_hours configuration parameter that will work as default for all destinations (streaming and batch). Default value: 24 hours
- Introduce DestinationConfig error_retry_period_hours parameter that will override default value on destination level.
uploader.go unify fallback logic:
- All errors (parsingErrors, failedEvents, resultPerTable.result.Err) must go to Fallback only after error_retry_period_hours passes. (It seems that currently, for parsingErrors and failedEvents, we flood the fallback logs with copies of the same events on each uploader run.)
- After error_retry_period_hours passes, Jitsu needs to archive the incoming file and clean up its status.
streaming.go:
- Don't use the IsConnectionError check: retry all errors.
- Instead of the hardcoded 20 sec delay, introduce the server.streaming_retry_delay_minutes parameter. Default: 1
- After server.error_retry_period_hours passes, stop retries and Fallback error events.
- Current fallback logic is hidden in abstract.go AccountResult and must be removed from there.
documentation:
- Write an Error Handling and Retries documentation page that describes this logic and the configuration parameters.
No changes in uploader.go yet
Current fallback logic is hidden in abstract.go AccountResult and must be removed from there.
Not addressed yet.