data-prepper
Enable ingestion priority in S3 scan
Is your feature request related to a problem? Please describe. I have a data pipeline built from an AOSS pipeline and an AOSS collection. The pipeline is a real-time monitor for logs. We recently had an outage during which the source did not deliver logs for a few days. When we finally unblocked the pipeline and restarted ingestion, all of the backlogged days arrived at once and the AOSS pipeline started ingesting them from oldest to newest. This behavior does not work for us: because we want a real-time monitor, fresher data must take priority over older data.
Describe the solution you'd like
The request is to implement an alternative behavior, controlled by a setting (for instance, order: (newer_first|older_first)), that lets the user control the order of ingestion. In particular, the values should behave as follows:
older_first: (FIFO) older records are ingested first. A new record added to the ingestion queue does not change the order (current behavior).
newer_first: (LIFO) newer records are added to the top of the ingestion queue and are ingested first, which changes the order of ingestion.
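To make the two proposed modes concrete, here is a minimal sketch of how such a queue could behave. It is not Data Prepper code: the class name IngestionQueue, its methods, and the use of each object's last-modified timestamp as the ordering key are all assumptions made for illustration.

```python
import heapq
import itertools
from datetime import datetime, timezone


class IngestionQueue:
    """Hypothetical queue that orders pending objects by last-modified time."""

    def __init__(self, order="older_first"):
        if order not in ("older_first", "newer_first"):
            raise ValueError(f"unsupported order: {order}")
        # heapq is a min-heap, so negating the timestamp yields newest-first.
        self._sign = 1 if order == "older_first" else -1
        self._heap = []
        # Monotonic counter breaks ties between equal timestamps.
        self._counter = itertools.count()

    def push(self, key, last_modified):
        """Add an object key with its last-modified datetime."""
        priority = self._sign * last_modified.timestamp()
        heapq.heappush(self._heap, (priority, next(self._counter), key))

    def pop(self):
        """Return the key of the next object to ingest."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

With order set to newer_first, an object pushed after ingestion has started is popped before the remaining backlog as long as its timestamp is more recent. With older_first, the same object waits behind everything older, which matches the current behavior described above.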
Describe alternatives you've considered (Optional) I have no alternatives for now.
Additional context I think this feature should be paired with another one: discarding data older than XX that has not yet been ingested. That would also alleviate the problem above.
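The discard idea could be sketched as a simple age filter applied before objects are queued. The function name drop_stale, the max_age parameter, and the (key, last_modified) tuple shape are hypothetical; the actual threshold (the XX above) is left to the user.

```python
from datetime import datetime, timedelta, timezone


def drop_stale(objects, max_age, now=None):
    """Keep only (key, last_modified) pairs no older than max_age.

    `objects` is an iterable of (key, timezone-aware datetime) tuples.
    `max_age` is a timedelta; `now` defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - max_age
    return [(key, ts) for key, ts in objects if ts >= cutoff]
```

For example, calling drop_stale(pending, timedelta(days=1)) would keep only objects modified within the last day, so a multi-day backlog would be skipped rather than ingested late.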