Vector sources acknowledge discarded events
When end-to-end acknowledgments are enabled, Vector sources (where possible) are configured to wait until events are successfully processed by the associated sinks before acknowledging to the client connected to the source.
However, if an event is discarded by the configured topology, the source acknowledges this event as successfully processed. This was the intended implementation, but it is becoming clearer that this is unexpected behavior. As a Vector user, I would not expect these events to be marked as acknowledged.
Vector can discard events early when:
- A transform errors and drops the event (like `remap` when `drop_on_error = true`)
- A component output is not used as an input to another component (e.g. an unconsumed `route` output)
Instead, these should result in negative acknowledgements to the source.
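For illustration, a minimal topology sketch in which both situations can occur might look like the following (component names and the VRL snippet are made up for this example; `drop_on_error` on `remap` and named `route` outputs are existing options):

```yaml
sources:
  http_in:
    type: http
    address: "0.0.0.0:8080"

transforms:
  parse:
    type: remap
    inputs: [http_in]
    drop_on_error: true          # events that fail parsing are silently dropped
    source: |
      . = parse_json!(string!(.message))

  split:
    type: route
    inputs: [parse]
    route:
      alpha: '.kind == "alpha"'
      beta: '.kind == "beta"'    # this output is never consumed below

sinks:
  out:
    type: console
    inputs: [split.alpha]
    encoding:
      codec: json
    acknowledgements:
      enabled: true
```

Events dropped by `parse` or routed to the unconsumed `beta` output never reach the sink, yet the source still acknowledges them as successfully processed.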
Prompted by https://discord.com/channels/742820443487993987/746070591097798688/964167175923433522
Related: https://github.com/vectordotdev/vector/issues/12109
We could fail to boot a Vector config that has an unconsumed output for a source with acks enabled.
Another idea courtesy of @binarylogic: we force all possible component errors to be handled explicitly when acks are enabled.
Expanding on that some, I think it could look like this: all components that can drop events would have an `on_error: (reject|discard|route)` config option that would let users configure the acking behavior when an error occurs:
- `reject`: it would reject the event and cause a negative ack in the source
- `discard`: it would be discarded and ack'd
- `route`: it would be routed to an `error` output
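Purely as a sketch of this proposal (the `on_error` option and the `error` output below are hypothetical and do not exist in Vector today; component names are made up):

```yaml
transforms:
  parse:
    type: remap
    inputs: [http_in]
    on_error: route            # hypothetical option: reject | discard | route
    source: |
      . = parse_json!(string!(.message))

sinks:
  errors:
    type: console              # consumes the hypothetical `error` output
    inputs: [parse.error]
    encoding:
      codec: json
```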
cc/ @hhromic for any thoughts since your discord conversation prompted this discussion 🙂
> `reject`: it would reject the event and cause a negative ack in the source
What's a negative ack?
Ah, for HTTP, this would be something like returning a 400.
For sources like kafka there is no "negative ack" so it just wouldn't ack.
An alternative UX could be to give the `blackhole` sink an option for rejecting data (or add a new sink for it). That would let us always have a named `error` output that the user is responsible for routing to a `blackhole` sink, which would either discard or reject the events when end-to-end acks are enabled.
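A rough sketch of what that could look like (the `ack_behavior` option on the `blackhole` sink and the always-present `error` output are hypothetical):

```yaml
sinks:
  rejected:
    type: blackhole
    inputs: [parse.error]      # hypothetical always-present `error` output
    ack_behavior: reject       # hypothetical option: reject | discard
```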
cc/ @hhromic for any thoughts since your discord conversation prompted this discussion 🙂
Hello all, first of all, thanks for considering the opinion of users. It is highly appreciated :) Not every software package we use can be praised for the same :P
While checking the documentation again, I stumbled upon the actual place where acknowledgement is properly defined:
For a source that supports end-to-end acknowledgements and is connected to a sink that has the acknowledgements option enabled, this will cause it to wait for all connected sinks to either mark the event as delivered, or to persist the events to a disk buffer, if configured on the sink, before acknowledging receipt of the event. If a sink signals the event was rejected and the source can provide an error response (i.e. through an HTTP error code), then the source will provide an appropriate error to the sending agent.
More importantly, it also clearly states what happens with dropped events:
Some transforms will drop events as part of their normal operation. For example, the dedupe transform will drop events that duplicate another recent event in order to reduce data volume. When this happens, the event will be considered as having been delivered with no error.
Therefore, in terms of documentation/behaviour, Vector seems to just be operating as designed :). I wouldn't classify this issue as a bug but an enhancement or feature request instead.
I was previously reading this other documentation page, which seems not as complete as the above, hence my questions in Discord: https://vector.dev/docs/about/under-the-hood/architecture/end-to-end-acknowledgements/
That being said, I think good enhancements as proposed are:
- https://github.com/vectordotdev/vector/issues/12217#issuecomment-1099544483 : This makes a lot of sense because enabling acks should also imply the user wanting more strict/controlled topologies. On the other hand, it might become annoying for users who know what they are doing, especially when debugging.
- https://github.com/vectordotdev/vector/issues/12217#issuecomment-1099551164 : I feel this suggestion can devolve into "configuration options creep". For the `remap` transform we already have `reroute_dropped`, `drop_on_abort` and `drop_on_error`, so I would instead simplify by managing the behaviour using these three. For the `route` transform we also now have the `_unmatched` output that can be routed. See next.
- https://github.com/vectordotdev/vector/issues/12217#issuecomment-1099563743 : I also think this is a good idea. The user would then need to explicitly define a sink that "rejects messages for ack", i.e. a "rejector" sink to which the user can redirect all messages that should be rejected under the desired conditions. This couples very well with the existing `remap` and `route` options (see above). More importantly, this would force the user to make more explicit/clear topologies.
So in summary, I think the way to go would be to make sure that transforms that can drop messages implement a dropped output (like the `remap` transform does), and then provide a sink that can be used to wire up these outputs. The user then has full control of what to reject and what to discard depending on how the topology is constructed.
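As a concrete sketch of that summary (component names are made up; `reroute_dropped` on `remap` and the `_unmatched` output on `route` are existing options), the dropped and unmatched streams can be wired up explicitly so the user decides what happens to them:

```yaml
transforms:
  parse:
    type: remap
    inputs: [http_in]
    drop_on_error: true
    reroute_dropped: true              # failed/aborted events go to `parse.dropped`
    source: |
      . = parse_json!(string!(.message))

  split:
    type: route
    inputs: [parse]
    route:
      alpha: '.kind == "alpha"'

sinks:
  dropped_out:
    type: console                      # explicit destination for dropped/unmatched events
    inputs: [parse.dropped, split._unmatched]
    encoding:
      codec: json
```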
Thank you for your valuable input.
> Therefore, in terms of documentation/behaviour, Vector seems to just be operating as designed :). I wouldn't classify this issue as a bug but an enhancement or feature request instead.
For clarification, there are two classes of discarded events: those that are discarded as a normal part of the transform's behavior, as described for `dedupe` or `filter` and the like, and those that are discarded as a result of an error. It is the latter that are primarily at issue here, while the documentation describes the former.
@bruceg ah that's a good clarification indeed.
Then it looks like we should better define what counts as "dropped as part of normal transform operation".
For example, in the `route` transform, are events sent to the `_unmatched` output dropped as normal?
Same in the `remap` transform: are events sent to the `*.dropped` output (for example via `abort`) dropped as normal?
The case of `remap` is tricky because the programmer currently has no way to express a "dropped as normal" intention.
> in the `route` transform, are events sent to the `_unmatched` output dropped as normal? Same in the `remap` transform: are events sent to the `*.dropped` output (for example via `abort`) dropped as normal?
I think these are where one of @jszwedko's suggestions would come into play, where we would require all outputs to be consumed by another component at a configuration level, in which case they are not dropped at all.
Yes, indeed. Especially the suggestion about adding configurable ack behaviour to the blackhole sink.
In this way the user can explicitly configure how the acks should be done via topology construction.
Just updating the status. I realize the lines are blurry here, but I think this is a bug given normal user expectations for this feature.
I caught this error and consuming from Kafka stopped completely:
```
2022-08-26T03:18:28.849503Z ERROR source{component_kind="source" component_id=kafka_in_be component_type=kafka component_name=kafka_in_be}: vector::internal_events::kafka: Event received a negative acknowledgment, topic has been stopped. error_code="negative_acknowledgement" error_type="acknowledgment_failed" stage="sending" topic="log.java_standard_log" partition=1 offset=12277332826
```
I built from the newest master version these days; is it related to this bug?
Hi @honganan ! What you see there is a change to have Vector avoid acknowledging the event in Kafka if it failed to successfully process it. This change was made to avoid data loss by letting operators intervene to fix the issue that caused the failure.
After reconsidering this behavior in light of a newly codified UX principle and some planned work on error handling, we've decided to revert it for v0.24.0.
That would be good for UX.
A weird workaround I found for this: use `reroute_dropped: true` on the `remap` that drops events, then create an invalid sink that rejects everything, such as:
```yaml
error:
  type: file
  path: ""                     # intentionally invalid
  encoding:
    codec: json
  acknowledgements:
    enabled: true
  inputs:
    - events_parsed.dropped
```
This will reject all acks (with a 500 Internal Server Error for the `http` source).
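For completeness, the `remap` transform feeding that sink would look roughly like this (the `events_parsed` name matches the `inputs` entry above; the source name and VRL body are assumptions):

```yaml
events_parsed:
  type: remap
  inputs: [my_source]          # hypothetical source name
  drop_on_error: true
  reroute_dropped: true        # failed events are sent to `events_parsed.dropped`
  source: |
    . = parse_json!(string!(.message))
```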