graph-node icon indicating copy to clipboard operation
graph-node copied to clipboard

early false positive filter

Open mangas opened this issue 2 years ago • 6 comments

mangas avatar Sep 15 '22 13:09 mangas

Could the issue be better described, is this a correctness fix or an optimization? If it's correctness I'm surprised that match_and_decode would somehow be giving false positives.

leoyvens avatar Sep 15 '22 17:09 leoyvens

The issue is that if match_and_decode filters every trigger out, an "empty" POI is created because the main loop always assumes there is a valid set of triggers here

So this is both a fix (we don't process anything if there are no triggers) and an optimization because we avoid a lot of processing by moving this check to earlier point in the flow.

mangas avatar Sep 15 '22 17:09 mangas

Thanks for the explanation, please wait for a review from me before merging.

leoyvens avatar Sep 16 '22 09:09 leoyvens

On the correctness fix, the question is why is a PoI being written in blocks that processed no triggers. I believe this is due to this start_handler call: https://github.com/graphprotocol/graph-node/blob/4850b828f9257a7725758e5dc3d168b39305a790/core/src/subgraph/trigger_processor.rs#L37-L39 Which is called before match_and_decode so it may be called even if no handler is actually started. This was added in https://github.com/graphprotocol/graph-node/pull/2748, and I'm unsure of the rationale, but regardless we need to be compatible with it.

I believe the most minimal fix would be to call start_handler before processing the trigger on the first host that matches.

leoyvens avatar Sep 16 '22 12:09 leoyvens

The main issue is the assumption that any block then went through filtering will have at least a single valid trigger. There might be other places with this assumption, so IMV my fix is correct because it re-introduces that assumption. The only optimisation here is doing it earlier than the hosts and that is correct as well IMO because that's how the normal flow behaves. So having a minimal duplication (which is not actual code duplication) is a small price to pay.

Fixing the issue as you're proposing my expose other places where the assumption exist (and should exist) I have seen similar issues while working on the new substreams processing.

mangas avatar Sep 16 '22 12:09 mangas

The main issue is the assumption that any block then went through filtering will have at least a single valid trigger.

True this bug exposes this accidental assumption, but the intended design is for block stream filtering to allow false positives, even if all triggers in the block are false positives. So we should identify and fix all places where this incorrect assumption is being made.

leoyvens avatar Sep 16 '22 12:09 leoyvens