go-ethereum
Push filter updates in batch on new block
Rationale
When a block arrives, it is often important to compute the new state of the contracts you are interested in, and then act on that total state. This means that ALL logs from the defined filter must be processed before moving to the interpretation phase.
The two approaches that partially achieve this are:
- Create a filter and poll for updates on each new block. The problems: 1) polling adds extra latency, and 2) sometimes not all logs are returned right away and you have to re-poll (as if geth were still processing them).
- Subscribe to logs. The problem: you receive logs one by one and thus never know when the block is fully delivered.
Suggested Solution
Add a subscription type that delivers logs in batches once log processing for a block has finished. Receiving such a message would reliably mean these are all the state changes for the last block we are interested in, and we can start further decision making.
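To make the suggestion concrete, here is a minimal sketch (not an existing geth API) of how an ordered log stream could be partitioned into per-block batches; the `Log` and `LogBatch` types are simplified stand-ins for illustration:

```go
package main

import "fmt"

// Log is a minimal stand-in for go-ethereum's types.Log, keeping only
// the fields needed here (assumption for illustration).
type Log struct {
	BlockNumber uint64
	Index       uint
}

// LogBatch is a hypothetical payload for the proposed subscription:
// one message per block, emitted once filtering for that block is done.
type LogBatch struct {
	BlockNumber uint64
	Logs        []Log
}

// batchByBlock groups an ordered log stream into per-block batches.
func batchByBlock(logs []Log) []LogBatch {
	var batches []LogBatch
	for _, l := range logs {
		if n := len(batches); n == 0 || batches[n-1].BlockNumber != l.BlockNumber {
			batches = append(batches, LogBatch{BlockNumber: l.BlockNumber})
		}
		batches[len(batches)-1].Logs = append(batches[len(batches)-1].Logs, l)
	}
	return batches
}

func main() {
	logs := []Log{{BlockNumber: 100, Index: 0}, {BlockNumber: 100, Index: 1}, {BlockNumber: 101, Index: 0}}
	for _, b := range batchByBlock(logs) {
		fmt.Printf("block %d: %d logs\n", b.BlockNumber, len(b.Logs))
	}
}
```

A client receiving one `LogBatch` per block knows the block's filtered logs are complete and can start its interpretation phase immediately.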
@pokrovskyy What do you think on how to handle removed event logs caused by a chain reorg?
We already process blocks fully and then emit logs, but since those get filtered and only the matches returned, it's not trivial to signal the end. It gets a bit more convoluted because logs get batch-delivered internally in the case of reorgs or larger chain-head jumps (post-merge world). The logs should still be in order, though, and we should be able to set a marker where one block finishes and the next begins.
Let's experiment with it and get back to you.
I.e. we need to annotate https://github.com/ethereum/go-ethereum/blob/master/eth/filters/api.go#L260 with an ephemeral wrapper that adds a marker to the last log of every block (i.e. where the next log's block hash is different, or the log is the very last in the slice).
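The ephemeral-wrapper idea could be sketched as follows; `wrappedLog`, `LastInBlock`, and `markBlockEnds` are hypothetical names, not code from that file:

```go
package main

import "fmt"

// wrappedLog is a sketch of the "ephemeral wrapper" idea: a log's block
// hash plus a marker set on the last log of each block.
type wrappedLog struct {
	BlockHash   string
	LastInBlock bool
}

// markBlockEnds sets LastInBlock when the next entry belongs to a
// different block, or the entry is the very last in the slice.
func markBlockEnds(blockHashes []string) []wrappedLog {
	out := make([]wrappedLog, len(blockHashes))
	for i, h := range blockHashes {
		out[i] = wrappedLog{
			BlockHash:   h,
			LastInBlock: i == len(blockHashes)-1 || blockHashes[i+1] != h,
		}
	}
	return out
}

func main() {
	for _, w := range markBlockEnds([]string{"0xaa", "0xaa", "0xbb"}) {
		fmt.Println(w.BlockHash, w.LastInBlock)
	}
}
```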
I agree this would be very nice to have; it would avoid some guesswork in the event receiver. However, I don't think a marker alone will be enough. When the filter has no matches within a block, the receiver still wouldn't know whether a) no events exist or b) they just haven't arrived yet. So I would very much prefer the solution @pokrovskyy suggested: emitting a batch notification for every block, which might also contain no logs at all.
We are experiencing similar issues to those described by the issue creator. Because of this, we use the polling API instead of websockets, with a hardcoded time delay to make sure that when we poll we get ~~all~~ most of the logs. This is unfortunately very brittle and often does not work properly, which increases the complexity of our client.
I saw the existing PR by @karalabe, but I think it would indeed be better to add a new subscription type that returns something like: {"logs": [..], "block_number": 1234}. Basically, it would filter, then partition the logs received internally into messages of that type, and finally emit those. That would also work with bigger reorgs, given that the log batches passed around are in order (which they appear to be).
Would be happy to take this on if this is considered worth merging.
I ended up doing something similar to @kayibal, but sometimes the lag is quite noticeable. The majority of the batch arrives around the new block-header notification, but sometimes the remaining logs keep "dripping" in for seconds after the main batch. I assume that may be because of the reorgs mentioned by @karalabe.
So if my understanding is correct, geth sends the initial full batch based on the new block data it has at that moment, and further "drips" come from new data geth receives afterwards. If that is the case, I believe there is no easy fix on geth's side, and it would be on our shoulders to handle these cases according to our individual business requirements.
@pokrovskyy I believe it is more of a concurrency issue. We have also noticed that the problem seems related to the number of logs being sent through the websocket connection. If the number of filtered logs is small, we likely receive all of them; if there are many, delays become more likely. This could be because, while doing network IO (which is currently initiated for each individual log, afaik), the node can briefly switch to another task and then come back to sending the remaining logs.
The problem should be fixable with a smarter response representation: treat the logs of a single block as a package that is sent over the network as a single message. That way the IO task is initiated only once per block, so the node can switch to other work without impacting the client receiving the logs.
Reorgs happen roughly once every 20 blocks or so, so not too often. They complicate things because the node needs to communicate to the client that some logs it received previously should now be "reverted".
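For reference, go-ethereum's `types.Log` already carries a `Removed` flag that is set when a log is reverted by a reorg; a client can use it to undo previously applied logs. A minimal sketch with a simplified `Log` stand-in and a hypothetical `apply` helper:

```go
package main

import "fmt"

// Log keeps only the fields needed here; Removed mirrors the flag
// go-ethereum's types.Log sets when a log was reverted by a chain reorg.
type Log struct {
	TxHash  string
	Removed bool
}

// apply updates a simple seen-set: normal logs are recorded, and logs
// re-delivered with Removed=true after a reorg are dropped again.
func apply(state map[string]bool, logs []Log) {
	for _, l := range logs {
		if l.Removed {
			delete(state, l.TxHash)
		} else {
			state[l.TxHash] = true
		}
	}
}

func main() {
	state := map[string]bool{}
	apply(state, []Log{{TxHash: "0xabc"}, {TxHash: "0xdef"}})
	apply(state, []Log{{TxHash: "0xdef", Removed: true}}) // reorg reverts one log
	fmt.Println(len(state))
}
```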
@kayibal I definitely agree and would prefer that same-block logs be processed and sent in a single batch (basically the original intention of this whole thread). My updated understanding was that further "drips" come from new data, which makes sense in that case. But if these "drips" are just delayed logs from the same block, then it would definitely be worth investigating why they aren't processed as a whole.
I have a patch to send block logs. I'm currently traveling on vacation, but I'll send it over at the end of the month.
I have made an updated version of @karalabe 's PR, here: https://github.com/ethereum/go-ethereum/pull/26109
Is this worth pursuing, or do you feel it does not solve the use case that started this issue? We need to figure out whether to merge it or close it. Thanks.
@pokrovskyy should I consider this ticket abandoned?