
Replace single coarse-grained filter with batch request of fine-grained topic filters

Open reductionista opened this issue 1 year ago • 5 comments

This PR changes the way in which LogPoller requests logs from the rpc server. Instead of maintaining one giant cached filter which includes a list of all contract addresses and topics, and passing that to ec.GetFilter() each time logs are polled, it maintains an index of eth_getLogs requests containing more fine-grained filters, sending these in a call to ec.BatchCallContext() all at once, and combining the results. The rpc server is still only contacted once, but the data which comes back will be much more tailored to what we actually need, in some cases significantly reducing the number of results, the amount of network traffic, and the amount of storage space we use in the db (although the last one could have been accomplished in simpler ways). This is especially important for the new Automation LogTriggers feature, where each custom user filter must have separate rate limiting and accounting enforced.
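As a rough illustration of the batched pattern described above: each registered filter becomes its own eth_getLogs element in a single batch. The types and helper below are a hypothetical sketch (BatchElem only mirrors the shape of go-ethereum's rpc.BatchElem; newGetLogsElem is not the actual Chainlink code):

```go
package main

import "fmt"

// BatchElem mirrors the shape of go-ethereum's rpc.BatchElem.
type BatchElem struct {
	Method string
	Args   []interface{}
	Result interface{}
	Error  error
}

// newGetLogsElem builds one fine-grained eth_getLogs request for a single
// registered filter (one address list + one topics list).
func newGetLogsElem(addresses []string, topics []interface{}) BatchElem {
	return BatchElem{
		Method: "eth_getLogs",
		Args: []interface{}{map[string]interface{}{
			"address": addresses,
			"topics":  topics,
		}},
		Result: new([]interface{}), // logs are decoded per-request
	}
}

func main() {
	// Two registered filters become two elems in ONE batch; the rpc server is
	// still contacted once, via something like ec.BatchCallContext(ctx, batch).
	batch := []BatchElem{
		newGetLogsElem([]string{"0xaaa"}, []interface{}{"0xTransferSig"}),
		newGetLogsElem([]string{"0xbbb"}, []interface{}{"0xApprovalSig"}),
	}
	fmt.Println(len(batch), batch[0].Method)
}
```

The results of all elements are then combined before being written to the db.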

As an example, with the "single giant filter" approach, if we had 100 filters each with 1 address and 1 event type, we would be requesting 100 x 100 = 10,000 combinations of addresses + events, while with the new approach we will only request the 200 cases we actually need. Most of those would be unlikely to show up, but when considering intentional DoS attacks on LogTriggers one could imagine a scenario where these 100 x 100 combinations generate an enormous amount of results for each node to save in its db.
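The arithmetic behind that example can be sketched as follows (reading the "200 cases" as the 100 addresses plus 100 event sigs actually named across the per-filter reqs):

```go
package main

import "fmt"

// With one giant filter, every address is implicitly crossed with every
// event sig when matching logs.
func giantFilterCombos(nFilters int) int {
	return nFilters * nFilters // 100 addresses x 100 event sigs
}

// With per-filter reqs, each req names only its own address and event sig.
func batchedTerms(nFilters int) int {
	return nFilters * 2 // 100 addresses + 100 event sigs actually requested
}

func main() {
	fmt.Println(giantFilterCombos(100), batchedTerms(100)) // 10000 200
}
```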

Additionally, it allows filtering on all 5 keys (address, event_sig, topic2, topic3, topic4) supported by EVM logs, where previously we only filtered on (address, event_sig). So for example, if there is a custom filter registered which only needs to track token transfers between a particular from and to address, the filter can use topic2=from, topic3=to, and this will narrow down the results we get back from "all transfers to and from all wallets in the world" to just a single pair of wallets.
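Constructing that narrower filter might look like the sketch below. The addresses are placeholders and the helper name is hypothetical; only the Transfer event signature hash is real. (In the PR's numbering, the event sig occupies the first topic position, "from" is topic2, and "to" is topic3.)

```go
package main

import "fmt"

// keccak256("Transfer(address,address,uint256)") — the standard ERC-20
// Transfer event signature hash.
const transferSig = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

// transferBetween builds an eth_getLogs params object matching only
// transfers from one specific wallet to another.
func transferBetween(token, from, to string) map[string]interface{} {
	return map[string]interface{}{
		"address": []string{token},
		"topics": []interface{}{
			transferSig, // event_sig
			from,        // topic2: indexed "from" address, left-padded to 32 bytes
			to,          // topic3: indexed "to" address, left-padded to 32 bytes
		},
	}
}

func main() {
	f := transferBetween("0xToken", "0xFrom", "0xTo")
	fmt.Println(len(f["topics"].([]interface{}))) // 3 topic positions filtered
}
```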

The brute-force approach would be for each req we send to correspond to a single registered filter. This works well for most purposes, but there are cases where reqs for 2 or more filters can be merged into a single request. We have to be careful not to over-optimize here, because too much merging can re-open the door to an explosion of combinations. So we take a cautious approach, merging only when it's clear there will be no additional cost in bandwidth. There are only two cases where reqs for similar filters are merged:

  1. If they share exactly the same event sig and topics list, the lists of contract addresses can be merged together.
  2. If there are two filters for the same contract address where one of them is narrower than the other (matches a subset or the same set of event_sig + topic combinations), the broader of the two filters is all we need.
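Merge rule 1 can be sketched as below, with hypothetical types (rule 2, dropping a filter subsumed by a broader one for the same address, is omitted for brevity):

```go
package main

import "fmt"

// getLogsReq is an illustrative stand-in for a cached eth_getLogs request.
type getLogsReq struct {
	addresses []string
	eventSig  string
	topics    [][]string
}

// eventsTopicsKey gives reqs a comparable identity: two reqs with the same
// key have identical event sig + topics, so merging their address lists
// costs nothing extra in bandwidth.
func eventsTopicsKey(r getLogsReq) string {
	return fmt.Sprintf("%s|%v", r.eventSig, r.topics)
}

// mergeByEventsTopics applies rule 1: collapse reqs sharing an
// eventsTopicsKey into one req with the union of their address lists.
func mergeByEventsTopics(reqs []getLogsReq) []getLogsReq {
	byKey := map[string]int{} // key -> index into out
	var out []getLogsReq
	for _, r := range reqs {
		k := eventsTopicsKey(r)
		if i, ok := byKey[k]; ok {
			out[i].addresses = append(out[i].addresses, r.addresses...)
			continue
		}
		byKey[k] = len(out)
		out = append(out, r)
	}
	return out
}

func main() {
	merged := mergeByEventsTopics([]getLogsReq{
		{addresses: []string{"0xaaa"}, eventSig: "Transfer"},
		{addresses: []string{"0xbbb"}, eventSig: "Transfer"},
		{addresses: []string{"0xccc"}, eventSig: "Approval"},
	})
	fmt.Println(len(merged), merged[0].addresses) // 2 [0xaaa 0xbbb]
}
```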

The lp.Filter() method has been replaced with lp.ethGetLogsReqs(). batchFetchLogs, sendBatchRequests, and a handful of other helper methods have also been added for support.

It also replaces these fields in logPoller struct:

lp.filter
lp.filterDirty
lp.cachedReqsByAddress
lp.cachedReqsByEventTopicsKey

With new fields, used for implementing caching and indexing of a collection of batch requests, instead of a single filter:

lp.newFilters                  map[string]struct{}                    // Set of filter names which have been added since cached reqs indices were last rebuilt
lp.removedFilters              []Filter                               // Slice of filters which have been removed or replaced since cached reqs indices were last rebuilt
lp.cachedReqsByAddress         map[common.Address][]*GetLogsBatchElem // Index of cached GetLogs requests, by contract address
lp.cachedReqsByEventsTopicsKey map[string]*GetLogsBatchElem           // Index of cached GetLogs requests, by eventTopicsKey

Some convenience methods for parsing fields in an rpc.BatchElem containing an eth_getLogs request have been added in the form of a derived type, GetLogsBatchElem.
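The derived-type pattern might look roughly like this sketch (BatchElem stands in for go-ethereum's rpc.BatchElem, and the accessor names are illustrative guesses, not the actual API):

```go
package main

import "fmt"

// BatchElem mirrors the shape of go-ethereum's rpc.BatchElem.
type BatchElem struct {
	Method string
	Args   []interface{}
}

// GetLogsBatchElem is a derived type adding eth_getLogs-specific accessors.
type GetLogsBatchElem BatchElem

// params returns the eth_getLogs filter object, which is the first (and
// only) argument of the request.
func (e *GetLogsBatchElem) params() map[string]interface{} {
	return e.Args[0].(map[string]interface{})
}

// Addresses parses the contract address list out of the filter object.
func (e *GetLogsBatchElem) Addresses() []string {
	return e.params()["address"].([]string)
}

func main() {
	e := GetLogsBatchElem{
		Method: "eth_getLogs",
		Args:   []interface{}{map[string]interface{}{"address": []string{"0xaaa"}}},
	}
	fmt.Println(e.Addresses()) // [0xaaa]
}
```

A derived type (rather than a wrapper struct) keeps the value directly usable wherever a BatchElem is expected, via a simple conversion.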

reductionista avatar Mar 08 '24 22:03 reductionista

I see that you haven't updated any README files. Would it make sense to do so?

github-actions[bot] avatar Mar 08 '24 22:03 github-actions[bot]

I see you added a changeset file but it does not contain a tag. Please edit the text to include at least one of the following tags:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

github-actions[bot] avatar Apr 25 '24 02:04 github-actions[bot]

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jun 25 '24 00:06 github-actions[bot]