Migration to v1.3.0 can cause missed events when contract listeners are listening from latest
As part of the v1.3.0 release, we rearchitected the way event streams from the transaction managers are configured against the namespaces in FireFly. FireFly core will now create one event stream per namespace and as such we wrote code to migrate the existing contract listeners on the previous event stream to the corresponding namespace event stream.
Contract listeners have an option of where to listen from where the choices are accepted:
oldestwhich defaults to the first block- a specific block number
latestwhich will be start listen from the creation of the listener.
As such when the listeners get recreated as part of the migration, the ones set to latest will now listen from a new latest and in the time where the chain has advance and FireFly is starting up some event could get lost! This can get worse with another bug where if FireFly cannot communicate with the Transaction Manager to create the listener it will give up and never listen to those events!
A way to solve this would be:
- Retrieve the previous contract listeners created against the default eventstream
- Find the checkpoint value - i.e the block at which this listener has caught up to!
- When creating the new contract listener against the namespace listener set the option to listen from the checkpoint!
This was we do not lose events.
Looking at the code, this change is tricker than it sounds because the original change simply at startup creates an event stream and apply the listeners created for the namespace. There is no notion of "migration" so this code will need to introduce some code to get the existing listeners on the "global" event stream and hopefully be able to using some identifier correlate them to the new ones that need to be created.
Concrete proposal - which has other benefits:
- [x] For FFTM based connectors (EVMConnect) we do the work to ensure FireFly Core and query the latest checkpoint for a listener
- [ ] We update the migration code to ask the Go
blockchainplugin if it can get the checkpoint for any migration of listeners, and do one of these - in order:- ~Use the checkpoint if it can be obtained (EVMConnect only)~
- see below - unclear this is optimization is feasible, and does not add anything over the next item
- [ ] Look up the most recent event dispatched by the connector, and if one is found then require the
blockchainplugin to decode theprotocolIdback into afromBlock(all connectors - EthConnect/Fabric) - [ ] If no event, then fall back to the original approach
- ~Use the checkpoint if it can be obtained (EVMConnect only)~
The listener endpoint on FFTM already has the checkpoint information embedded, so it doesn't look like changes would be needed to the FFTM or EVMConnect codebases for Core to be able to query the checkpoint information:
https://github.com/hyperledger/firefly-transaction-manager/blob/1ac00b5eb37664d73abebd8822ae8439cef3f4bc/pkg/fftm/stream_management.go#L376-L385
This code confirms existence of custom listeners. Per the comment it's not specific to the migration scenario: https://github.com/hyperledger/firefly/blob/16aec0b55471c2950e2d95ff90972ca0455c369d/internal/contracts/manager.go#L156-L161
This calls down to ethereum specific code, which gets the subscription in the context of the e.streamID that is designated to the namespace:
https://github.com/hyperledger/firefly/blob/16aec0b55471c2950e2d95ff90972ca0455c369d/internal/blockchain/ethereum/ethereum.go#L909-L913
So from this I believe that the code for migration is rather leaning on simply the recreation code that was introduced (during the 1.2->1.3 development) for a different reason - the recreation of listeners in EVMConnect if they had been deleted.
Need to dig into what happens to the old listeners that were at a global level 👁️
Reading through https://github.com/hyperledger/firefly/pull/1388 and the code, it seems like the original event streams are just left and ignored. However, they are not referred to in any way currently. So implementing a read of their checkpoint would be complicated.
So I'm going to investigate locating/decoding/passing the last-event information.
For MultiParty networks (where used) the FireFly batch pin contract follows the same pattern, and there's even less code change as the namespace was included in the listener name already in 1.2: https://github.com/hyperledger/firefly/blob/16aec0b55471c2950e2d95ff90972ca0455c369d/internal/blockchain/ethereum/ethereum.go#L285-L289
The thing that distinguishes multi-party BatchPin events from all other events in the blockchainevents table, is the lack of a listener_id.
For Tokens, there is a strong leaning to using either 0 or a specific block number in the code. When a user doesn't specify a block number, it goes to 0 in the token connector before being passed to EVMConnect. There are many cases we see specific block numbers being used to designate the contract installation block for a token. However, currently I don't see any likelihood that token connectors would have been coerced with a blockNumber: "latest".
So I am not going to work on addressing that, as it would mean a change to the /activatepool flow (which I believe gets now called on every re-connect of a namespace to a token connector, against every pool) to look up the last event and provide it as extra information to handle only the latest special case.
I will look to discuss this a little further with @awrichar
@EnriqueL8 @awrichar - I have submitted PR https://github.com/hyperledger/firefly/pull/1534
Open conversations I'd like your view on this week:
- This only addresses the
newest/latestcase, it does not address the inefficiency of replaying all transactions since block0(or another specific block)- Should we extend the change to always use this logic if the
lastProtocolIDblock is greater-than the registration block?
- Should we extend the change to always use this logic if the
- Token connectors:
- Question outlined in previous comment
Thanks for all the thinking!
- I'm happy with not addressing this for the token connectors based on the above points.
- It would be create to solve the inefficiency as catching up from large chain as part of the upgrade can take a long time. Wouldn't it almost always be the case that the lastProtocolID block is greater-than the registration block?
So just thinking about the implications, it will simply be that the listener in EVMConnect/fabconnect will show a fromBlock that isn't the only you specified on the listener itself.
I think that's justifiable for the benefit you get - should roll up to the release notes for 1.3.1 for sure, and maybe we need a tweak to the reference section that talks about the event bus, to cover this.
Happy to take both those items on.
I've updated the PR to handle the 0 and explicit block cases for Fabric and Ethereum
Just added the documentation tag to make sure we document this new behaviour and add it to the release notes
So just thinking about the implications, it will simply be that the listener in EVMConnect/fabconnect will show a fromBlock that isn't the only you specified on the listener itself.
Agreed, we should just document that as part of upgrade your listeners will get recreated and as part of this it will take the new "latest" block so it does not loose events. We query the checkpoint information from the connector when retrieving a listener so users will see it straight away
This has been merged!