fix(audit): Remove Task data from aggregator after a task has been responded to or has expired.
This PR:
Removes Task data from the aggregator's response maps after a task has been responded to or has expired.
closes #977
# To Test
Set up the local devnet as per the guide:
```bash
make anvil_start_with_block_time
make aggregator_start
make operator_register_and_start
make batcher_start_local
```
To confirm, you can verify that the aggregator mappings for a specific task have been removed by adding:
```go
agg.AggregatorConfig.BaseConfig.Logger.Info(
	"Info deleted from Batcher:",
	"Task Index", blsAggServiceResp.TaskIndex,
	"Batch Identifier Hash", batchIdentifierHash,
	"agg.batchesIdxByIdentifierHash:", agg.batchesIdxByIdentifierHash[batchIdentifierHash],
	"agg.batchCreatedBlockByIdx:", agg.batchCreatedBlockByIdx[blsAggServiceResp.TaskIndex],
	"agg.batchIdentifierHashByIdx:", agg.batchesIdentifierHashByIdx[blsAggServiceResp.TaskIndex],
	"agg.batchDataByIdentifierHash:", agg.batchDataByIdentifierHash[batchIdentifierHash],
)
```
to line 324 in aggregator/internal/pkg/aggregator.go
and
```go
agg.AggregatorConfig.BaseConfig.Logger.Info(
	"Info deleted from Batcher:",
	"Task Index", batchIndex,
	"Batch Identifier Hash", batchIdentifierHash,
	"agg.batchesIdxByIdentifierHash:", agg.batchesIdxByIdentifierHash[batchIdentifierHash],
	"agg.batchCreatedBlockByIdx:", agg.batchCreatedBlockByIdx[batchIndex],
	"agg.batchIdentifierHashByIdx:", agg.batchesIdentifierHashByIdx[batchIndex],
	"agg.batchDataByIdentifierHash:", agg.batchDataByIdentifierHash[batchIdentifierHash],
)
```
to line 410 in aggregator/internal/pkg/aggregator.go.
I ran a local testnet with 3 operators. The aggregator did not experience any errors and successfully removed the task information from its maps after responding.
Changed this PR to draft since we should prioritize other issues first
What happens if a response arrives after the batch is closed? It should probably be tested with more operators.
When the aggregator receives a response, it checks that the batch indeed exists here (sketched after the logs below). Anyway, I tested it on my machine as follows:
- Registered four operators
- Killed one
- Sent a bunch of proofs
- Waited for the aggregator to respond to the task
- Woke the killed operator; when it got the latest task, it sent its response and the aggregator ignored it, logging the following message:
```
[2024-09-27 11:54:10.904 -03] INFO (pkg/server.go:63) - Locked Resources: Starting processing of Response
[2024-09-27 11:54:10.904 -03] INFO (pkg/server.go:67) - Unlocked Resources: Task not found in the internal map
```
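For reference, the check being hit there amounts to roughly the following. This is a minimal standalone sketch; the struct, field, and function names are illustrative assumptions, not the exact code in pkg/server.go:

```go
package main

import (
	"log"
	"sync"
)

// Minimal stand-in for the aggregator state; the real struct in
// aggregator/internal/pkg/aggregator.go holds more fields than this.
type aggregatorState struct {
	mu                         sync.Mutex
	batchesIdentifierHashByIdx map[uint32][32]byte
}

// handleResponse mimics the check above: if the task index has already been
// deleted from the maps, the late response is simply ignored.
func (a *aggregatorState) handleResponse(taskIndex uint32) {
	a.mu.Lock()
	defer a.mu.Unlock()

	if _, ok := a.batchesIdentifierHashByIdx[taskIndex]; !ok {
		log.Printf("Task %d not found in the internal map, ignoring late response", taskIndex)
		return
	}
	// ...otherwise continue aggregating the operator's signature as usual...
}

func main() {
	agg := &aggregatorState{batchesIdentifierHashByIdx: map[uint32][32]byte{}}
	agg.handleResponse(7) // task 7 is no longer tracked, so the response is ignored
}
```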
IMO, the deletion of the task from the maps should be delayed, so the aggregator will still accept responses from operators after the quorum is reached.
It can be done with a goroutine like:

```go
go func() {
	// Give late operators a grace period before cleaning up.
	time.Sleep(10 * time.Second)
	// Deletion logic
}()
```
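A slightly fuller sketch of that idea, assuming the deletion must run under the aggregator's lock since the goroutine runs concurrently with the RPC handlers. The mutex, the map value types, and the 10-second grace period are assumptions; only the four map names come from the logging snippets above:

```go
package aggregator

import (
	"sync"
	"time"
)

// scheduleTaskCleanup delays the deletion so operators that respond shortly
// after the quorum is reached are still recognized. The caller passes the
// aggregator's mutex and the four maps that this PR cleans up.
func scheduleTaskCleanup(
	mu *sync.Mutex,
	taskIndex uint32,
	batchIdentifierHash [32]byte,
	batchesIdxByIdentifierHash map[[32]byte]uint32,
	batchCreatedBlockByIdx map[uint32]uint64,
	batchesIdentifierHashByIdx map[uint32][32]byte,
	batchDataByIdentifierHash map[[32]byte][]byte,
) {
	go func() {
		time.Sleep(10 * time.Second) // grace period for late responses

		mu.Lock()
		defer mu.Unlock()
		delete(batchesIdxByIdentifierHash, batchIdentifierHash)
		delete(batchCreatedBlockByIdx, taskIndex)
		delete(batchesIdentifierHashByIdx, taskIndex)
		delete(batchDataByIdentifierHash, batchIdentifierHash)
	}()
}
```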
The deletion is done once the task has been verified in Ethereum (so there is already a window between when the quorum is reached and when the transaction is accepted in Ethereum). So I'm not sure whether we should add an extra delay before the deletion.
That sounds to me like what we really want is a kind of queue (jargon for "an array") and a periodic task that checks whether we're done with each batch.
EDIT: actually, I think it's simpler than that. We already have indices for each batch. Every once in a while (the appropriate time is probably around the time Ethereum takes to confirm our transactions) we check the oldest (lowest) index, find its Merkle root, and use that and the index to clean the dictionaries. This could be a long-lived goroutine started by main.
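A sketch of what that long-lived goroutine could look like. The struct fields, the interval, and the cleanup condition are assumptions; only the four map names are taken from this PR, and a real implementation would also confirm the oldest batch has actually been responded to before deleting it:

```go
package aggregator

import (
	"sync"
	"time"
)

// Minimal stand-in for the aggregator; the real struct has more fields.
type Aggregator struct {
	mu                         sync.Mutex
	batchesIdxByIdentifierHash map[[32]byte]uint32
	batchCreatedBlockByIdx     map[uint32]uint64
	batchesIdentifierHashByIdx map[uint32][32]byte
	batchDataByIdentifierHash  map[[32]byte][]byte
}

// startPeriodicCleanup implements the long-lived goroutine idea: roughly
// every Ethereum confirmation interval, take the oldest (lowest) tracked
// batch index, look up its identifier hash (Merkle root), and remove both
// entries from all four maps.
func (agg *Aggregator) startPeriodicCleanup(interval time.Duration, done <-chan struct{}) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-done:
				return
			case <-ticker.C:
				agg.mu.Lock()
				// Find the oldest (lowest) batch index still in the maps.
				oldest, found := uint32(0), false
				for idx := range agg.batchesIdentifierHashByIdx {
					if !found || idx < oldest {
						oldest, found = idx, true
					}
				}
				if found {
					hash := agg.batchesIdentifierHashByIdx[oldest]
					delete(agg.batchesIdxByIdentifierHash, hash)
					delete(agg.batchCreatedBlockByIdx, oldest)
					delete(agg.batchesIdentifierHashByIdx, oldest)
					delete(agg.batchDataByIdentifierHash, hash)
				}
				agg.mu.Unlock()
			}
		}
	}()
}
```

From main this would just be something like `agg.startPeriodicCleanup(15*time.Second, done)`, with the interval tuned to roughly the time Ethereum takes to confirm our transactions.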