Node responds to the challenge using wrong knowledge collection
Issue description
It looks like my node sometimes responds to a challenge using an incorrect knowledge collection ID and therefore fails the challenge. I'm not very familiar with the random sampling process, so please check my logic here.
In the node logs I can see the following. I read it as: the node successfully requested a challenge for the proofing period that starts at block 9834600, and in response it was told to provide a proof for kc_id = 2840829 and chunk_id = 1169.
[2025-06-27 14:44:31] DEBUG: getActiveProofPeriodStatus() call has been successfully executed; Result: 9834600,true; RPC: https://astrosat.origintrail.network/.
[2025-06-27 14:44:31] DEBUG: [PROOFING] Checking proof period validity: isValid=true, activeProofPeriodStartBlock=9834600, latestChallengeBlock=9834300, sentSuccessfully=true, blockchainId=otp:2043
[2025-06-27 14:44:31] INFO: [PROOFING] Preparing new proof for blockchain: otp:2043
[2025-06-27 14:44:31] DEBUG: [PROOFING] Starting proof preparation for blockchain: otp:2043, challengeId: 103
[2025-06-27 14:44:31] INFO: Calling createChallenge with priority: 5
[2025-06-27 14:44:32] DEBUG: Sending signed transaction createChallenge to the blockchain otp:2043 with gas limit: 308178 and gasPrice 8. Transaction queue length: 0. Wallet used: 0xe2cae458b713721cFcBD5481B7c791180E38070C
[2025-06-27 14:44:33] DEBUG: createChallenge() estimateGas has been successfully executed; RPC: https://astrosat-parachain-rpc.origin-trail.network.
[2025-06-27 14:44:37] DEBUG: getNodeChallenge(identityId=77) call has been successfully executed; Result: 2840829,1169,0x8f678eB0E57ee8A109B295710E23076fA3a443fe,6,9834600,300,false; RPC: https://astrosat.origintrail.network/.
[2025-06-27 14:44:37] DEBUG: [PROOFING] New challenge created: challengeId=104, epoch=6, contractAddress=0x8f678eb0e57ee8a109b295710e23076fa3a443fe, knowledgeCollectionId=2840829
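In case it helps to reproduce this, below is a minimal sketch of making the same read call directly against the contract and comparing it with the values the node logs. It assumes ethers v6; the RandomSamplingStorage address is a placeholder, and the parameter type and return-field names in the ABI fragment are only inferred from the log line above, so they should be verified against the deployed contract.

```typescript
// Sketch only: cross-check the node's logged challenge against the contract.
// Assumptions: ethers v6, a placeholder contract address, and an ABI fragment
// whose parameter type (uint72) and return-field names are inferred from the
// node's log output rather than taken from the verified contract.
import { ethers } from "ethers";

const RPC_URL = "https://astrosat.origintrail.network/"; // RPC seen in the node logs
const RANDOM_SAMPLING_STORAGE = "0x0000000000000000000000000000000000000000"; // placeholder: use the real address
const IDENTITY_ID = 77; // identityId seen in the node logs

const abi = [
  "function getNodeChallenge(uint72 identityId) view returns (uint256 knowledgeCollectionId, uint256 chunkId, address knowledgeCollectionStorage, uint256 epoch, uint256 activeProofPeriodStartBlock, uint256 proofingPeriodDurationInBlocks, bool solved)",
];

async function main(): Promise<void> {
  const provider = new ethers.JsonRpcProvider(RPC_URL);
  const storage = new ethers.Contract(RANDOM_SAMPLING_STORAGE, abi, provider);

  // Read the challenge currently stored on-chain for this identity...
  const c = await storage.getNodeChallenge(IDENTITY_ID);

  // ...and print the fields to compare with the node's own log line.
  console.log({
    knowledgeCollectionId: c.knowledgeCollectionId.toString(),
    chunkId: c.chunkId.toString(),
    activeProofPeriodStartBlock: c.activeProofPeriodStartBlock.toString(),
    solved: c.solved,
  });
}

main().catch(console.error);
```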
Then it fails to produce the proof: fails_to_proof.txt. I removed a fair amount of data from that log because it is unreadable, and I do not understand that part very well.
But what bothers me is this: when I look at the randomSamplingStorage nodeChallengeSet event, I see this:
There is indeed a challenge set for my node for the proofing period that starts at block 9834600, but the kc_id is different: 3625668.
So it appears that my node tries to solve a challenge for the wrong kc_id.
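For completeness, this is roughly how I pull those events for my identity. The event parameter list below is a guess based on what the explorer shows (the name/casing, the contract address, and the block window are placeholders too), so it is a sketch to adapt, not the actual contract ABI.

```typescript
// Sketch only: list recent nodeChallengeSet events for one identity so the
// emitted kc_id can be compared with what getNodeChallenge() returns.
// The event signature below is assumed from the explorer view, not taken
// from the verified RandomSamplingStorage ABI; adjust names/casing/types.
import { ethers } from "ethers";

const RPC_URL = "https://astrosat.origintrail.network/";
const RANDOM_SAMPLING_STORAGE = "0x0000000000000000000000000000000000000000"; // placeholder
const IDENTITY_ID = 77;

const abi = [
  "event nodeChallengeSet(uint72 indexed identityId, uint256 knowledgeCollectionId, uint256 chunkId, uint256 activeProofPeriodStartBlock, bool solved)",
];

async function main(): Promise<void> {
  const provider = new ethers.JsonRpcProvider(RPC_URL);
  const storage = new ethers.Contract(RANDOM_SAMPLING_STORAGE, abi, provider);

  const latest = await provider.getBlockNumber();

  // Scan a recent block window (window size is arbitrary) for this node's events.
  const events = await storage.queryFilter(
    storage.filters.nodeChallengeSet(IDENTITY_ID),
    latest - 5000,
    latest
  );

  for (const ev of events as ethers.EventLog[]) {
    const { knowledgeCollectionId, chunkId, activeProofPeriodStartBlock, solved } = ev.args;
    console.log(
      `period=${activeProofPeriodStartBlock} kc_id=${knowledgeCollectionId} chunk_id=${chunkId} solved=${solved}`
    );
  }
}

main().catch(console.error);
```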
My node successfully solves the vast majority of its challenges; I can see nodeChallengeSet events with the [solved] value set to 1.
Also, the staking page shows my node's (Amado1) health as 100, which is surprising since it has failed a few challenges.
Thanks for the detailed submission @botnumberseven. This looks like a bug; labelling it as such.
I can see a couple more challenges for my node that fall into the same bucket:
active_proof_period_start_block 9862500: the random sampling storage contract shows kc_id 1551438 in the nodeChallengeSet event, while the node logs show:
[2025-06-29 14:06:50] DEBUG: getNodeChallenge(identityId=77) call has been successfully executed; Result: 1251896,259,0x8f678eB0E57ee8A109B295710E23076fA3a443fe,6,9862500,300,false; RPC: https://astrosat-2.origintrail.network/.
active_proof_period_start_block 9861300: the random sampling storage contract shows kc_id 2311397 in the nodeChallengeSet event, while the node logs show:
[2025-06-29 12:01:34] DEBUG: getNodeChallenge(identityId=77) call has been successfully executed; Result: 873822,838,0x8f678eB0E57ee8A109B295710E23076fA3a443fe,6,9861300,300,false; RPC: https://astrosat-2.origintrail.network/.
Hey, thanks for reporting @botnumberseven.
We understand the issue; a fix is being prepared.
Just to check: are you only seeing this on NeuroWeb?
@Mihajlo-Pavlovic My node runs only on NeuroWeb, so I do not know whether the same issue applies to Gnosis or Base.
This should be resolved with the latest mainnet release. Is it still active?
Hmm... I'm not exactly sure. On my node I monitor the contents of the random_sampling table in operationaldb (just the top row, related to the current active proof block) and compare the values against getNodeChallenge from the random sampling storage contract; if they differ, I update the operationaldb values to match the contract (a sketch of this check is at the end of this comment). Today I saw a case of a mismatch for active proof block 10430100 (token_id | chunk):
contract - 148417 | 1614
local db - 1900104 | 25
If the expectation is that operationaldb data must always match the contract, then the issue is still present. If the expectation is that operationaldb can temporarily mismatch the contract data but gets corrected later within the same active proof block... then I'm not sure, since my own check might fix it before the node itself has a chance to.
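For reference, here is roughly what my check does, as a sketch. It assumes mysql2 and ethers v6, placeholder operationaldb credentials, an id column for picking the newest row, and the same guessed getNodeChallenge ABI fragment as earlier in this thread; the column names are simply what I see in my own table, so adapt everything to your setup.

```typescript
// Sketch only: compare the newest random_sampling row in operationaldb with
// the on-chain getNodeChallenge values. Connection details, the ordering
// column (id) and the ABI fragment are assumptions, not taken from ot-node.
import mysql from "mysql2/promise";
import { ethers } from "ethers";

const RPC_URL = "https://astrosat.origintrail.network/";
const RANDOM_SAMPLING_STORAGE = "0x0000000000000000000000000000000000000000"; // placeholder
const IDENTITY_ID = 77;

const abi = [
  "function getNodeChallenge(uint72 identityId) view returns (uint256 knowledgeCollectionId, uint256 chunkId, address knowledgeCollectionStorage, uint256 epoch, uint256 activeProofPeriodStartBlock, uint256 proofingPeriodDurationInBlocks, bool solved)",
];

async function main(): Promise<void> {
  // Local node database (credentials are placeholders).
  const db = await mysql.createConnection({
    host: "localhost",
    user: "root",
    password: "",
    database: "operationaldb",
  });

  // Newest row of the random_sampling table; token_id and chunk are the
  // columns I compare, and ordering by id is an assumption.
  const [rows] = await db.execute(
    "SELECT token_id, chunk FROM random_sampling ORDER BY id DESC LIMIT 1"
  );
  const local = (rows as any[])[0];

  const provider = new ethers.JsonRpcProvider(RPC_URL);
  const storage = new ethers.Contract(RANDOM_SAMPLING_STORAGE, abi, provider);
  const onchain = await storage.getNodeChallenge(IDENTITY_ID);

  const matches =
    BigInt(local.token_id) === onchain.knowledgeCollectionId &&
    BigInt(local.chunk) === onchain.chunkId;

  console.log(matches ? "operationaldb matches the contract" : "MISMATCH", {
    local,
    onchain: {
      knowledgeCollectionId: onchain.knowledgeCollectionId.toString(),
      chunkId: onchain.chunkId.toString(),
      activeProofPeriodStartBlock: onchain.activeProofPeriodStartBlock.toString(),
    },
  });

  await db.end();
}

main().catch(console.error);
```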