neo-go icon indicating copy to clipboard operation
neo-go copied to clipboard

`upload-bin` hangs on index files construction if at least one SEARCH result is incomplete

Open AnnaShaleva opened this issue 1 year ago • 5 comments

Current Behavior

Here are logs of upload-bin command for neogo-block-uploader-fschain-testnet.service instance:

2024-10-28 06:32:48.810	Successfully uploaded batch of blocks: from 4440000 to 4449999	
2024-10-28 06:41:36.505	Processing batch from 4460000 to 4461912	
2024-10-28 06:41:36.505	Successfully uploaded batch of blocks: from 4450000 to 4459999	
2024-10-28 06:43:28.713	Updating index files...	
2024-10-28 06:43:28.713	Successfully uploaded batch of blocks: from 4460000 to 4461912

Index file update started at 06:43:28.713 and already 3 hours passed since its start. No index files are uploaded yet, and it's a bug in the program.

The problem is in the waiting code:

https://github.com/nspcc-dev/neo-go/blob/b8a65d3c37cfa429b8a5e135422bb1fa78762056/cli/util/uploader.go#L416-L428

This code waits for exactly indexFileSize number of OIDs to be returned from SEARCH routines. However, due to #3645 a incomplete batch may be returned, and we don't have any check for this in the code: https://github.com/nspcc-dev/neo-go/blob/b8a65d3c37cfa429b8a5e135422bb1fa78762056/cli/util/uploader.go#L396-L400 https://github.com/nspcc-dev/neo-go/blob/b8a65d3c37cfa429b8a5e135422bb1fa78762056/cli/util/uploader.go#L410-L412

Expected Behavior

Index files must be uploaded without deadlocks and endless cycles.

Possible Solution

Add retry to SEARCH if the number of returned objects is less than expected indexFileSize. Retry until successful SEARCH attempt.

AnnaShaleva avatar Oct 28 '24 09:10 AnnaShaleva

Here are some more recent logs, we need this issue to be fixed:

2024-10-29 04:04:40.526	Successfully uploaded batch of blocks: from 6590000 to 6597865	
2024-10-29 04:33:47.787	failed to update index files after upload: empty oid found in index file 0 at position 16556 (block index 16556)	
2024-10-29 04:34:51.317	Chain block height: 6598013	
2024-10-29 04:38:49.004	Processing batch from 6590000 to 6598013	
2024-10-29 04:38:49.004	First block of latest incomplete batch uploaded to NeoFS container: 6590000	
2024-10-29 04:42:29.264	Updating index files...	
2024-10-29 04:42:29.264	Successfully uploaded batch of blocks: from 6590000 to 6598013	
2024-10-29 05:09:51.357	failed to update index files after upload: empty oid found in index file 0 at position 1976 (block index 1976)	
2024-10-29 05:10:52.966	Chain block height: 6598150	
2024-10-29 05:13:47.332	Processing batch from 6590000 to 6598150	
2024-10-29 05:13:47.332	First block of latest incomplete batch uploaded to NeoFS container: 6590000	
2024-10-29 05:17:27.529	Updating index files...	
2024-10-29 05:17:27.529	Successfully uploaded batch of blocks: from 6590000 to 6598150	
2024-10-29 05:45:49.080	failed to update index files after upload: empty oid found in index file 0 at position 4257 (block index 4257)	
2024-10-29 05:46:51.451	Chain block height: 6598287
2024-10-29 05:51:47.574	Processing batch from 6590000 to 6598287	
2024-10-29 05:51:47.574	First block of latest incomplete batch uploaded to NeoFS container: 6590000	
2024-10-29 05:56:13.871	Updating index files...	
2024-10-29 05:56:13.871	Successfully uploaded batch of blocks: from 6590000 to 6598287	
2024-10-29 06:28:38.586	failed to update index files after upload: empty oid found in index file 0 at position 91 (block index 91)

AnnaShaleva avatar Oct 29 '24 07:10 AnnaShaleva

BTW, these gaps in the resulting index file (empty oid found in index file 0) may still be connected not only with incomplete SEARCH result, but also with some duplicating blocks, similar to https://github.com/nspcc-dev/neo-go/issues/3630#issuecomment-2426476212. @AliceInHunterland, please also check if the uploaded sequence of blocks contains duplicating entries for all chains, at least within the bounds of the first index file.

AnnaShaleva avatar Oct 29 '24 07:10 AnnaShaleva

BTW, all these logs about "gaps" belong to a single neogo-block-uploader-fschain-mainnet.service instance, and the numbers 91, 1976, 16556, 4257 are repeated, here's a full log tail: Explore-logs-2024-10-29 08_26_04.txt

AnnaShaleva avatar Oct 29 '24 08:10 AnnaShaleva

To summarize, the reason is that there are duplicating block objects uploaded to the container (@AliceInHunterland you may attach the example). NeoFS session token and block signatures set affect the resulting object ID, hence within the scope of this issue we need to prepare index file constructor to the fact that there might be duplicating entries in the container.

AnnaShaleva avatar Oct 29 '24 13:10 AnnaShaleva

./bin/neo-go util upload-bin --cid VCVr6RKg1hfJR7v36xvjbaXTncj1waPi94u9EFyEttW --wallet-config ./wallet-config.yml --block-attribute Block --index-attribute Index --rpc-endpoint https://rpc1.morph.fs.neo.org:40341 --fs-rpc-endpoint st1.storage.fs.neo.org:8080 --timeout 3m --skip-blocks-uploading --index-file-size 10000

Warning: Missing block index 1889
Warning: Missing block index 1976
Warning: Multiple oids: [4vMdkXbQEgnSSiEi7xsywwoqpbdS6mgqPFSSouyqXAEj 5uiRgBiVmcPrRPjjvJ7tqj3q6pbEb5tP4yynMMbmHHXf] for block index 2171
Warning: Multiple oids: [HvAc5MmjmQk94uJq6yCC9uxKLd87D8iGHJqQb3dKhw99 32RL6F6zf7r7ueShgPFi4SHDAWjF1U7p4PdFKsfdR6Qk] for block index 3714
Warning: Multiple oids: [5YGcGQ3XQKKxDYDTc7vjxkpSAxZaFFreeGSEpcqG4aWF 5qqheyM2Uoisjfuof4P1JpkzAR7vh7ih4uQaw97cYUDN] for block index 4001
Warning: Missing block index 4257

AliceInHunterland avatar Oct 29 '24 13:10 AliceInHunterland