Adopt possibly incomplete NeoFS SEARCH results in NeoFSBlockFetcher and `upload-bin` CLI command

Open AnnaShaleva opened this issue 1 year ago • 8 comments

Current Behavior

Some blocks are uploaded to NeoFS, then the script restarts. After the restart it begins uploading from block 0 instead of from the latest incomplete batch:

2024-10-23 14:46:48.603	Chain block height: 6231784
2024-10-23 14:47:00.828	Processing batch from 0 to 9999
2024-10-23 14:47:00.828	First block of latest incomplete batch uploaded to NeoFS container: 0
2024-10-23 14:50:56.510	Processing batch from 10000 to 19999
2024-10-23 14:50:56.510	Successfully uploaded batch of blocks: from 0 to 9999
2024-10-23 14:55:19.178	Processing batch from 20000 to 29999
2024-10-23 14:55:19.178	Successfully uploaded batch of blocks: from 10000 to 19999
2024-10-23 15:00:07.874	Processing batch from 30000 to 39999
2024-10-23 15:00:07.874	Successfully uploaded batch of blocks: from 20000 to 29999
2024-10-23 15:03:35.567	Processing batch from 40000 to 49999
2024-10-23 15:03:35.567	Successfully uploaded batch of blocks: from 30000 to 39999
2024-10-23 15:04:49.432	Chain block height: 6231850
2024-10-23 15:07:34.927	Processing batch from 0 to 9999
2024-10-23 15:07:34.927	First block of latest incomplete batch uploaded to NeoFS container: 0
2024-10-23 15:12:00.468	Processing batch from 10000 to 19999
2024-10-23 15:12:00.468	Successfully uploaded batch of blocks: from 0 to 9999
2024-10-23 15:16:50.336	Processing batch from 20000 to 29999
2024-10-23 15:16:50.336	Successfully uploaded batch of blocks: from 10000 to 19999
2024-10-23 15:21:10.256	Processing batch from 30000 to 39999
2024-10-23 15:21:10.256	Successfully uploaded batch of blocks: from 20000 to 29999
2024-10-23 15:25:27.258	Processing batch from 40000 to 49999
2024-10-23 15:25:27.258	Successfully uploaded batch of blocks: from 30000 to 39999
2024-10-23 15:30:14.827	Processing batch from 50000 to 59999

The pattern repeats.

Expected Behavior

Re-upload must start from the latest incomplete batch.

Possible Solution

Find the problem in fetchLatestMissingBlockIndex, fix it.

AnnaShaleva avatar Oct 24 '24 07:10 AnnaShaleva

One more example:

2024-10-24 06:10:41.740	Successfully uploaded batch of blocks: from 1960000 to 1969999
2024-10-24 06:14:17.712	Processing batch from 1980000 to 1989999	
2024-10-24 06:14:17.712	Successfully uploaded batch of blocks: from 1970000 to 1979999	
2024-10-24 06:18:11.192	Processing batch from 1990000 to 1999999	
2024-10-24 06:18:11.192	Successfully uploaded batch of blocks: from 1980000 to 1989999	
2024-10-24 06:21:47.636	Processing batch from 2000000 to 2009999	
2024-10-24 06:21:47.636	Successfully uploaded batch of blocks: from 1990000 to 1999999	
2024-10-24 06:25:07.098	Processing batch from 2010000 to 2019999
2024-10-24 06:25:07.098	Successfully uploaded batch of blocks: from 2000000 to 2009999	
2024-10-24 06:29:13.781	Processing batch from 2020000 to 2029999	
2024-10-24 06:29:13.781	Successfully uploaded batch of blocks: from 2010000 to 2019999	
2024-10-24 06:33:18.434	Processing batch from 2030000 to 2039999	
2024-10-24 06:33:18.434	Successfully uploaded batch of blocks: from 2020000 to 2029999	
2024-10-24 06:36:05.190	upload error: failed to initiate object upload: connection: no healthy client
2024-10-24 06:37:08.145	Chain block height: 6235291	
2024-10-24 06:54:58.627	Processing batch from 0 to 9999	
2024-10-24 06:54:58.627	First block of latest incomplete batch uploaded to NeoFS container: 0	
2024-10-24 07:00:32.594	Processing batch from 10000 to 19999	
2024-10-24 07:00:32.594	Successfully uploaded batch of blocks: from 0 to 9999	
2024-10-24 07:05:19.672	Processing batch from 20000 to 29999	
2024-10-24 07:05:19.672	Successfully uploaded batch of blocks: from 10000 to 19999	
2024-10-24 07:08:32.479	Processing batch from 30000 to 39999	
2024-10-24 07:08:32.479	Successfully uploaded batch of blocks: from 20000 to 29999	
2024-10-24 07:10:25.644	Processing batch from 40000 to 49999	
2024-10-24 07:10:25.644	Successfully uploaded batch of blocks: from 30000 to 39999	
2024-10-24 07:10:48.870	upload error: failed to initiate object upload: connection: no healthy client	
2024-10-24 07:11:50.608	Chain block height: 6235419

AnnaShaleva avatar Oct 24 '24 07:10 AnnaShaleva

But sometimes it works differently (logs are from the same mainnet service):

2024-10-24 06:37:08.145	Chain block height: 6235291	
2024-10-24 06:54:58.627	Processing batch from 0 to 9999	
2024-10-24 06:54:58.627	First block of latest incomplete batch uploaded to NeoFS container: 0
2024-10-24 07:00:32.594	Processing batch from 10000 to 19999	
2024-10-24 07:00:32.594	Successfully uploaded batch of blocks: from 0 to 9999	
2024-10-24 07:05:19.672	Processing batch from 20000 to 29999	
2024-10-24 07:05:19.672	Successfully uploaded batch of blocks: from 10000 to 19999	
2024-10-24 07:08:32.479	Processing batch from 30000 to 39999	
2024-10-24 07:08:32.479	Successfully uploaded batch of blocks: from 20000 to 29999
2024-10-24 07:10:25.644	Processing batch from 40000 to 49999	
2024-10-24 07:10:25.644	Successfully uploaded batch of blocks: from 30000 to 39999	
2024-10-24 07:10:48.870	upload error: failed to initiate object upload: connection: no healthy client	
2024-10-24 07:11:50.608	Chain block height: 6235419	
2024-10-24 07:18:58.201	failed to fetch the latest missing block index from container: search of index files failed for batch with indexes from 2480000 to 2489999: failed to initiate object search: session: init session: status: code = 1024 message = connection to the RPC node has been lost	
2024-10-24 07:20:01.123	Chain block height: 6235449	
2024-10-24 07:29:07.555	Processing batch from 40000 to 49999	
2024-10-24 07:29:07.555	First block of latest incomplete batch uploaded to NeoFS container: 40000	
2024-10-24 07:29:45.742	Processing batch from 50000 to 59999	
2024-10-24 07:29:45.742	Successfully uploaded batch of blocks: from 40000 to 49999	
2024-10-24 07:30:14.433	Processing batch from 60000 to 69999	
2024-10-24 07:30:14.433	Successfully uploaded batch of blocks: from 50000 to 59999	
2024-10-24 07:31:04.556	Processing batch from 70000 to 79999	
2024-10-24 07:31:04.556	Successfully uploaded batch of blocks: from 60000 to 69999	
2024-10-24 07:31:42.554	upload error: failed to initiate object upload: connection: no healthy client
2024-10-24 07:32:44.920	Chain block height: 6235496

AnnaShaleva avatar Oct 24 '24 07:10 AnnaShaleva

can be connected #3615

AliceInHunterland avatar Oct 24 '24 07:10 AliceInHunterland

Well, of course it's a bug in fetchLatestMissingBlockIndex only if our batches are full and don't have gaps.

can be connected https://github.com/nspcc-dev/neo-go/issues/3615

Check currently uploaded data for N3 mainnet. See if there are gaps in batches.
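A gap check of this kind could look roughly like the Go sketch below. It assumes the block indexes found in the container for one batch have already been collected into a slice; `missingInBatch` is a hypothetical helper written for this example, not something that exists in neo-go.

```go
package main

// missingInBatch returns, in ascending order, the block indexes in [from, to]
// that are absent from uploaded. Indexes outside the range are ignored, and
// the order of uploaded does not matter.
func missingInBatch(from, to uint32, uploaded []uint32) []uint32 {
	if from > to {
		return nil
	}
	seen := make(map[uint32]struct{}, len(uploaded))
	for _, idx := range uploaded {
		if idx >= from && idx <= to {
			seen[idx] = struct{}{}
		}
	}
	var missing []uint32
	for idx := from; ; idx++ {
		if _, ok := seen[idx]; !ok {
			missing = append(missing, idx)
		}
		if idx == to { // explicit stop avoids wrap-around at the uint32 maximum
			break
		}
	}
	return missing
}
```

An empty result means the batch is full; anything else lists the gaps that make the batch look incomplete.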

AnnaShaleva avatar Oct 24 '24 07:10 AnnaShaleva

Checked, depends on #3615 resolution.

AnnaShaleva avatar Oct 24 '24 09:10 AnnaShaleva

To resolve this issue, we need to adopt the SEARCH completeness marker once https://github.com/nspcc-dev/neofs-node/issues/2721 is implemented.

AnnaShaleva avatar Oct 24 '24 10:10 AnnaShaleva

See also https://github.com/nspcc-dev/neofs-node/issues/2721#issuecomment-2435121374: in situations where all SNs from the REP policy are dead, we need to shut down BlockFetcher, because it won't be able to receive even block OIDs and so cannot proceed with adding blocks to the chain.
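To make the intended handling concrete, here is a hedged Go sketch. The completeness marker is not available yet, so everything here is an assumption: `searchResult`, its `Complete` field, `errIncompleteSearch` and `countWithRetry` are hypothetical names, and the real API may look different. The sketch only shows the control flow: an incomplete result is never used as if it were complete, and after a bounded number of attempts the caller gets an error it can use to stop the upload or shut down BlockFetcher.

```go
package main

import (
	"errors"
	"fmt"
)

// errIncompleteSearch is a hypothetical sentinel for a SEARCH response that
// is marked as possibly incomplete.
var errIncompleteSearch = errors.New("incomplete SEARCH result")

// searchResult is a hypothetical shape of a SEARCH response that carries a
// completeness marker.
type searchResult struct {
	Found    uint32 // number of matching objects returned
	Complete bool   // true only if the result is known to be complete
}

// countWithRetry calls search up to attempts times and returns the count from
// the first complete result. Failed and incomplete responses are retried; if
// no attempt yields a complete result, the last failure is returned wrapped.
func countWithRetry(search func() (searchResult, error), attempts int) (uint32, error) {
	lastErr := errIncompleteSearch
	for i := 0; i < attempts; i++ {
		res, err := search()
		if err != nil {
			lastErr = err
			continue
		}
		if res.Complete {
			return res.Found, nil
		}
		lastErr = errIncompleteSearch
	}
	return 0, fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}
```

A caller can check the returned error with `errors.Is(err, errIncompleteSearch)` to distinguish "result cannot be trusted" from transport failures.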

AnnaShaleva avatar Oct 24 '24 14:10 AnnaShaleva

Within the scope of this issue we need to revert changes made by https://github.com/nspcc-dev/neo-go/issues/3670 and fall back from per-object SEARCH to SEARCH for the range of objects.

AnnaShaleva avatar Nov 11 '24 09:11 AnnaShaleva