redpanda
WARN: Failed to make upload candidate
Version & Environment
Redpanda version: v23.1.1
What went wrong?
The following warning shows in the log:
WARN 2023-03-16 18:45:31,586 [shard 7] archival - [fiber47 kafka/topic1/216] - ntp_archiver_service.cc:1489 - Failed to make upload candidate
What should have happened instead?
No warnings should be logged unless they stem from external causes that we can't avoid.
JIRA Link: CORE-1214
Also noticed this today on the long running test cluster (it only got upgraded to 23.1 a day or so ago).
There is no sign of it having an impact (uploads appeared to proceed eventually). Evgeny suggested that it could happen when the retention code races with the upload code, but it has also been seen on topics with infinite retention.
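To make the suspected race concrete, here is a minimal hypothetical sketch (not Redpanda code; the names and data shapes are invented for illustration) of how retention running between offset selection and candidate creation could leave the archiver with no segment to upload:

```python
# Hypothetical model of the suspected race: the archiver picks a start offset,
# then retention garbage-collects the segment covering it before the archiver
# resolves that offset to a local file, so no upload candidate can be made.

def make_upload_candidate(segments, base_offset):
    """Return the segment covering base_offset, or None if it is gone."""
    for seg in segments:
        if seg["base_offset"] <= base_offset <= seg["committed_offset"]:
            return seg
    return None  # nothing covers the offset -> warning path

def apply_retention(segments, retain_from_offset):
    """Drop segments that lie entirely below the retention boundary."""
    return [s for s in segments if s["committed_offset"] >= retain_from_offset]

segments = [
    {"base_offset": 0, "committed_offset": 99},
    {"base_offset": 100, "committed_offset": 199},
]

# Archiver decides to upload starting at offset 0 ...
next_upload_offset = 0
# ... but retention runs first and removes the covering segment.
segments = apply_retention(segments, retain_from_offset=100)

candidate = make_upload_candidate(segments, next_upload_offset)
print(candidate)  # None -> "Failed to make upload candidate"
```

The infinite-retention sightings mentioned above would not fit this model, which is why the race alone may not explain every occurrence.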
Sounds like something we should target for next minor, then?
This has started to come up in automated tests.
FAIL test: KgoVerifierWithSiTestLargeSegments.test_si_with_timeboxed.cloud_storage_type=CloudStorageType.S3 (1/2 runs) failure at 2023-05-04T17:16:24.037Z: <BadLogLines nodes=ip-172-31-9-183(2) example="ERROR 2023-05-04 10:40:27,356 [shard 1] archival - [fiber67 kafka/topic-rklshggkhy/83] - ntp_archiver_service.cc:2060 - Failed to make upload candidate with correct size, expected {source segment offsets: {term:2, base_offset:2, committed_offset:114, dirty_offset:114}, exposed_name: {2-2-v1.log}, starting_offset: 2, file_offset: 0, content_length: 109016389, final_offset: 114, final_file_offset: 109016389, term: 2, source names: {/var/lib/redpanda/data/kafka/topic-rklshggkhy/83_18/2-2-v1.log}}, actual {is_compacted: false, size_bytes: 25003650, base_offset: 90, committed_offset: 114, base_timestamp: {timestamp: 1683196755282}, max_timestamp: {timestamp: 1683196762533}, delta_offset: 6, ntp_revision: 18, archiver_term: 2, segment_term: 2, delta_offset_end: 6, sname_format: {v3}, metadata_size_hint: 0}"> on (amd64, VM) in job https://buildkite.com/redpanda/vtools/builds/7377#0187e5cb-c7ac-45a5-a783-89ebbe7df193
FAIL test: CloudRetentionTest.test_cloud_retention.max_consume_rate_mb=20.cloud_storage_type=CloudStorageType.S3 (1/31 runs) failure at 2023-05-05T04:45:27.129Z: <BadLogLines nodes=ip-172-31-4-15(1) example="ERROR 2023-05-05 00:13:10,861 [shard 1] archival - [fiber29 kafka/si_test_topic/3] - ntp_archiver_service.cc:2060 - Failed to make upload candidate with correct size, expected {source segment offsets: {term:2, base_offset:2325, committed_offset:2411, dirty_offset:2411}, exposed_name: {2325-2-v1.log}, starting_offset: 2325, file_offset: 0, content_length: 0, final_offset: -9223372036854775808, final_file_offset: 0, term: 2, source names: {/var/lib/redpanda/data/kafka/si_test_topic/3_18/2325-2-v1.log}}, actual {is_compacted: false, size_bytes: 9970066, base_offset: 2245, committed_offset: 2324, base_timestamp: {timestamp: 1683245577314}, max_timestamp: {timestamp: 1683245579310}, delta_offset: 65, ntp_revision: 18, archiver_term: 2, segment_term: 1, delta_offset_end: 66, sname_format: {v3}, metadata_size_hint: 0}"> on (arm64, VM) in job https://buildkite.com/redpanda/vtools/builds/7385#0187e85f-7419-4984-9d70-bec9324a05ac
Created https://github.com/redpanda-data/redpanda/issues/10583 so that it is easier for PR authors to find and refer to the issue.
To me it looks like the issue we are seeing in the mentioned tests and the one in the issue title are two different problems. The one that started happening in the mentioned tests is related to the size mismatch, whereas the problem seen in the PoC is more general: it hits the case where there is no upload candidate at all.
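A small hypothetical sketch (invented names, not the actual archiver logic at ntp_archiver_service.cc) distinguishing the two failure modes discussed, using the sizes from the first FAIL log above:

```python
# Illustrative classifier for the two reported failure modes:
# (1) a candidate exists but its actual segment size differs from the
#     expected content_length ("Failed to make upload candidate with
#     correct size"), and
# (2) no candidate exists at all ("Failed to make upload candidate").

def classify_failure(expected, actual):
    if actual is None:
        return "no upload candidate"
    if expected["content_length"] != actual["size_bytes"]:
        return "size mismatch"
    return "ok"

# Values taken from the KgoVerifierWithSiTestLargeSegments failure above.
expected = {"content_length": 109016389, "final_offset": 114}
actual = {"size_bytes": 25003650, "committed_offset": 114}
print(classify_failure(expected, actual))  # size mismatch
print(classify_failure(expected, None))   # no upload candidate
```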
This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.
This issue was closed due to lack of activity. Feel free to reopen if it's still relevant.