cortex
Unshipped blocks when out of order writes are enabled
Describe the bug
Unshipped blocks are reported by the cortex_ingester_oldest_unshipped_block_timestamp_seconds metric and are also visible in the ingester storage when out-of-order writes are enabled with the configuration introduced in #4964. Blocks keep accumulating on the ingester for as long as the config is set.
To Reproduce
- Start Cortex 1.15.2
- Allow out-of-order writes using the configuration parameters introduced in #4964
- Perform Write operations
Expected behavior
Expecting to see no unshipped blocks on the ingester and the metric cortex_ingester_oldest_unshipped_block_timestamp_seconds at value 0.
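For anyone who wants to check this expectation from a script, here is a minimal sketch in Python. It reads Prometheus text-format output (for example, what an ingester's /metrics endpoint returns) and reports whether the metric is above 0. The function names and the two sample payloads are made up for illustration; only the metric name comes from this issue.

```python
from typing import Dict

METRIC = "cortex_ingester_oldest_unshipped_block_timestamp_seconds"


def oldest_unshipped_timestamps(exposition: str) -> Dict[str, float]:
    """Collect every sample of METRIC from Prometheus text-format output.

    Returns a mapping of series (metric name plus any labels) to its value.
    """
    samples: Dict[str, float] = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        # Skip blanks and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if not line.startswith(METRIC):
            continue
        rest = line[len(METRIC):]
        # Reject longer metric names that merely share this prefix.
        if rest and rest[0] not in "{ \t":
            continue
        if "}" in line:
            series, _, tail = line.rpartition("}")
            series += "}"
        else:
            series, _, tail = line.partition(" ")
        # The first token after the series is the value; an optional
        # trailing timestamp is ignored.
        samples[series] = float(tail.split()[0])
    return samples


def has_unshipped_blocks(exposition: str) -> bool:
    """True if any sample of METRIC is greater than 0."""
    return any(v > 0 for v in oldest_unshipped_timestamps(exposition).values())


# Hypothetical payloads, not captured from a real ingester.
SAMPLE_OK = """\
# TYPE cortex_ingester_oldest_unshipped_block_timestamp_seconds gauge
cortex_ingester_oldest_unshipped_block_timestamp_seconds 0
"""

SAMPLE_STUCK = """\
# TYPE cortex_ingester_oldest_unshipped_block_timestamp_seconds gauge
cortex_ingester_oldest_unshipped_block_timestamp_seconds 1.686e+09
"""
```

With these samples, has_unshipped_blocks(SAMPLE_OK) is the expected state and has_unshipped_blocks(SAMPLE_STUCK) is what this issue describes.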
Environment
Additional Context
Tested with the following two combinations of configuration; both produced the same result.
out_of_order_time_window: 30m
out_of_order_cap_max: 32
and
out_of_order_time_window: 30m
out_of_order_cap_max: 32
skip_blocks_with_out_of_order_chunks_enabled: true
Metrics:
- cortex_ingester_shipper_uploads_total shows that block uploads are being done
- cortex_ingester_shipper_upload_failures_total does not show any failures
- cortex_compactor_runs_completed_total shows that compactions are being done
- cortex_compactor_runs_failed_total shows no failed compactions
Logs:
There are no errors in the Cortex component logs. The only logs that might point to something are ingester logs about overlapping blocks, for example:
caller=compact.go:698 org_id=fake msg="Found overlapping blocks during compaction" ulid=01H2SSZF807FMM2FD52HFGA3N7
Alerts:
The CortexIngesterHasUnshippedBlocks alert from cortex-jsonnet is triggered because there are unshipped blocks.
I was just about to raise this same bug
We have tested various values for out_of_order_time_window, including very short ones like 10m and longer ones like 2w, and the problem persists.
I will try to take a look at this this week!
cc @yeya24
Is it because the ingester shipper doesn't upload compacted blocks? https://github.com/cortexproject/cortex/blob/master/pkg/ingester/ingester.go#L2031
Also raised https://github.com/thanos-io/thanos/issues/6462 on the Thanos side. I think making the shipper upload compacted blocks would work, but it might cause other issues, since we cannot tell whether a compacted block was generated by OOO or by something else.
Is there anything outstanding that's blocking the merge still?
@disambiguationuk I think we can merge this now https://github.com/cortexproject/cortex/pull/5416, I just need to rebase and resolve conflicts.
With this change https://github.com/cortexproject/cortex/pull/5495/files#diff-e1032332627c413a3010c66b54b22b6e9835cf152fa339e40cf0b11204f7241fR2043 we should be able to upload dynamically
Any updates on this fix? Is it still being worked on?
Hi @AmerSelimovic, sorry for the delay. The fix should be ready but I want to see if I can verify it first in our testing environment. I should get it done this week.
And if you are willing to test some prebuilt image, it would be very helpful
@AmerSelimovic Actually I believe the bug is already fixed. If the tenant has OOO time window > 0 enabled, shipper should upload compacted blocks.
What we are trying to add in https://github.com/cortexproject/cortex/pull/5416 is to turn on/off shipper uploading compacted blocks dynamically in case OOO feature is enabled/disabled during runtime. If OOO is enabled when ingester starts, all blocks can be uploaded successfully.
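To make the behaviour discussed above a bit more concrete, here is a rough Python model of the decision. It is a simplified sketch, not the actual Cortex or Thanos shipper code. The names Block, upload_compacted_for and blocks_to_ship are hypothetical, and it assumes that a block with compaction level above 1 is one the ingester has already compacted (for example after overlapping OOO blocks were merged).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    ulid: str
    # Assumption: level 1 means the block was cut straight from the head,
    # anything higher means it was produced by compaction in the ingester.
    compaction_level: int


def upload_compacted_for(ooo_time_window_seconds: int) -> bool:
    """Model of the rule from this thread: enable uploading of compacted
    blocks whenever the tenant's OOO time window is greater than 0."""
    return ooo_time_window_seconds > 0


def blocks_to_ship(blocks: List[Block], upload_compacted: bool) -> List[Block]:
    """Return the blocks this simplified shipper would upload.

    Compacted blocks (level > 1) are skipped unless upload_compacted is set,
    which is how they can end up stuck on disk as "unshipped".
    """
    return [b for b in blocks if b.compaction_level <= 1 or upload_compacted]


# Hypothetical local blocks: one regular, one produced by compaction.
local_blocks = [Block("block-a", 1), Block("block-b", 2)]

# OOO disabled: the compacted block is left behind.
shipped_without_ooo = blocks_to_ship(local_blocks, upload_compacted_for(0))

# OOO window of 30m: both blocks are uploaded.
shipped_with_ooo = blocks_to_ship(local_blocks, upload_compacted_for(30 * 60))
```

In this model, if the flag is only evaluated once at startup, enabling OOO later at runtime would leave compacted blocks unshipped, which is the gap the dynamic on/off change is meant to close.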
Hi @yeya24.
I'm not sure which change you think fixed the reported bug,
because the issue was also happening with out_of_order_time_window: 30m.
Do you think it is resolved by this change? https://github.com/cortexproject/cortex/pull/5495/files#diff-e1032332627c413a3010c66b54b22b6e9835cf152fa339e40cf0b11204f7241fR2043
The fix is to always upload compacted blocks in the ingester, so that OOO compacted blocks can be uploaded to the object store.
Btw https://github.com/cortexproject/cortex/releases/tag/v1.16.0-rc.0 is out. Feel free to try it out and see if it fixes this issue
https://github.com/cortexproject/cortex/releases/tag/v1.17.0-rc.0 is out. It should address this issue completely, as overlapping blocks will no longer be compacted by Prometheus. The compactor will handle that.