Flaky TestQuerierWithBlocksStorageRunningInSingleBinaryMode
Every now and then I see the TestQuerierWithBlocksStorageRunningInSingleBinaryMode integration test flake (e.g. in this CI run):
=== RUN TestQuerierWithBlocksStorageRunningInSingleBinaryMode/blocks_sharding_enabled,_ingester_gRPC_streaming_disabled,_inmemory_index_cache
08:05:54 Starting consul
08:05:55 consul: ==> Starting Consul agent...
08:05:55 consul: ==> Consul agent running!
08:05:55 consul: Version: 'v0.9.4'
08:05:55 consul: Node ID: 'a6770ff6-b258-8634-842f-dddfd5547b0a'
08:05:55 consul: Node name: 'consul'
08:05:55 consul: Datacenter: 'dc1' (Segment: '<all>')
08:05:55 consul: Server: true (Bootstrap: false)
08:05:55 consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
08:05:55 consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
08:05:55 consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
08:05:55 consul: ==> Log data will now stream in as it occurs:
08:05:55 Ports for container: e2e-cortex-test-consul Mapping: map[8500:33044]
08:05:55 Starting minio-9000
08:05:56 Ports for container: e2e-cortex-test-minio-9000 Mapping: map[9000:33045]
08:05:56 Starting memcached
08:05:56 minio-9000: Attempting encryption of all config, IAM users and policies on MinIO backend
08:05:58 Ports for container: e2e-cortex-test-memcached Mapping: map[11211:33046]
08:05:58 Starting cortex-1
08:05:59 cortex-1: level=warn ts=2020-07-08T08:05:59.301263543Z caller=experimental.go:19 msg="experimental feature in use" feature="Blocks storage engine"
08:05:59 cortex-1: level=warn ts=2020-07-08T08:05:59.301373647Z caller=experimental.go:19 msg="experimental feature in use" feature="Blocks storage engine"
08:05:59 cortex-1: level=warn ts=2020-07-08T08:05:59.303325568Z caller=modules.go:190 msg="Worker address is empty in single binary mode. Attempting automatic worker configuration. If queries are unresponsive consider configuring the worker explicitly." address=127.0.0.1:9095
08:05:59 Ports for container: e2e-cortex-test-cortex-1 Mapping: map[80:33048 9095:33047]
08:05:59 Starting cortex-2
08:06:00 cortex-2: level=warn ts=2020-07-08T08:06:00.385882716Z caller=experimental.go:19 msg="experimental feature in use" feature="Blocks storage engine"
08:06:00 cortex-2: level=warn ts=2020-07-08T08:06:00.386608276Z caller=experimental.go:19 msg="experimental feature in use" feature="Blocks storage engine"
08:06:00 cortex-2: level=warn ts=2020-07-08T08:06:00.404054134Z caller=modules.go:190 msg="Worker address is empty in single binary mode. Attempting automatic worker configuration. If queries are unresponsive consider configuring the worker explicitly." address=127.0.0.1:9095
08:06:00 Ports for container: e2e-cortex-test-cortex-2 Mapping: map[80:33050 9095:33049]
08:06:05 cortex-2: level=warn ts=2020-07-08T08:06:05.442784029Z caller=blocks_scanner.go:406 msg="found partial blocks" user=user-1 blocks=01ECPQBNVW47E4D6ZYR2PENZR3 err="meta.json not found"
08:06:07 cortex-2: level=warn ts=2020-07-08T08:06:07.448660828Z caller=blocks_scanner.go:406 msg="found partial blocks" user=user-1 blocks=01ECPQBQTCMQTW19WXS4TRPBX5 err="meta.json not found"
08:06:08 consul: ==> Newer Consul version available: 1.8.0 (currently running: 0.9.4)
TestQuerierWithBlocksStorageRunningInSingleBinaryMode/blocks_sharding_enabled,_ingester_gRPC_streaming_disabled,_inmemory_index_cache: querier_test.go:369:
Error Trace: querier_test.go:369
Error: Received unexpected error:
unable to find metrics [thanos_store_index_cache_requests_total] with expected values. Last values: [11]
Test: TestQuerierWithBlocksStorageRunningInSingleBinaryMode/blocks_sharding_enabled,_ingester_gRPC_streaming_disabled,_inmemory_index_cache
08:06:33 Killing cortex-2
08:06:33 cortex-2: level=error ts=2020-07-08T08:06:33.583550354Z caller=client.go:215 msg="error getting path" key=collectors/store-gateway err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/store-gateway?index=30&stale=&wait=10000ms\": context canceled"
08:06:33 cortex-2: level=error ts=2020-07-08T08:06:33.583798103Z caller=worker_frontend_manager.go:96 msg="error processing requests" err="rpc error: code = Canceled desc = context canceled"
08:06:33 cortex-2: level=error ts=2020-07-08T08:06:33.584304101Z caller=client.go:215 msg="error getting path" key=collectors/store-gateway err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/store-gateway?index=30&stale=&wait=10000ms\": context canceled"
08:06:33 cortex-2: level=error ts=2020-07-08T08:06:33.584473392Z caller=client.go:215 msg="error getting path" key=collectors/ring err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/ring?index=31&stale=&wait=10000ms\": context canceled"
08:06:33 cortex-2: level=warn ts=2020-07-08T08:06:33.596942606Z caller=transfer.go:431 msg="transfer attempt failed" err="cannot find ingester to transfer blocks to: no pending ingesters" attempt=1 max_retries=10
08:06:34 Killing cortex-1
08:06:34 cortex-1: level=error ts=2020-07-08T08:06:34.101657012Z caller=worker_frontend_manager.go:96 msg="error processing requests" err="rpc error: code = Canceled desc = context canceled"
08:06:34 cortex-1: level=error ts=2020-07-08T08:06:34.101929695Z caller=client.go:215 msg="error getting path" key=collectors/store-gateway err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/store-gateway?index=34&stale=&wait=10000ms\": context canceled"
08:06:34 cortex-1: level=error ts=2020-07-08T08:06:34.102637776Z caller=client.go:215 msg="error getting path" key=collectors/store-gateway err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/store-gateway?index=34&stale=&wait=10000ms\": context canceled"
08:06:34 cortex-1: level=error ts=2020-07-08T08:06:34.102801363Z caller=client.go:215 msg="error getting path" key=collectors/ring err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/ring?index=32&stale=&wait=10000ms\": context canceled"
08:06:34 cortex-1: level=warn ts=2020-07-08T08:06:34.107613863Z caller=transfer.go:431 msg="transfer attempt failed" err="cannot find ingester to transfer blocks to: no pending ingesters" attempt=1 max_retries=10
08:06:34 Killing memcached
08:06:34 memcached: Signal handled: Terminated.
08:06:34 Killing minio-9000
08:06:34 minio-9000: Exiting on signal: TERMINATED
08:06:35 Killing consul
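For context, the "unable to find metrics [...] with expected values" error comes from the metrics wait in the integration e2e framework: the test repeatedly scrapes the instance's /metrics endpoint until the summed value of thanos_store_index_cache_requests_total matches the expected value, and gives up once the backoff expires. Below is a minimal sketch of that kind of check; the helper name, the service variable, and the expected value are illustrative, not copied from querier_test.go.

```go
// Sketch only — shows the shape of the metrics wait that fails above,
// not the exact code in querier_test.go.
package integration

import (
	"testing"

	"github.com/cortexproject/cortex/integration/e2e"
	"github.com/stretchr/testify/require"
)

// assertIndexCacheUsed is a hypothetical helper: `querier` is the e2e service
// handle for the Cortex instance, `expected` is whatever the test expects.
func assertIndexCacheUsed(t *testing.T, querier *e2e.HTTPService, expected float64) {
	// WaitSumMetrics keeps scraping the service's /metrics endpoint until the
	// summed value of the metric matches the comparator, or the backoff
	// expires — at which point it returns the "unable to find metrics [...]
	// with expected values. Last values: [...]" error seen in the CI log.
	require.NoError(t, querier.WaitSumMetrics(e2e.Equals(expected), "thanos_store_index_cache_requests_total"))
}
```

In the run above the wait timed out with a last observed value of 11.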
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
Still valid
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
I haven't seen this test flake for some time now. Let me close it: we can reopen it anytime if we notice the issue again.
Seeing this issue: https://github.com/cortexproject/cortex/actions/runs/13570266489/job/37933465117
unable to find metrics [cortex_blocks_meta_synced] with expected values. Last error: <nil>. Last values: [12]
=== RUN TestQuerierWithBlocksStorageRunningInSingleBinaryMode/blocks_sharding_enabled,_memcached_index_cache,thanosEngine=true
15:59:38 Starting consul
15:59:38 consul: ==> Starting Consul agent...
15:59:38 consul: Version: '1.8.4'
15:59:38 consul: Node ID: '04b73037-72c5-9820-685c-615a9f11818b'
15:59:38 consul: Node name: 'consul'
15:59:38 consul: Datacenter: 'dc1' (Segment: '<all>')
15:59:38 consul: Server: true (Bootstrap: false)
15:59:38 consul: Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
15:59:38 consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
15:59:38 consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
15:59:38 consul: ==> Log data will now stream in as it occurs:
15:59:38 consul: ==> Consul agent running!
15:59:38 Ports for container: e2e-cortex-test-consul Mapping: map[8500:33454]
15:59:38 Starting minio-9000
15:59:39 Ports for container: e2e-cortex-test-minio-9000 Mapping: map[9000:33455]
15:59:39 Starting memcached
15:59:39 Ports for container: e2e-cortex-test-memcached Mapping: map[11211:33456]
15:59:39 Starting redis
15:59:40 redis: 1:C 27 Feb 2025 15:59:40.103 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
15:59:40 redis: 1:C 27 Feb 2025 15:59:40.103 # Redis version=7.0.4, bits=64, commit=00000000, modified=0, pid=1, just started
15:59:40 redis: 1:C 27 Feb 2025 15:59:40.103 # Configuration loaded
15:59:40 redis: 1:M 27 Feb 2025 15:59:40.104 * monotonic clock: POSIX clock_gettime
15:59:40 redis: 1:M 27 Feb 2025 15:59:40.104 * Running mode=standalone, port=6379.
15:59:40 redis: 1:M 27 Feb 2025 15:59:40.104 # Server initialized
15:59:40 redis: 1:M 27 Feb 2025 15:59:40.104 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
15:59:40 redis: 1:M 27 Feb 2025 15:59:40.105 * Ready to accept connections
15:59:40 Ports for container: e2e-cortex-test-redis Mapping: map[6379:33457]
15:59:40 Starting cortex-1
15:59:41 cortex-1: ts=2025-02-27T15:59:41.306517718Z caller=modules.go:364 level=warn msg="Worker address is empty in single binary mode. Attempting automatic worker configuration. If queries are unresponsive consider configuring the worker explicitly." address=127.0.0.1:9095
15:59:41 Ports for container: e2e-cortex-test-cortex-1 Mapping: map[80:33459 9095:33458]
15:59:41 Starting cortex-2
15:59:41 cortex-2: ts=2025-02-27T15:59:41.736507265Z caller=modules.go:364 level=warn msg="Worker address is empty in single binary mode. Attempting automatic worker configuration. If queries are unresponsive consider configuring the worker explicitly." address=127.0.0.1:9095
15:59:42 Ports for container: e2e-cortex-test-cortex-2 Mapping: map[80:33461 9095:33460]
15:59:44 cortex-1: ts=2025-02-27T15:59:44.329303704Z caller=shipper.go:259 level=warn org_id=user-1 msg="reading meta file failed, will override it" err="failed to read tsdb/user-1/thanos.shipper.json: open tsdb/user-1/thanos.shipper.json: no such file or directory"
15:59:44 cortex-2: ts=2025-02-27T15:59:44.749456642Z caller=shipper.go:259 level=warn org_id=user-1 msg="reading meta file failed, will override it" err="failed to read tsdb/user-1/thanos.shipper.json: open tsdb/user-1/thanos.shipper.json: no such file or directory"
querier_test.go:503:
Error Trace: /home/runner/work/cortex/cortex/integration/querier_test.go:503
Error: Received unexpected error:
unable to find metrics [cortex_blocks_meta_synced] with expected values. Last error: <nil>. Last values: [12]
Test: TestQuerierWithBlocksStorageRunningInSingleBinaryMode/blocks_sharding_enabled,_memcached_index_cache,thanosEngine=true
16:00:11 Killing cortex-2
16:00:12 Error response from daemon: No such container: e2e-cortex-test-cortex-2
16:00:12 Killing cortex-1
16:00:12 Error response from daemon: No such container: e2e-cortex-test-cortex-1
16:00:12 Killing redis
16:00:12 Error response from daemon: No such container: e2e-cortex-test-redis
16:00:12 Killing memcached
16:00:12 Error response from daemon: No such container: e2e-cortex-test-memcached
16:00:12 Killing minio-9000
16:00:12 Error response from daemon: No such container: e2e-cortex-test-minio-9000
16:00:12 Killing consul
16:00:13 Error response from daemon: No such container: e2e-cortex-test-consul
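This time the wait that times out is on cortex_blocks_meta_synced (last observed sum 12). If it helps with debugging, here is a small standalone sketch (not part of the test) that scrapes a running instance's metrics endpoint and prints the summed value of that metric. The port is the host-mapped one printed in the log above (e.g. 33459 for cortex-1 in this run), and treating the metric as a gauge is my assumption.

```go
// Debugging helper sketch (not part of the test suite): fetch /metrics from a
// running Cortex container and sum cortex_blocks_meta_synced across series,
// to compare against the value the test expects.
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/prometheus/common/expfmt"
)

func main() {
	// Host-mapped port taken from the CI log above; adjust for your local run.
	resp, err := http.Get("http://localhost:33459/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Sum the metric across all label combinations (gauge assumed).
	sum := 0.0
	if mf, ok := families["cortex_blocks_meta_synced"]; ok {
		for _, m := range mf.GetMetric() {
			sum += m.GetGauge().GetValue()
		}
	}
	fmt.Printf("cortex_blocks_meta_synced sum: %v\n", sum)
}
```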