cortex icon indicating copy to clipboard operation
cortex copied to clipboard

Flaky TestQuerierWithBlocksStorageRunningInSingleBinaryMode

Open pracucci opened this issue 5 years ago • 6 comments

Every while and then I see that TestQuerierWithBlocksStorageRunningInSingleBinaryMode integration test is flaky (eg. this CI):

=== RUN   TestQuerierWithBlocksStorageRunningInSingleBinaryMode/blocks_sharding_enabled,_ingester_gRPC_streaming_disabled,_inmemory_index_cache
08:05:54 Starting consul
08:05:55 consul: ==> Starting Consul agent...
08:05:55 consul: ==> Consul agent running!
08:05:55 consul: Version: 'v0.9.4'
08:05:55 consul: Node ID: 'a6770ff6-b258-8634-842f-dddfd5547b0a'
08:05:55 consul: Node name: 'consul'
08:05:55 consul: Datacenter: 'dc1' (Segment: '<all>')
08:05:55 consul: Server: true (Bootstrap: false)
08:05:55 consul: Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600)
08:05:55 consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
08:05:55 consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
08:05:55 consul: ==> Log data will now stream in as it occurs:
08:05:55 Ports for container: e2e-cortex-test-consul Mapping: map[8500:33044]
08:05:55 Starting minio-9000
08:05:56 Ports for container: e2e-cortex-test-minio-9000 Mapping: map[9000:33045]
08:05:56 Starting memcached
08:05:56 minio-9000: Attempting encryption of all config, IAM users and policies on MinIO backend
08:05:58 Ports for container: e2e-cortex-test-memcached Mapping: map[11211:33046]
08:05:58 Starting cortex-1
08:05:59 cortex-1: level=warn ts=2020-07-08T08:05:59.301263543Z caller=experimental.go:19 msg="experimental feature in use" feature="Blocks storage engine"
08:05:59 cortex-1: level=warn ts=2020-07-08T08:05:59.301373647Z caller=experimental.go:19 msg="experimental feature in use" feature="Blocks storage engine"
08:05:59 cortex-1: level=warn ts=2020-07-08T08:05:59.303325568Z caller=modules.go:190 msg="Worker address is empty in single binary mode.  Attempting automatic worker configuration.  If queries are unresponsive consider configuring the worker explicitly." address=127.0.0.1:9095
08:05:59 Ports for container: e2e-cortex-test-cortex-1 Mapping: map[80:33048 9095:33047]
08:05:59 Starting cortex-2
08:06:00 cortex-2: level=warn ts=2020-07-08T08:06:00.385882716Z caller=experimental.go:19 msg="experimental feature in use" feature="Blocks storage engine"
08:06:00 cortex-2: level=warn ts=2020-07-08T08:06:00.386608276Z caller=experimental.go:19 msg="experimental feature in use" feature="Blocks storage engine"
08:06:00 cortex-2: level=warn ts=2020-07-08T08:06:00.404054134Z caller=modules.go:190 msg="Worker address is empty in single binary mode.  Attempting automatic worker configuration.  If queries are unresponsive consider configuring the worker explicitly." address=127.0.0.1:9095
08:06:00 Ports for container: e2e-cortex-test-cortex-2 Mapping: map[80:33050 9095:33049]
08:06:05 cortex-2: level=warn ts=2020-07-08T08:06:05.442784029Z caller=blocks_scanner.go:406 msg="found partial blocks" user=user-1 blocks=01ECPQBNVW47E4D6ZYR2PENZR3 err="meta.json not found"
08:06:07 cortex-2: level=warn ts=2020-07-08T08:06:07.448660828Z caller=blocks_scanner.go:406 msg="found partial blocks" user=user-1 blocks=01ECPQBQTCMQTW19WXS4TRPBX5 err="meta.json not found"
08:06:08 consul: ==> Newer Consul version available: 1.8.0 (currently running: 0.9.4)
    TestQuerierWithBlocksStorageRunningInSingleBinaryMode/blocks_sharding_enabled,_ingester_gRPC_streaming_disabled,_inmemory_index_cache: querier_test.go:369: 
        	Error Trace:	querier_test.go:369
        	Error:      	Received unexpected error:
        	            	unable to find metrics [thanos_store_index_cache_requests_total] with expected values. Last values: [11]
        	Test:       	TestQuerierWithBlocksStorageRunningInSingleBinaryMode/blocks_sharding_enabled,_ingester_gRPC_streaming_disabled,_inmemory_index_cache
08:06:33 Killing cortex-2
08:06:33 cortex-2: level=error ts=2020-07-08T08:06:33.583550354Z caller=client.go:215 msg="error getting path" key=collectors/store-gateway err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/store-gateway?index=30&stale=&wait=10000ms\": context canceled"
08:06:33 cortex-2: level=error ts=2020-07-08T08:06:33.583798103Z caller=worker_frontend_manager.go:96 msg="error processing requests" err="rpc error: code = Canceled desc = context canceled"
08:06:33 cortex-2: level=error ts=2020-07-08T08:06:33.584304101Z caller=client.go:215 msg="error getting path" key=collectors/store-gateway err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/store-gateway?index=30&stale=&wait=10000ms\": context canceled"
08:06:33 cortex-2: level=error ts=2020-07-08T08:06:33.584473392Z caller=client.go:215 msg="error getting path" key=collectors/ring err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/ring?index=31&stale=&wait=10000ms\": context canceled"
08:06:33 cortex-2: level=warn ts=2020-07-08T08:06:33.596942606Z caller=transfer.go:431 msg="transfer attempt failed" err="cannot find ingester to transfer blocks to: no pending ingesters" attempt=1 max_retries=10
08:06:34 Killing cortex-1
08:06:34 cortex-1: level=error ts=2020-07-08T08:06:34.101657012Z caller=worker_frontend_manager.go:96 msg="error processing requests" err="rpc error: code = Canceled desc = context canceled"
08:06:34 cortex-1: level=error ts=2020-07-08T08:06:34.101929695Z caller=client.go:215 msg="error getting path" key=collectors/store-gateway err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/store-gateway?index=34&stale=&wait=10000ms\": context canceled"
08:06:34 cortex-1: level=error ts=2020-07-08T08:06:34.102637776Z caller=client.go:215 msg="error getting path" key=collectors/store-gateway err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/store-gateway?index=34&stale=&wait=10000ms\": context canceled"
08:06:34 cortex-1: level=error ts=2020-07-08T08:06:34.102801363Z caller=client.go:215 msg="error getting path" key=collectors/ring err="Get \"http://e2e-cortex-test-consul:8500/v1/kv/collectors/ring?index=32&stale=&wait=10000ms\": context canceled"
08:06:34 cortex-1: level=warn ts=2020-07-08T08:06:34.107613863Z caller=transfer.go:431 msg="transfer attempt failed" err="cannot find ingester to transfer blocks to: no pending ingesters" attempt=1 max_retries=10
08:06:34 Killing memcached
08:06:34 memcached: Signal handled: Terminated.
08:06:34 Killing minio-9000
08:06:34 minio-9000: Exiting on signal: TERMINATED
08:06:35 Killing consul

pracucci avatar Jul 08 '20 15:07 pracucci

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 06 '20 15:09 stale[bot]

Still valid

pracucci avatar Sep 14 '20 09:09 pracucci

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 13 '20 16:11 stale[bot]

I think I'm not seeing it flaky since some time. Let me close it: we can reopen it anytime if we notice the issue again.

pracucci avatar Nov 14 '20 14:11 pracucci

Seeing this issue: https://github.com/cortexproject/cortex/actions/runs/13570266489/job/37933465117

unable to find metrics [cortex_blocks_meta_synced] with expected values. Last error: <nil>. Last values: [12]
=== RUN   TestQuerierWithBlocksStorageRunningInSingleBinaryMode/blocks_sharding_enabled,_memcached_index_cache,thanosEngine=true

15:59:38 Starting consul

15:59:38 consul: ==> Starting Consul agent...

15:59:38 consul: Version: '1.8.4'

15:59:38 consul: Node ID: '04b73037-72c5-9820-685c-615a9f11818b'

15:59:38 consul: Node name: 'consul'

15:59:38 consul: Datacenter: 'dc1' (Segment: '<all>')

15:59:38 consul: Server: true (Bootstrap: false)

15:59:38 consul: Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)

15:59:38 consul: Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)

15:59:38 consul: Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

15:59:38 consul: ==> Log data will now stream in as it occurs:

15:59:38 consul: ==> Consul agent running!

15:59:38 Ports for container: e2e-cortex-test-consul Mapping: map[8500:33454]

15:59:38 Starting minio-9000

15:59:39 Ports for container: e2e-cortex-test-minio-9000 Mapping: map[9000:33455]

15:59:39 Starting memcached

15:59:39 Ports for container: e2e-cortex-test-memcached Mapping: map[11211:33456]

15:59:39 Starting redis

15:59:40 redis: 1:C 27 Feb 2025 15:59:40.103 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

15:59:40 redis: 1:C 27 Feb 2025 15:59:40.103 # Redis version=7.0.4, bits=64, commit=00000000, modified=0, pid=1, just started

15:59:40 redis: 1:C 27 Feb 2025 15:59:40.103 # Configuration loaded

15:59:40 redis: 1:M 27 Feb 2025 15:59:40.104 * monotonic clock: POSIX clock_gettime

15:59:40 redis: 1:M 27 Feb 2025 15:59:40.104 * Running mode=standalone, port=6379.

15:59:40 redis: 1:M 27 Feb 2025 15:59:40.104 # Server initialized

15:59:40 redis: 1:M 27 Feb 2025 15:59:40.104 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.

15:59:40 redis: 1:M 27 Feb 2025 15:59:40.105 * Ready to accept connections

15:59:40 Ports for container: e2e-cortex-test-redis Mapping: map[6379:33457]

15:59:40 Starting cortex-1

15:59:41 cortex-1: ts=2025-02-27T15:59:41.306517718Z caller=modules.go:364 level=warn msg="Worker address is empty in single binary mode.  Attempting automatic worker configuration.  If queries are unresponsive consider configuring the worker explicitly." address=127.0.0.1:9095

15:59:41 Ports for container: e2e-cortex-test-cortex-1 Mapping: map[80:33459 9095:33458]

15:59:41 Starting cortex-2

15:59:41 cortex-2: ts=2025-02-27T15:59:41.736507265Z caller=modules.go:364 level=warn msg="Worker address is empty in single binary mode.  Attempting automatic worker configuration.  If queries are unresponsive consider configuring the worker explicitly." address=127.0.0.1:9095

15:59:42 Ports for container: e2e-cortex-test-cortex-2 Mapping: map[80:33461 9095:33460]

15:59:44 cortex-1: ts=2025-02-27T15:59:44.329303704Z caller=shipper.go:259 level=warn org_id=user-1 msg="reading meta file failed, will override it" err="failed to read tsdb/user-1/thanos.shipper.json: open tsdb/user-1/thanos.shipper.json: no such file or directory"

15:59:44 cortex-2: ts=2025-02-27T15:59:44.749456642Z caller=shipper.go:259 level=warn org_id=user-1 msg="reading meta file failed, will override it" err="failed to read tsdb/user-1/thanos.shipper.json: open tsdb/user-1/thanos.shipper.json: no such file or directory"

    querier_test.go:503: 

        	Error Trace:	/home/runner/work/cortex/cortex/integration/querier_test.go:503

        	Error:      	Received unexpected error:

        	            	unable to find metrics [cortex_blocks_meta_synced] with expected values. Last error: <nil>. Last values: [12]

        	Test:       	TestQuerierWithBlocksStorageRunningInSingleBinaryMode/blocks_sharding_enabled,_memcached_index_cache,thanosEngine=true

16:00:11 Killing cortex-2

16:00:12 Error response from daemon: No such container: e2e-cortex-test-cortex-2



16:00:12 Killing cortex-1

16:00:12 Error response from daemon: No such container: e2e-cortex-test-cortex-1



16:00:12 Killing redis

16:00:12 Error response from daemon: No such container: e2e-cortex-test-redis



16:00:12 Killing memcached

16:00:12 Error response from daemon: No such container: e2e-cortex-test-memcached



16:00:12 Killing minio-9000

16:00:12 Error response from daemon: No such container: e2e-cortex-test-minio-9000



16:00:12 Killing consul

16:00:13 Error response from daemon: No such container: e2e-cortex-test-consul

CharlieTLe avatar Feb 27 '25 16:02 CharlieTLe

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 19 '25 06:07 stale[bot]