dragonfly
dragonfly copied to clipboard
Failed to pull DflyConnection from remote_ready_queue
Describe the bug Tonight we got a crash problem on dragonfly instance. This issue may be related to previously reported issue: https://github.com/dragonflydb/dragonfly/issues/2599
dragonfly-1 | I20240222 00:44:07.197453 13 save_stages_controller.cc:283] Saving "dump-2024-02-22T00:44:00-summary.dfs" finished after 7 s
dragonfly-1 | W20240222 00:44:08.000803 8 transaction.cc:58] TxQueue is too long. Tx count:174, armed:93, runnable:0, total locks: 19, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:1089489810 continuation_tx: BRPOPLPUSH@967305287/2 {id=6892} armed
dragonfly-1 | W20240222 00:44:09.000679 16 transaction.cc:58] TxQueue is too long. Tx count:166, armed:75, runnable:0, total locks: 18, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:1023973961
dragonfly-1 | W20240222 00:44:09.000916 15 transaction.cc:58] TxQueue is too long. Tx count:176, armed:86, runnable:0, total locks: 15, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:752259126
dragonfly-1 | W20240222 00:44:09.000931 14 transaction.cc:58] TxQueue is too long. Tx count:172, armed:78, runnable:0, total locks: 14, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:922722963
dragonfly-1 | W20240222 00:44:09.000897 13 transaction.cc:58] TxQueue is too long. Tx count:160, armed:66, runnable:0, total locks: 21, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:971144512
dragonfly-1 | W20240222 00:44:09.000893 10 transaction.cc:58] TxQueue is too long. Tx count:126, armed:35, runnable:0, total locks: 24, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:912110770 continuation_tx: BRPOPLPUSH@967310621/2 {id=14743} armed
dragonfly-1 | W20240222 00:44:09.000903 9 transaction.cc:58] TxQueue is too long. Tx count:144, armed:54, runnable:0, total locks: 17, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:801155636
dragonfly-1 | W20240222 00:44:09.000910 12 transaction.cc:58] TxQueue is too long. Tx count:173, armed:83, runnable:0, total locks: 18, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:901691102 continuation_tx: BRPOPLPUSH@967310620/2 {id=14596} armed
dragonfly-1 | W20240222 00:44:09.000947 17 transaction.cc:58] TxQueue is too long. Tx count:206, armed:114, runnable:0, total locks: 16, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:1018598879 continuation_tx: BRPOPLPUSH@967310620/2 {id=14596} armed
dragonfly-1 | W20240222 00:44:11.000659 11 transaction.cc:58] TxQueue is too long. Tx count:171, armed:73, runnable:0, total locks: 14, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:837878273 continuation_tx: BRPOPLPUSH@967319900/2 {id=26317}
dragonfly-1 | W20240222 00:44:18.001056 8 transaction.cc:58] TxQueue is too long. Tx count:180, armed:89, runnable:0, total locks: 19, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:1089548305 continuation_tx: ZREM@967352863/1:9 {id=28558}
dragonfly-1 | W20240222 00:44:19.000793 14 transaction.cc:58] TxQueue is too long. Tx count:131, armed:46, runnable:0, total locks: 14, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:922770119 continuation_tx: RPOPLPUSH@967358211/2:7 {id=65019}
dragonfly-1 | W20240222 00:44:19.000793 9 transaction.cc:58] TxQueue is too long. Tx count:144, armed:54, runnable:0, total locks: 17, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:801203048
dragonfly-1 | W20240222 00:44:19.000875 10 transaction.cc:58] TxQueue is too long. Tx count:114, armed:24, runnable:0, total locks: 24, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:912155384 continuation_tx: ZRANGEBYSCORE@967358212/1:1 {id=1675}
dragonfly-1 | W20240222 00:44:19.000793 13 transaction.cc:58] TxQueue is too long. Tx count:125, armed:33, runnable:0, total locks: 21, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:971192000
dragonfly-1 | W20240222 00:44:19.000814 17 transaction.cc:58] TxQueue is too long. Tx count:176, armed:86, runnable:0, total locks: 16, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:1018648615
dragonfly-1 | W20240222 00:44:19.000815 16 transaction.cc:58] TxQueue is too long. Tx count:140, armed:46, runnable:0, total locks: 18, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:1024025187
dragonfly-1 | W20240222 00:44:19.000815 12 transaction.cc:58] TxQueue is too long. Tx count:147, armed:57, runnable:0, total locks: 18, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:901742488
dragonfly-1 | W20240222 00:44:19.000799 15 transaction.cc:58] TxQueue is too long. Tx count:140, armed:45, runnable:0, total locks: 15, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:752297090
dragonfly-1 | W20240222 00:44:21.002529 11 transaction.cc:58] TxQueue is too long. Tx count:97, armed:28, runnable:0, total locks: 14, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:837920993
dragonfly-1 | W20240222 00:44:28.000692 8 transaction.cc:58] TxQueue is too long. Tx count:168, armed:78, runnable:0, total locks: 19, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:1089607535
dragonfly-1 | W20240222 00:44:29.000649 17 transaction.cc:58] TxQueue is too long. Tx count:174, armed:78, runnable:0, total locks: 16, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:1018702120
dragonfly-1 | W20240222 00:44:29.000862 9 transaction.cc:58] TxQueue is too long. Tx count:133, armed:50, runnable:0, total locks: 17, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:801249272
dragonfly-1 | W20240222 00:44:29.000885 16 transaction.cc:58] TxQueue is too long. Tx count:131, armed:37, runnable:0, total locks: 18, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:1024080133
dragonfly-1 | W20240222 00:44:29.001067 14 transaction.cc:58] TxQueue is too long. Tx count:133, armed:68, runnable:0, total locks: 14, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:922825635
dragonfly-1 | W20240222 00:44:29.001266 12 transaction.cc:58] TxQueue is too long. Tx count:151, armed:62, runnable:0, total locks: 18, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:901794357 continuation_tx: RPOPLPUSH@967410848/2:7 {id=42178}
dragonfly-1 | W20240222 00:44:29.001277 13 transaction.cc:58] TxQueue is too long. Tx count:134, armed:42, runnable:0, total locks: 21, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:971243770
dragonfly-1 | W20240222 00:44:29.001331 15 transaction.cc:58] TxQueue is too long. Tx count:145, armed:47, runnable:0, total locks: 15, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:752339155
dragonfly-1 | W20240222 00:44:29.001358 10 transaction.cc:58] TxQueue is too long. Tx count:125, armed:27, runnable:0, total locks: 24, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:912208224
dragonfly-1 | W20240222 00:44:31.000830 11 transaction.cc:58] TxQueue is too long. Tx count:118, armed:23, runnable:0, total locks: 14, contended locks: 0
dragonfly-1 | max contention score: 0, lock: , poll_executions:837968323
dragonfly-1 | F20240222 00:44:34.094962 11 fiber_interface.cc:303] Failed to pull DflyConnection from remote_ready_queue. res=0 remotenext: 140703315138560 remotempty: 0
dragonfly-1 | * Check failure stack trace: *
dragonfly-1 | @ 0x5578a0569a03 google::logmessage::SendToLog()
dragonfly-1 | @ 0x5578a05621c7 google::logmessage::Flush()
dragonfly-1 | @ 0x5578a0563b4f google::logmessagefatal::~LogMessageFatal()
dragonfly-1 | @ 0x5578a0337ed5 util::fb2::detail::fiberinterface::PullMyselfFromRemoteReadyQueue()
dragonfly-1 | @ 0x5578a03301d4 util::fb2::eventcount::wait_until()
dragonfly-1 | @ 0x55789fe78a8b dfly::transaction::batonbarrier::Wait()
dragonfly-1 | @ 0x55789fe7c4c0 dfly::transaction::WaitOnWatch()
dragonfly-1 | @ 0x55789fe17e14 dfly::(anonymous namespace)::bpoppusher::Run()
dragonfly-1 | @ 0x55789fe18256 dfly::(anonymous namespace)::BRPopLPush()
dragonfly-1 | @ 0x55789fe6486f dfly::commandid::Invoke()
dragonfly-1 | @ 0x55789fc7020c dfly::service::InvokeCmd()
dragonfly-1 | @ 0x55789fc74d71 dfly::service::DispatchCommand()
dragonfly-1 | @ 0x55789ff53f3d facade::connection::DispatchCommand()
dragonfly-1 | @ 0x55789ff54164 facade::connection::ParseRedis()
dragonfly-1 | @ 0x55789ff5658f facade::connection::IoLoop()
dragonfly-1 | @ 0x55789ff56a83 facade::connection::ConnectionFlow()
dragonfly-1 | @ 0x55789ff57a5f facade::connection::HandleRequests()
dragonfly-1 | @ 0x5578a034351d util::listenerinterface::RunSingleConnection()
dragonfly-1 | @ 0x5578a0343995 _ZN5boost7context6detail11fiber_entryINS1_12fiber_recordINS0_5fiberENS0_21basic_fixedsize_stackINS0_12stack_traitsEEEZN4util3fb26detail15WorkerFiberImplIZNS8_17ListenerInterface13RunAcceptLoopEvEUlvE0_JEEC4IS7_EESt17basic_string_viewIcSt11char_traitsIcEERKNS0_12preallocatedEOT_OSD_EUlOS4_E_EEEEvNS1_10transfer_tE
dragonfly-1 | @ 0x5578a038686f make_fcontext
dragonfly-1 | * SIGABRT received at time=1708562674 on cpu 3 *
dragonfly-1 | PC: @ 0x7ff87fb6800b (unknown) raise
dragonfly-1 | [failure_signal_handler.cc : 345] RAW: Signal 11 raised at PC=0x7ff87fb47941 while already in AbslFailureSignalHandler()
dragonfly-1 | * SIGSEGV received at time=1708562674 on cpu 3 *
dragonfly-1 | PC: @ 0x7ff87fb47941 (unknown) abort
dragonfly-1 | I20240222 00:45:11.232707 1 init.cc:70] dragonfly running in opt mode.
dragonfly-1 | I20240222 00:45:11.232919 1 dfly_main.cc:627] Starting dragonfly df-1e06c63-1e06c63727ffe84d2b4f9c36c6d5122082f52ae0
dragonfly-1 | I20240222 00:45:11.233206 1 dfly_main.cc:671] maxmemory has not been specified. Deciding myself....
dragonfly-1 | I20240222 00:45:11.233234 1 dfly_main.cc:680] Found 82.87GiB available memory. Setting maxmemory to 66.30GiB
dragonfly-1 | W20240222 00:45:11.233429 1 dfly_main.cc:354] Weird error 1 switching to epoll
dragonfly-1 | * Logs will be written to the first available of the following paths:
dragonfly-1 | /tmp/dragonfly.*
dragonfly-1 | ./dragonfly.*
dragonfly-1 | * For the available flags type dragonfly [--help | --helpfull]
dragonfly-1 | * Documentation can be found at: https://www.dragonflydb.io/docs
dragonfly-1 | I20240222 00:45:11.311951 1 proactor_pool.cc:146] Running 10 io threads
dragonfly-1 | I20240222 00:45:11.325162 1 server_family.cc:659] Host OS: Linux 5.10.0-28-amd64 x86_64 with 10 threads
dragonfly-1 | I20240222 00:45:11.327026 1 snapshot_storage.cc:108] Load snapshot: Searching for snapshot in directory: "/data"
dragonfly-1 | I20240222 00:45:11.330243 1 server_family.cc:822] Loading /data/dump-2024-02-22T00:44:00-summary.dfs
dragonfly-1 | I20240222 00:45:11.331840 9 listener_interface.cc:101] sock[23] AcceptServer - listening on port 6379
dragonfly-1 | I20240222 00:45:21.040669 12 server_family.cc:876] Load finished, num keys read: 4802585
Environment (please complete the following information):
- OS: Debian 11 64bit
- Kernel: 5.10.0-28-amd64 # SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
- Containerized:
- Docker version 25.0.3, build 4debf41
- Docker Compose version v2.24.5
- docker-compose.yml:
version: '3.8'
services:
dragonfly:
restart: always
image: 'ghcr.io/dragonflydb/dragonfly-weekly:1e06c63727ffe84d2b4f9c36c6d5122082f52ae0-ubuntu'
ulimits:
memlock: -1
ports:
- "6379:6379"
command: [
#'--default_lua_flags=allow-undeclared-keys disable-atomicity',
'--default_lua_flags=allow-undeclared-keys',
'--snapshot_cron=*/1 * * * *',
'--proactor_threads=10',
'--lua_auto_async=true',
'--cluster_mode=emulated',
'--lock_on_hashtags=true'
]
volumes:
- dragonflydata:/data
logging:
driver: 'json-file'
options:
max-file: '2'
max-size: 150m
volumes:
dragonflydata
- Dragonfly Version: dragonfly-weekly:1e06c63727ffe84d2b4f9c36c6d5122082f52ae0-ubuntu
Additional context
Server
redis_version:6.2.11 dragonfly_version:df-1e06c63 redis_mode:cluster arch_bits:64 os:Linux 5.10.0-28-amd64 x86_64 multiplexing_api:epoll tcp_port:6379 thread_count:10 uptime_in_seconds:22952 uptime_in_days:0
Clients
connected_clients:17126 client_read_buffer_bytes:64649472 blocked_clients:2379 dispatch_queue_entries:715
Memory
used_memory:5069151624 used_memory_human:4.72GiB used_memory_peak:6047369968 used_memory_peak_human:5.63GiB used_memory_rss:8625655808 used_memory_rss_human:8.03GiB used_memory_peak_rss:9128116224 comitted_memory:8665546768 maxmemory:71206184550 maxmemory_human:66.32GiB object_used_memory:4652220536 type_used_memory_STRING:130084768 type_used_memory_LIST:112017112 type_used_memory_SET:128 type_used_memory_ZSET:8919504 type_used_memory_HASH:4401199024 table_used_memory:461137616 num_buckets:856140 num_entries:4069884 inline_keys:1 listpack_blobs:18446744073709551541 listpack_bytes:18446744073709551245 small_string_bytes:177402880 pipeline_cache_bytes:585527830 dispatch_queue_bytes:155854306 dispatch_queue_subscriber_bytes:0 dispatch_queue_peak_bytes:994844136 client_read_buffer_peak_bytes:71593216 cache_mode:store maxmemory_policy:noeviction
Stats
total_connections_received:49584 total_commands_processed:399834239 instantaneous_ops_per_sec:20779 total_pipelined_commands:19666025 pipelined_latency_usec:23117808172290 total_net_input_bytes:162161664744 connection_migrations:0 total_net_output_bytes:44692164412 instantaneous_input_kbps:-1 instantaneous_output_kbps:-1 rejected_connections:-1 expired_keys:26588 evicted_keys:0 hard_evictions:0 garbage_checked:265 garbage_collected:1 bump_ups:0 stash_unloaded:19590 oom_rejections:0 traverse_ttl_sec:0 delete_ttl_sec:0 keyspace_hits:46561511 keyspace_misses:92144233 keyspace_mutations:265666683 total_reads_processed:107250651 total_writes_processed:211931196 defrag_attempt_total:0 defrag_realloc_total:0 defrag_task_invocation_total:0 reply_count:211931196 reply_latency_usec:350785827 blocked_on_interpreter:751 ram_hits:0 ram_misses:0
Replication
role:master connected_slaves:0 master_replid:246b1cc8d00249bffee4a9a798fa9b71fc5fd938
Modules
module:name=ReJSON,ver=20000,api=1,filters=0,usedby=[search],using=[],options=[handle-io-errors] module:name=search,ver=20000,api=1,filters=0,usedby=[],using=[ReJSON],options=[handle-io-errors]
Keyspace
db0:keys=4069884,expires=2549,avg_ttl=-1
Cpu
used_cpu_sys:37006.527869 used_cpu_user:40330.558211 used_cpu_sys_children:0.0 used_cpu_user_children:0.4102 used_cpu_sys_main_thread:3116.513383 used_cpu_user_main_thread:3675.871304
Cluster
cluster_enabled:1
@dg-korolev that's disappointing. I am looking into it.
@dg-korolev do you run in cloud or on-prem?
@romange It's running on-premise, on one's own hardware
I will build another custom binary. it won't solve the issue but will provide more insights if this happens again.
@dg-korolev please use this image docker pull ghcr.io/dragonflydb/dragonfly-weekly:fbb0a9e2f138913e98cf117cbbbbcd1aeb9b9afb-ubuntu
Thanks, the image is running. I continue to monitor this 👍
from v1.15.0:
E20240311 10:35:58.513952 1805 protocol_client.cc:297] Could not shutdown socket system:107
E20240311 10:35:58.531046 1805 replica.cc:191] Error connecting to 192.168.112.60:9999 generic:125
E20240311 10:41:33.562984 1806 scheduler.cc:323] Failed to pull active fiber from remote_ready_queue, iteration 0 remote_empty: 0, next:281472604971776
E20240311 10:41:33.563086 1806 scheduler.cc:327] Next fiber next is: 0xffff70060b00, usecount2
F20240311 10:41:33.563102 1806 fiber_interface.cc:303] Failed to pull DflyConnection from remote_ready_queue. res=0 remotenext: 281472604971776 remotempty: 0
*** Check failure stack trace: ***
@ 0xaaaad7218144 google::LogMessage::SendToLog()
@ 0xaaaad7211228 google::LogMessage::Flush()
@ 0xaaaad7212b64 google::LogMessageFatal::~LogMessageFatal()
@ 0xaaaad704922c util::fb2::detail::FiberInterface::PullMyselfFromRemoteReadyQueue()
Critical issues were fixed in 1.16.0