Deadlock watchdog during `TieredStorageTest::MemoryPressure` indirectly causes assert error

Open abhijat opened this issue 2 months ago • 3 comments

https://github.com/dragonflydb/dragonfly/actions/runs/19295614380/job/55176726865?pr=6048

During the test:

2025-11-12T11:41:49.8323969Z [ RUN      ] TieredStorageTest.MemoryPressure
2025-11-12T11:41:49.8333008Z I20251112 11:41:49.833070 14033 proactor_pool.cc:149] Running 1 io threads
2025-11-12T11:41:49.8495820Z I20251112 11:41:49.849279 14033 engine_shard_set.cc:66] max_file_size has not been specified. Deciding myself....
2025-11-12T11:41:49.8496474Z I20251112 11:41:49.849320 14033 engine_shard_set.cc:83] Max file size is: 57.29GiB
2025-11-12T11:41:49.8528207Z I20251112 11:41:49.852581 14033 test_utils.cc:266] Starting MemoryPressure
2025-11-12T11:42:09.8282563Z [/__w/dragonfly/dragonfly/build/_deps/abseil_cpp-src/absl/flags/internal/flag.cc : 147] RAW: Restore saved value of tiered_upload_threshold to: 0.1
2025-11-12T11:42:09.8574429Z E20251112 11:42:09.857035 14052 test_utils.cc:272] Deadlock detected!!!!

While printing the transaction locks for shard 0, a check fails when getting the default namespace:

2025-11-12T11:42:09.9700680Z 0x557159c84f30  std::__invoke_impl<>()
2025-11-12T11:42:09.9701236Z E20251112 11:42:09.968959 14052 test_utils.cc:296] TxLocks for shard 0
2025-11-12T11:42:09.9702484Z ==14033==WARNING: ASan is ignoring requested __asan_handle_no_return: stack type: default top: 0x7f203dfa7140; bottom 0x5310044a6000; size: 0x2c1039b01140 (48448198938944)
2025-11-12T11:42:09.9703621Z False positive error reports may follow
2025-11-12T11:42:09.9704223Z For details see https://github.com/google/sanitizers/issues/189
2025-11-12T11:42:09.9705042Z F20251112 11:42:09.969051 14052 namespaces.cc:80] Check failed: default_namespace_ != nullptr
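
For illustration only, here is a minimal sketch of the pattern in the log above: a diagnostic dump that runs after a deadlock is detected calls an accessor that aborts when the default namespace pointer is null. The sketch shows one possible way to make such a dump path tolerate missing state rather than assert. All names (`Registry`, `GetDefaultOrNull`, `DumpLocks`) are hypothetical and are not Dragonfly's actual API, and the reason the pointer was null in the CI run is not established here.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a namespace object; not Dragonfly's real class.
struct Namespace {
  std::string name;
};

// Hypothetical registry that owns the default namespace.
class Registry {
 public:
  void Init() { default_ns_ = std::make_unique<Namespace>(Namespace{"default"}); }
  void Clear() { default_ns_.reset(); }

  // Non-asserting accessor intended for diagnostic paths: returns nullptr
  // instead of aborting when the default namespace is not set.
  const Namespace* GetDefaultOrNull() const { return default_ns_.get(); }

 private:
  std::unique_ptr<Namespace> default_ns_;
};

// Diagnostic dump that degrades gracefully when state is missing, so the
// original "Deadlock detected" report is not masked by a second failure.
std::string DumpLocks(const Registry& reg, unsigned shard) {
  std::string out = "TxLocks for shard " + std::to_string(shard) + ": ";
  const Namespace* ns = reg.GetDefaultOrNull();
  if (ns == nullptr) {
    return out + "<default namespace unavailable>";
  }
  return out + "namespace=" + ns->name;
}
```

With a guard like this, the dump would report the missing namespace and let the test fail on the deadlock itself, which keeps the two problems separate in the logs.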

abhijat avatar Nov 13 '25 06:11 abhijat

Can we close it @abhijat ?

romange avatar Nov 24 '25 14:11 romange

> Can we close it @abhijat ?

It has not happened again as far as I can see, but I think we still need to look into why it happened.

abhijat avatar Nov 25 '25 01:11 abhijat

Happened again: https://github.com/dragonflydb/dragonfly/actions/runs/19730727878/job/56530982105

abhijat avatar Nov 27 '25 09:11 abhijat

Unfortunately, the logs artifact link is gone.

abhijat avatar Dec 15 '25 06:12 abhijat

The test has since been renamed to `ThrottleClients`.

abhijat avatar Dec 15 '25 06:12 abhijat

I introduced deadlocks artificially at different points in the test but could not reproduce the crash. Closing, as it has not recurred in several weeks; will reopen if it happens again.

abhijat avatar Dec 15 '25 06:12 abhijat