speedb icon indicating copy to clipboard operation
speedb copied to clipboard

Running with multiple groups resulted in hang.

Open erez-speedb opened this issue 2 years ago • 6 comments

Running this simple benchmark ./db_bench -groups "-memtablerep=skip_list --benchmarks=fillrandom,levelstats,flush,overwrite --compression_type=None -db=/data/ -num=500000 -value_size=1000 -key_size=16 --delayed_write_rate=536870912 -report_interval_seconds=1 -max_write_buffer_number=4 -num_column_families=1 -histogram -report_file=fillrandom.csv --disable_wal=true -db_write_buffer_size=110000000 -max_background_compactions=8 -max_background_flushes=4 -bloom_bits=10" " --benchmarks=overwrite"

The same test with single group passed with no issue ./db_bench -groups "-memtablerep=skip_list --benchmarks=fillrandom,levelstats,flush,overwrite,overwrite --compression_type=None -db=/data/ -num=500000 -value_size=1000 -key_size=16 --delayed_write_rate=536870912 -report_interval_seconds=1 -max_write_buffer_number=4 -num_column_families=1 -histogram -report_file=fillrandom.csv --disable_wal=true -db_write_buffer_size=110000000 -max_background_compactions=8 -max_background_flushes=4 -bloom_bits=10"

erez-speedb avatar Dec 05 '22 10:12 erez-speedb

The hang is in the second group run:

overwrite: 2.388 micros/op 418794 ops/sec; 405.8 MB/s Microseconds per write: Count: 500000 Average: 2.3878 StdDev: 5.59 Min: 0 Median: 1.5442 Max: 2080 Percentiles: P50: 1.54 P75: 2.33 P99: 8.47 P99.9: 29.81 P99.99: 108.45

[ 0, 1 ] 135973 27.195% 27.195% ( 1, 2 ] 209528 41.906% 69.100% ( 2, 3 ] 88931 17.786% 86.886% ( 3, 4 ] 35271 7.054% 93.941% ( 4, 6 ] 21708 4.342% 98.282% ( 6, 10 ] 5808 1.162% 99.444% ( 10, 15 ] 1555 0.311% 99.755% ( 15, 22 ] 515 0.103% 99.858% ( 22, 34 ] 324 0.065% 99.923% ( 34, 51 ] 165 0.033% 99.956% ( 51, 76 ] 130 0.026% 99.982% ( 76, 110 ] 44 0.009% 99.990% ( 110, 170 ] 20 0.004% 99.994% ( 170, 250 ] 10 0.002% 99.996% ( 250, 380 ] 8 0.002% 99.998% ( 380, 580 ] 4 0.001% 99.999% ( 580, 870 ] 2 0.000% 99.999% ( 870, 1300 ] 2 0.000% 100.000% ( 1900, 2900 ] 2 0.000% 100.000%

Group 2/2

Using exiting options Keys: 16 bytes each (+ 0 bytes user-defined timestamp) Values: 1000 bytes each (500 bytes after compression) Entries: 500000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 484.5 MB (estimated) FileSize: 246.0 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: NoCompression Compression sampling rate: 0 Memtablerep: skip_list Perf Level: 1

... finished 200000 ops

erez-speedb avatar Dec 05 '22 10:12 erez-speedb

The log shows the stall is waiting for memtable flush 2022/12/05-10:20:51.390331 76427 EVENT_LOG_v1 {"time_micros": 1670235651390328, "job": 21, "event": "table_file_deletion", "file_number": 22} 2022/12/05-10:20:51.480097 76475 [/db_impl/db_impl_write.cc:1544] Flushing column family with oldest memtable entry. Write buffers are using 111154000 bytes out of a total of 110000000. 2022/12/05-10:20:51.480110 76475 [/db_impl/db_impl_write.cc:1958] [default] New memtable created with log file: #5. Immutable memtables: 1. 2022/12/05-10:20:51.572380 76475 [/db_impl/db_impl_write.cc:1544] Flushing column family with oldest memtable entry. Write buffers are using 166731000 bytes out of a total of 110000000. 2022/12/05-10:20:51.572392 76475 [/db_impl/db_impl_write.cc:1958] [default] New memtable created with log file: #5. Immutable memtables: 2. 2022/12/05-10:20:51.572397 76475 [WARN] [/column_family.cc:933] [default] Stalling writes because we have 3 immutable memtables (waiting for flush), max_write_buffer_number is set to 4 rate 536870912 2022/12/05-10:20:51.676981 76475 [/db_impl/db_impl_write.cc:1544] Flushing column family with oldest memtable entry. Write buffers are using 222308000 bytes out of a total of 110000000. 2022/12/05-10:20:51.676993 76475 [/db_impl/db_impl_write.cc:1958] [default] New memtable created with log file: #5. Immutable memtables: 3. 2022/12/05-10:20:51.676999 76475 [WARN] [/column_family.cc:899] [default] Stopping writes because we have 4 immutable memtables (waiting for flush), max_write_buffer_number is set to 4

erez-speedb avatar Dec 05 '22 10:12 erez-speedb

@ayulas - is this another occurrence of the problem here #637 ?

Guyme avatar Oct 30 '23 09:10 Guyme

@Guyme yes

ayulas avatar Oct 30 '23 10:10 ayulas

no a duplicate of #637

Guyme avatar Oct 30 '23 10:10 Guyme

still relevant on branch 637-stall-deadlock

erez-speedb avatar Oct 30 '23 13:10 erez-speedb