hse
Crash-consistency bug in lib/mpool/src/mdc_file.c & omf.c
Description
Hello HSE developers and maintainers. We found this crash-consistency bug while testing HSE with an experimental PM (persistent memory) crash-testing tool.
During loghdr_update (mdc_file.c:444), omf_mdc_loghdr_pack (omf.c:38) is called. If a crash happens after omf_set_lh_gen(lhomf, lh->gen); (omf.c:50) but before omf_set_lh_crc(lhomf, crc); (omf.c:55), only the fields lh_vers, lh_magic and lh_gen of lhomf may have been written, but not the field lh_crc.
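To make the failure window concrete, here is a minimal C sketch of a field-by-field header pack that is interrupted before the checksum is stored. It is not HSE's code: the struct layout, field widths, constants, the toy_ names, and the FNV-1a checksum (standing in for whatever CRC omf_mdc_loghdr_pack actually computes) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; HSE's real version and magic are not given in the report. */
#define TOY_VERS  2u
#define TOY_MAGIC 0xdeadbeefu

/* Hypothetical on-media header. Field names follow the report,
 * widths and ordering are assumptions. */
struct toy_loghdr {
    uint32_t vers;
    uint32_t magic;
    uint64_t gen;
    uint32_t crc;
};

/* FNV-1a, used only as a stand-in for the real CRC. */
static uint32_t toy_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Write the header one field at a time, directly into the "media" copy.
 * crash_before_crc simulates a crash after gen is stored but before crc is. */
static void toy_pack(struct toy_loghdr *media, uint64_t gen, bool crash_before_crc)
{
    assert(media != NULL);

    media->vers = TOY_VERS;
    media->magic = TOY_MAGIC;
    media->gen = gen;

    if (crash_before_crc)
        return; /* crc still describes the previous header contents */

    media->crc = toy_checksum(media, offsetof(struct toy_loghdr, crc));
}

/* A header is accepted only if the stored crc covers the stored fields. */
static bool toy_hdr_valid(const struct toy_loghdr *media)
{
    return media->crc == toy_checksum(media, offsetof(struct toy_loghdr, crc));
}
```

Packing generation 3 completely and then packing generation 4 with crash_before_crc set leaves a header whose gen is 4 but whose crc still matches generation 3, so toy_hdr_valid returns false on the next open.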
Later, when we re-open the database, the call to mdc_file_erase in mpool_mdc_open (mdc.c:224) receives a null mfp pointer, so the application exits with the error "mdc file1 logid xxxxx erase failed, gen (xxx, xxx): lib/mpool/src/mdc_file.c:442" (mdc.c:226).
Expected Behavior
Erase the passive log successfully via mdc_file_erase in mpool_mdc_open (mdc.c:224) when such a crash is detected.
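One possible shape for this behavior is sketched below in C, purely as an illustration and not as a proposed patch. It assumes an MDC made of two logs, treats a header whose checksum does not match as the passive log of an interrupted erase, and re-erases it instead of failing the open. The sk_ names, the header layout, the FNV-1a checksum, the "higher generation wins when both headers are valid" rule, and the caller-supplied passive_gen are all assumptions; the report does not say how HSE numbers generations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_VERS  2u          /* hypothetical */
#define SK_MAGIC 0xdeadbeefu /* hypothetical */

/* Hypothetical header layout; only the field names come from the report. */
struct sk_loghdr {
    uint32_t vers;
    uint32_t magic;
    uint64_t gen;
    uint32_t crc;
};

/* FNV-1a as a stand-in for the real CRC. */
static uint32_t sk_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static bool sk_hdr_valid(const struct sk_loghdr *h)
{
    return h->magic == SK_MAGIC &&
           h->crc == sk_checksum(h, offsetof(struct sk_loghdr, crc));
}

/* Reset a log header to an empty log of generation newgen, crc included. */
static void sk_erase(struct sk_loghdr *h, uint64_t newgen)
{
    assert(h != NULL);

    memset(h, 0, sizeof(*h));
    h->vers = SK_VERS;
    h->magic = SK_MAGIC;
    h->gen = newgen;
    h->crc = sk_checksum(h, offsetof(struct sk_loghdr, crc));
}

/* Index of the active log: the only valid header, or the higher-generation
 * one when both are valid (an assumed rule). Returns -1 if neither is valid. */
static int sk_pick_active(const struct sk_loghdr logs[2])
{
    bool ok0 = sk_hdr_valid(&logs[0]);
    bool ok1 = sk_hdr_valid(&logs[1]);

    if (!ok0 && !ok1)
        return -1;
    if (ok0 && ok1)
        return logs[0].gen >= logs[1].gen ? 0 : 1;
    return ok0 ? 0 : 1;
}

/* Open-time recovery: keep the active log and re-erase a torn passive header
 * rather than reporting an error. passive_gen is chosen by the caller. */
static int sk_open(struct sk_loghdr logs[2], uint64_t passive_gen)
{
    int active = sk_pick_active(logs);

    if (active < 0)
        return -1; /* nothing trustworthy to recover from */

    struct sk_loghdr *passive = &logs[1 - active];

    if (!sk_hdr_valid(passive))
        sk_erase(passive, passive_gen);

    return active;
}
```

With one intact header and one torn header, sk_open returns the index of the intact log and leaves the other log with a freshly written, valid header. It only gives up when neither header can be trusted.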
Steps to reproduce
- Use GDB to load the following code snippet, assuming we have an HSE database created.

      const char *kvdb_home = "/.../your_hse_database_path";
      hse_kvdb *kvdb;
      hse_kvdb_open(kvdb_home, 0, nullptr, &kvdb);

- Put a breakpoint after omf_set_lh_gen(lhomf, lh->gen); (omf.c:50) but before omf_set_lh_crc(lhomf, crc); (omf.c:55).
- Run the program in GDB until it reaches the breakpoint.
- Check that the stack trace looks similar to the one below. The line numbers may differ slightly, but the function calls should be identical.

      #0 omf_mdc_loghdr_pack (lh=0x4b73d8, outbuf=0x7fff57a00000 "\002") at ../../../../targets/hse-project/hse/lib/mpool/src/omf.c:52
      #1 0x00007ffff7e55c25 in loghdr_update (mfp=0x4b73d0, lh=0x4b73d8, gen=4) at ../../../../targets/hse-project/hse/lib/mpool/src/mdc_file.c:101
      #2 0x00007ffff7e55986 in mdc_file_erase (mfp=0x4b73d0, newgen=4) at ../../../../targets/hse-project/hse/lib/mpool/src/mdc_file.c:444
      #3 0x00007ffff7e54492 in mpool_mdc_cend (mdc=0x4712b0) at ../../../../targets/hse-project/hse/lib/mpool/src/mdc.c:335
      #4 0x00007ffff7da1020 in cndb_rollover (cndb=0x4afce0) at ../../../../targets/hse-project/hse/lib/cn/cndb.c:2691
      #5 0x00007ffff7d9f489 in cndb_replay (cndb=0x4afce0, seqno=0x7fffffffca28, ingestid=0x7fffffffca00, txhorizon=0x7fffffffc9f0) at ../../../../targets/hse-project/hse/lib/cn/cndb.c:2142
      #6 0x00007ffff7e062ff in ikvdb_cndb_open (self=0x477000, seqno=0x7fffffffca28, ingestid=0x7fffffffca00, txhorizon=0x7fffffffc9f0) at ../../../../targets/hse-project/hse/lib/kvdb/ikvdb.c:1299
      #7 0x00007ffff7e033b6 in ikvdb_open (kvdb_home=0x7fffffffe7ac "/mnt/pmem/hse_gdb", params=0x7fffffffdc18, handle=0x7fffffffe068) at ../../../../targets/hse-project/hse/lib/kvdb/ikvdb.c:1587
      #8 0x00007ffff7d61398 in hse_kvdb_open (kvdb_home=0x7fffffffe7ac "/mnt/pmem/hse_gdb", paramc=1, paramv=0x7fffffffe2c8, handle=0x7fffffffe2d8) at ../../../../targets/hse-project/hse/lib/binding/kvdb_interface.c:388

- Exit the program at the breakpoint.
- Try to re-open the HSE database by running the code snippet again.
Linux Distribution
Ubuntu 22.04.1 LTS
File System
ext4
Other System details
Persistent Memory Type: Intel® Optane™ Persistent Memory 100 Series (256GB Module)