Crash-consistency bug in lib/mpool/src/mdc_file.c & omf.c

Open IKACE opened this issue 2 years ago • 0 comments

Description

Hello HSE developers and maintainers. We found this crash-consistency bug while testing HSE with an experimental PM crash-testing tool.

During loghdr_update (mdc_file.c:444), HSE calls omf_mdc_loghdr_pack (omf.c:38). If a crash happens after omf_set_lh_gen(lhomf, lh->gen); (omf.c:50) but before omf_set_lh_crc(lhomf, crc); (omf.c:55), only the fields lh_vers, lh_magic, and lh_gen of lhomf may have been written, but not the field lh_crc.
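To make the crash window concrete, here is a minimal sketch of a header that is packed field by field, with the checksum written last. Everything in it is an assumption for illustration: `struct demo_loghdr`, `demo_crc` (a toy FNV-1a hash standing in for the real CRC), `demo_pack_partial`, and `demo_hdr_valid` are hypothetical names, and the real layout and checksum in omf.c may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-media log header; the real layout in omf.c may differ. */
struct demo_loghdr {
    uint32_t vers;
    uint32_t magic;
    uint64_t gen;
    uint32_t crc; /* checksum over the bytes that precede this field */
};

/* Toy checksum (FNV-1a) standing in for the real CRC routine. */
uint32_t demo_crc(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Update the header on media in place, one field at a time, in the order
 * vers, magic, gen, crc. Passing steps < 4 models a crash part-way through.
 */
void demo_pack_partial(const struct demo_loghdr *src,
                       struct demo_loghdr *media, int steps)
{
    assert(steps >= 0 && steps <= 4);

    if (steps >= 1) media->vers = src->vers;
    if (steps >= 2) media->magic = src->magic;
    if (steps >= 3) media->gen = src->gen;
    if (steps >= 4) media->crc = demo_crc(media, offsetof(struct demo_loghdr, crc));
}

/* Returns nonzero if the stored checksum matches the header contents. */
int demo_hdr_valid(const struct demo_loghdr *media)
{
    return media->crc == demo_crc(media, offsetof(struct demo_loghdr, crc));
}
```

In this sketch, calling `demo_pack_partial` with `steps = 3` on a previously valid header leaves the new generation next to the old checksum, so `demo_hdr_valid` returns 0. That is the state the crash described above would leave behind.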

Later, when we re-open the database, the call to mdc_file_erase in mpool_mdc_open (mdc.c:224) gets a nullptr mfp, so the application exits with the error "mdc file1 logid xxxxx erase failed, gen (xxx, xxx): lib/mpool/src/mdc_file.c:442" (mdc.c:226).

Expected Behavior

Erase the passive log successfully via mdc_file_erase in mpool_mdc_open (mdc.c:224) when such a crash is detected.
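One way to picture the expected behavior is the sketch below. It is not HSE's implementation: `struct demo_log`, `demo_log_erase`, and `demo_mdc_open` are hypothetical names, and the policy shown (a single torn header is treated as an interrupted erase of the passive log, which is then re-erased with generation 0) is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one MDC log file after reading its header. */
struct demo_log {
    bool     hdr_valid; /* false if the header failed its checksum */
    uint64_t gen;       /* generation from the header; unreliable if !hdr_valid */
};

/* Rewrite a log's header completely (stand-in for a real erase). */
void demo_log_erase(struct demo_log *log, uint64_t newgen)
{
    log->gen = newgen;
    log->hdr_valid = true;
}

/*
 * Choose the active log out of two.
 * - Both headers valid: the higher generation is active (assumed policy).
 * - One header torn: assume an erase of the passive log was interrupted,
 *   keep the valid log as active, and redo the erase instead of failing.
 * - Both headers torn: give up.
 * Returns the index of the active log, or -1 on failure.
 */
int demo_mdc_open(struct demo_log logs[2])
{
    assert(logs != NULL);

    bool ok0 = logs[0].hdr_valid;
    bool ok1 = logs[1].hdr_valid;

    if (!ok0 && !ok1)
        return -1;
    if (ok0 && ok1)
        return logs[0].gen >= logs[1].gen ? 0 : 1;

    int active = ok0 ? 0 : 1;

    demo_log_erase(&logs[1 - active], 0); /* generation 0 is an assumed "empty" marker */
    return active;
}
```

The point of the sketch is only the middle case: a header that fails validation on one log is repaired by repeating the erase, so a later open succeeds instead of exiting with an error.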

Steps to reproduce

  1. Use GDB to load the following code snippet, assuming we have an HSE database created.

    const char *kvdb_home = "/.../your_hse_database_path";
    hse_kvdb *kvdb;
    hse_kvdb_open(kvdb_home, 0, nullptr, &kvdb);
    
  2. Set a breakpoint after omf_set_lh_gen(lhomf, lh->gen); (omf.c:50) but before omf_set_lh_crc(lhomf, crc); (omf.c:55).

  3. Run the program in GDB until it reaches the breakpoint.

  4. Check that the stack trace looks similar to the one below. The line numbers may differ slightly, but the function calls should be identical.

    #0  omf_mdc_loghdr_pack (lh=0x4b73d8, outbuf=0x7fff57a00000 "\002")
        at ../../../../targets/hse-project/hse/lib/mpool/src/omf.c:52
    #1  0x00007ffff7e55c25 in loghdr_update (mfp=0x4b73d0, lh=0x4b73d8, gen=4)
        at ../../../../targets/hse-project/hse/lib/mpool/src/mdc_file.c:101
    #2  0x00007ffff7e55986 in mdc_file_erase (mfp=0x4b73d0, newgen=4)
        at ../../../../targets/hse-project/hse/lib/mpool/src/mdc_file.c:444
    #3  0x00007ffff7e54492 in mpool_mdc_cend (mdc=0x4712b0)
        at ../../../../targets/hse-project/hse/lib/mpool/src/mdc.c:335
    #4  0x00007ffff7da1020 in cndb_rollover (cndb=0x4afce0)
        at ../../../../targets/hse-project/hse/lib/cn/cndb.c:2691
    #5  0x00007ffff7d9f489 in cndb_replay (cndb=0x4afce0, 
        seqno=0x7fffffffca28, ingestid=0x7fffffffca00, 
        txhorizon=0x7fffffffc9f0)
        at ../../../../targets/hse-project/hse/lib/cn/cndb.c:2142
    #6  0x00007ffff7e062ff in ikvdb_cndb_open (self=0x477000, 
        seqno=0x7fffffffca28, ingestid=0x7fffffffca00, 
        txhorizon=0x7fffffffc9f0)
        at ../../../../targets/hse-project/hse/lib/kvdb/ikvdb.c:1299
    #7  0x00007ffff7e033b6 in ikvdb_open (
        kvdb_home=0x7fffffffe7ac "/mnt/pmem/hse_gdb", params=0x7fffffffdc18, 
        handle=0x7fffffffe068)
        at ../../../../targets/hse-project/hse/lib/kvdb/ikvdb.c:1587
    #8  0x00007ffff7d61398 in hse_kvdb_open (
        kvdb_home=0x7fffffffe7ac "/mnt/pmem/hse_gdb", paramc=1, 
        paramv=0x7fffffffe2c8, handle=0x7fffffffe2d8)
        at ../../../../targets/hse-project/hse/lib/binding/kvdb_interface.c:388
    
  5. Exit the program at the breakpoint.

  6. Try to re-open the HSE database by running the code snippet again.

Linux Distribution

Ubuntu 22.04.1 LTS

File System

ext4

Other System details

Persistent Memory Type: Intel® Optane™ Persistent Memory 100 Series (256GB Module)

IKACE, Oct 09 '22 02:10