horaedb Unit test may fail with "pure virtual method called"

Describe this problem

UT/CI sometimes fails with "pure virtual method called, terminate called without an active exception".

The two cases that failed in my environment are:

        FAIL [   0.353s] analytic_engine tests::drop_test::test_drop_table_once_rocks

--- STDOUT:              analytic_engine tests::drop_test::test_drop_table_once_rocks ---

running 1 test
test tests::drop_test::test_drop_table_once_rocks ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 77 filtered out; finished in 0.08s


--- STDERR:              analytic_engine tests::drop_test::test_drop_table_once_rocks ---
pure virtual method called
terminate called without an active exception

        FAIL [   0.314s] analytic_engine tests::open_test::test_open_engine_rocks

--- STDOUT:              analytic_engine tests::open_test::test_open_engine_rocks ---

running 1 test
test tests::open_test::test_open_engine_rocks ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 77 filtered out; finished in 0.04s


--- STDERR:              analytic_engine tests::open_test::test_open_engine_rocks ---
pure virtual method called
terminate called without an active exception

Both were introduced/modified in #62.

Steps to reproduce

Run unit tests

cargo test --workspace

But this won't occur every time (in my local env).
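
Since the failure is intermittent, re-running the tests in a loop until one run fails may help catch it. The following is a minimal sketch; the helper name `run_until_failure` and the attempt count are assumptions for illustration and are not part of the repository.

```shell
# Hypothetical helper: re-run a command up to N times and stop at the
# first failing run, so an intermittent failure is easier to capture.
# Usage: run_until_failure <max_attempts> <command> [args...]
run_until_failure() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the command exactly as given; any non-zero exit is a failure.
        if ! "$@"; then
            echo "attempt $attempt failed" >&2
            return 1
        fi
        attempt=$((attempt + 1))
    done
    echo "no failure in $max_attempts attempts"
    return 0
}

# Example invocation (the count of 50 is arbitrary):
# run_until_failure 50 cargo test --workspace
```

The function returns 1 as soon as one run fails, leaving that run's output as the last thing on the terminal.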

Expected behavior

All tests pass.

Additional Information

https://github.com/nervosnetwork/ckb/issues/2927 looks like the same problem. ref #154

Jul 05 '22 07:07 waynexia

The panic has never occurred in my development environment (Linux) when running the RocksDB-related tests under rr. Maybe we should find a way to reproduce it and capture more information in the environment provided by GitHub.

Sep 05 '22 02:09 ShiKaiWi

This looks like a concurrency bug. rr limits its tracee to at most one CPU; here is the relevant output from rr record --help:

  -u, --cpu-unbound          allow tracees to run on any virtual CPU.
                             Default is to bind to a random CPU.  This option
                             can cause replay divergence: use with
                             caution.
  --bind-to-cpu=<NUM>        Bind to a particular CPU

So I guess this is why you cannot reproduce it with rr. I'll try the -u option later.

Sep 05 '22 07:09 waynexia

Well... things become complicated 😵

TL;DR: I'm giving up on rr and will use gdb instead. The detailed reason:

rr record -u gives this error:

> rr record -u /home/ruihang/repo/CeresDB/target/debug/deps/analytic_engine-05df844a1b1791ff
rr: Saving execution to trace directory `/home/ruihang/.local/share/rr/analytic_engine-05df844a1b1791ff-22'.
[FATAL src/record_syscall.cc:4218:rec_prepare_syscall_arch()] 
 (task 426239 (rec:426239) at time 250)
 -> Assertion `t->session().trace_writer().bound_to_cpu() >= 0' failed to hold. rseq not supported with unbound tasks

It comes from here https://github.com/rr-debugger/rr/blob/452f652321f87722da64ca363c3deea568ea0b67/src/record_syscall.cc#L4220-L4221

      // We can only support rseq when the tracee is bound to a specific CPU. otherwise cpu_id_start
      // and cpu_id fields would need to be managed by rr and would not match reality.

rseq support was newly added in glibc 2.35 (source), which is the version in my environment:

 /lib/libc.so.6 
GNU C Library (GNU libc) stable release version 2.35.
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 12.1.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
For bug reporting instructions, please see:
<https://bugs.archlinux.org/>.
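
A possible workaround, not tried here: glibc 2.35 documents a tunable, `glibc.pthread.rseq`, which disables rseq registration when set to 0. Assuming that is enough to avoid the assertion above, a command could be launched with the tunable set as sketched below. The helper name `with_rseq_disabled` is hypothetical, and whether rr record -u then works is an untested assumption.

```shell
# Hypothetical helper: run a command with glibc's rseq registration
# disabled via the glibc.pthread.rseq tunable (glibc 2.35 and later).
# Note: this replaces any GLIBC_TUNABLES value already set in the
# environment for the launched command.
with_rseq_disabled() {
    env GLIBC_TUNABLES=glibc.pthread.rseq=0 "$@"
}

# Example invocation (assumption: rr passes the environment to the tracee):
# with_rseq_disabled rr record -u /home/ruihang/repo/CeresDB/target/debug/deps/analytic_engine-05df844a1b1791ff
```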

Sep 05 '22 08:09 waynexia

This hasn't happened for a long time. Let's close it for now and reopen it if we encounter it again.

Nov 18 '22 08:11 ShiKaiWi