Merging PMDebugger into Pmemcheck
@krzycz this is context of https://dl.acm.org/doi/abs/10.1145/3445814.3446744?sid=SCITRUS @dibang2008 has agreed to upstream those changes back. But yes, I agree with your points. But instead of squashing the commits, I would recommend simply improving the commit message to accurately reflect the why and how of the change.
I have done following things according to your comments:
- format my codes according to existing codes;
- add a document (named PMDebuggerToPmemcheck.md) to simply illustrate what has changed and why;
- move is_delete to the end of the structure;
- change 100000000UL to 1000UL to save memory.
For unit tests, I have used the existing tests in Pmemcheck to verify my code (perl tests/vg_regtest pmemcheck) and I am not sure that is suitable for you. If you want more comprehensive unit tests, please give me more details (I am not familiar with this area).
Thanks
OK, I will try to solve these problems next week, because I have to prepare my dissertation oral defense this week
No worries, take your time. Good luck with your dissertation!
Thanks for cleaning up this code. Looks much nicer now, but there are still some stylistic changes that make it a bit hard to read. I've pointed out a few in my comments.
I'm still a bit worried that this change is going to make pmemcheck a bit harder to maintain - so I'll try to think of a way to mitigate that.
I have fixed these issues. Thanks for your valuable suggestions!
Yes, your understanding is correct and we can make our structures more obvious. But there is one problem: a dynamic structure (such as Oset in Valgrind) may introduce extra overheads because a dynamic structure requires reallocations, deletions, and rebalancing. These overheads can be very large as the number of instructions increase.
We can try a dynamic structure but we need to find a more lightweight structure supported by Valgrind (The Oset structure is not a good choice). Any suggestion? Thanks.
Well, you are already using a reallocating array - so my suggestion would be to wrap it in a small abstraction called "Ovec" or something like that.
EDIT: This already exists in valgrind and is called XArray.
Btw, we've run regression testing over PMDK using valgrind from this patch and many of our tests have failed with this error:
[line 55881] obj_basic_integration/TEST1 pmemcheck1.log valgrind: m_mallocfree.c:305 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
[line 55882] obj_basic_integration/TEST1 pmemcheck1.log valgrind: Heap block lo/hi size mismatch: lo = 48032, hi = 8.
[line 55883] obj_basic_integration/TEST1 pmemcheck1.log This is probably caused by your program erroneously writing past the
[line 55884] obj_basic_integration/TEST1 pmemcheck1.log end of a heap block and corrupting heap metadata. If you fix any
[line 55885] obj_basic_integration/TEST1 pmemcheck1.log invalid writes reported by Memcheck, this assertion failure will
[line 55886] obj_basic_integration/TEST1 pmemcheck1.log probably go away. Please try that before reporting this as a bug.
[line 55887] obj_basic_integration/TEST1 pmemcheck1.log
[line 55888] obj_basic_integration/TEST1 pmemcheck1.log
[line 55889] obj_basic_integration/TEST1 pmemcheck1.log host stacktrace:
[line 55890] obj_basic_integration/TEST1 pmemcheck1.log ==1747976== at 0x58008F28: show_sched_status_wrk (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55891] obj_basic_integration/TEST1 pmemcheck1.log ==1747976== by 0x58009037: report_and_quit (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55892] obj_basic_integration/TEST1 pmemcheck1.log ==1747976== by 0x580091BE: vgPlain_assert_fail (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55893] obj_basic_integration/TEST1 pmemcheck1.log ==1747976== by 0x5800FA01: blockSane (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55894] obj_basic_integration/TEST1 pmemcheck1.log ==1747976== by 0x580136A6: vgPlain_arena_realloc (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55895] obj_basic_integration/TEST1 pmemcheck1.log ==1747976== by 0x580018C3: pInfo_AllocNodeInArray (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55896] obj_basic_integration/TEST1 pmemcheck1.log ==1747976== by 0x58002954: trace_pmem_store (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55897] obj_basic_integration/TEST1 pmemcheck1.log ==1747976== by 0x1002A1D107: ???
[line 55898] obj_basic_integration/TEST1 pmemcheck1.log ==1747976== by 0x100288DF2F: ???
[line 55899] obj_basic_integration/TEST1 pmemcheck1.log ==1747976== by 0xB7DA: ???
Can you reproduce this issue on your end?
Can you send me the regression testing? It can help me reproduce this bug. I am not sure if the regression testing can be open-source. If not, can you give some hints of input data? I will try to reproduce this bug. Thanks :).
We've simply ran the entire PMDK test suite under this version of pmemcheck. The specific output example was from obj_basic_integration TEST1. You can run it by going into the directory of that test (src/test/obj_basic_integration) and then ./TESTS.py -u 1.
I also have a bug for this test but it is not the same as yours. It is as follow:
obj_basic_integration/TEST1: SETUP (medium/pmem/debug/pmemcheck)
Error 1
Last 0 lines of /home/aim/bangdi/pmdk/src/test/obj_basic_integration/err1.log below (whole file has 0 lines):
Last 2 lines of /home/aim/bangdi/pmdk/src/test/obj_basic_integration/out1.log below (whole file has 2 lines):
obj_basic_integration/TEST1: START: obj_basic_integration
/home/aim/bangdi/pmdk/src/test/obj_basic_integration/obj_basic_integration /mnt/dbpmemfs/obj_basic_integration_1/testfile1
Last 2 lines of /home/aim/bangdi/pmdk/src/test/obj_basic_integration/trace1.log below (whole file has 2 lines):
{obj_basic_integration.c:654 main} obj_basic_integration/TEST1: START: obj_basic_integration
/home/aim/bangdi/pmdk/src/test/obj_basic_integration/obj_basic_integration /mnt/dbpmemfs/obj_basic_integration_1/testfile1
Last 30 lines of /home/aim/bangdi/pmdk/src/test/obj_basic_integration/pmemcheck1.log below (whole file has 54 lines):
==6698== by 0x1002A8DF17: ???
==6698== by 0x1002A8DF2F: ???
==6698== by 0x1002A8DF3F: ???
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable (lwpid 6698)
==6698== at 0x4C81A94: ulog_construct (ulog.c:178)
==6698== by 0x4C5DA64: lane_init_data (lane.c:345)
==6698== by 0x4C6A7CE: obj_descr_create (obj.c:905)
==6698== by 0x4C6C3BD: pmemobj_createU (obj.c:1411)
==6698== by 0x4C6C57F: pmemobj_create (obj.c:1452)
==6698== by 0x10FC27: main (obj_basic_integration.c:666)
client stack range: [0x1FFEFFB000 0x1FFF000FFF] client SP: 0x1FFEFFFA30
valgrind stack range: [0x100298E000 0x1002A8DFFF] top usage: 7088 of 1048576
Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.
If that doesn't help, please report this bug to: www.valgrind.org
In the bug report, send all the above text, the valgrind
version, and what OS and version you are using. Thanks.
This bug has been fixed by aligning struct arr_md in pmc_include.h.
You can try my new commit. If bugs still exist, I think your testconfig.py may help us reproduce your bugs.