
PANDA 1.0 record cannot handle a record file larger than 2GB

Open nsapountzis opened this issue 7 years ago • 3 comments

I am experiencing the following problem with PANDA recording. We use PANDA 1.0.

I record back-to-back record files that each last 2 minutes, and each record file has a certain size on disk. It seems that when the record size is more than 2GB, there is a casting overflow problem: the host cannot handle it, PANDA record crashes, and of course the guest stops. Specifically, I think that the 2GB record filesize gets "translated" into some thousands of terabytes (due to the casting error), and I get the error: GLib-ERROR **: build/buildd/glib2.40.2/./glib/gmem.c:103: failed to allocate 18446744071595337090 bytes.
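Looking at that number: 18446744071595337090 is exactly 2^64 - 2114214526, i.e. a negative 32-bit length sign-extended into an unsigned 64-bit allocation size. Here is a minimal C sketch of the suspected mechanism (the variable names are illustrative, not PANDA's actual code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* A record section slightly over 2GB wraps a signed 32-bit length negative. */
        int len = -2114214526;
        /* Implicit conversion to the allocator's unsigned 64-bit size argument
           (gsize in GLib) sign-extends the negative value. */
        uint64_t alloc_size = (uint64_t)(int64_t)len;
        printf("%llu\n", (unsigned long long)alloc_size);
        /* Prints 18446744071595337090 -- the figure in the GLib error above. */
        return 0;
    }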

Overall, it seems that PANDA cannot handle a record file larger than 2GB (more precisely, PANDA cannot handle a workload in the guest that produces a record larger than 2GB). Has anyone run into this issue before?

It's really annoying not to be able to record a heavy workload because the record filesize might exceed 2GB and cause PANDA to crash.

nsapountzis avatar Mar 21 '18 23:03 nsapountzis

Could you provide a backtrace for this? (e.g. by running under gdb and using bt). That will help narrow down which int is too small.
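For example, something like this (the binary path is a placeholder for however you launch PANDA):

    gdb --args ./qemu-system-x86_64 [your usual PANDA options]
    (gdb) run
    ... reproduce the crash ...
    (gdb) bt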

moyix avatar Mar 22 '18 15:03 moyix

Hi Moyix,

Thanks a lot for your quick reply. Here is the backtrace of the qemu process when the crash happened:

Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0  0x00007f2c5e4bac13 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
(gdb) bt
#0  0x00007f2c5e4bac13 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#1  0x00007f2c5e4bad72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007f2c5e4b9644 in g_malloc () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x0000000000609d46 in qemu_sendfile (offset=0, len=-1876860136, src=0x30c8880, dst=0x31e97c0) at savevm.c:750
#4  qemu_concat_section (dst=dst@entry=0x31e97c0, src=src@entry=0x30c8880) at savevm.c:1656
#5  0x000000000060b60b in qemu_savevm_state_begin (mon=mon@entry=0x1de4a20, f=f@entry=0x31e97c0, blk_enable=blk_enable@entry=0, shared=shared@entry=0) at savevm.c:1707
#6  0x000000000060b807 in qemu_savevm_state (mon=mon@entry=0x1de4a20, f=f@entry=0x31e97c0) at savevm.c:1846
#7  0x000000000060c16d in do_savevm_rr (mon=0x1de4a20, name=name@entry=0x7ffeefa4b350 "/home/hari/ReplayServer/records/12681149-rr-snp") at savevm.c:2283
#8  0x00000000006e3fa3 in rr_do_begin_record (file_name_full=<optimized out>, cpu_state=0x1de7e60) at /home/hari/temp/faros/faros/panda/qemu/rr_log.c:1492
#9  0x0000000000536fb8 in main_loop () at /home/hari/temp/faros/faros/panda/qemu/vl.c:1563
#10 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/hari/temp/faros/faros/panda/qemu/vl.c:3827

Please do let me know if there is anything more I can provide. We look forward to your reply.

nsapountzis avatar Mar 22 '18 17:03 nsapountzis

Hmm, it looks like the culprit may be this bit of QEMU code:

https://github.com/moyix/panda/blob/f758bee11ade49c904675ec2bc67cae29ed5b121/qemu/savevm.c#L731-L737

I'll have to think about how to fix this. Possibly we could detect size > 2GB and split up the section into smaller chunks...
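Roughly what I mean, as a standalone sketch (copy_chunked and the FILE*-based I/O here are stand-ins for QEMUFile and qemu_sendfile, not the actual savevm.c interfaces): keep a 64-bit running count and copy in bounded chunks so no length ever has to fit in an int:

    #include <stdint.h>
    #include <stdio.h>

    #define CHUNK_SIZE (16 * 1024 * 1024)   /* 16 MiB per pass, well under INT_MAX */

    /* Copy `total` bytes from src to dst in fixed-size chunks. */
    static int copy_chunked(FILE *dst, FILE *src, uint64_t total)
    {
        static uint8_t buf[CHUNK_SIZE];
        while (total > 0) {
            size_t want = total < CHUNK_SIZE ? (size_t)total : CHUNK_SIZE;
            size_t got = fread(buf, 1, want, src);
            if (got == 0)
                return -1;                  /* short read: truncated source */
            if (fwrite(buf, 1, got, dst) != got)
                return -1;                  /* short write: out of disk, etc. */
            total -= got;
        }
        return 0;
    }

Alternatively, widening the length to int64_t at the qemu_sendfile()/qemu_concat_section() boundary might be enough on its own, but chunking also bounds the size of any single allocation.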

moyix avatar Mar 24 '18 00:03 moyix