PANDA 1.0 record cannot handle a record file larger than 2 GB
I am experiencing the following problem with PANDA recording. We use PANDA 1.0.
I record back-to-back record files that each last 2 minutes, and each record file ends up a certain size on disk. It seems that when the record size exceeds 2 GB, there is a casting overflow: the Linux host cannot handle it, PANDA record crashes, and of course the guest stops. Specifically, I think the 2 GB record filesize gets "translated" into an astronomically large number of bytes (due to the suspected casting error), and I get the error: GLib-ERROR **: build/buildd/glib2.40.2/./glib/gmem.c:103: failed to allocate 18446744071595337090 bytes.
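To illustrate what I suspect is happening (a hypothetical sketch, not the actual PANDA code): if a section length past 2 GB is stored in a signed 32-bit int, it wraps negative, and widening that negative value to the unsigned size_t the allocator expects produces a request on the order of 10^19 bytes, like the one in the error above.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

int main(void)
{
    int64_t section_size = 3LL * 1024 * 1024 * 1024;  /* ~3 GB section */
    int32_t len = (int32_t)section_size;              /* truncates; goes negative on typical systems */
    size_t alloc_request = (size_t)len;               /* negative value widened to 64-bit unsigned */

    printf("len = %d\n", len);                         /* prints a negative number */
    printf("alloc_request = %zu bytes\n", alloc_request); /* roughly 1.8e19 bytes */
    return 0;
}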
Overall, it seems that PANDA cannot handle a record file larger than 2 GB (more precisely, it cannot handle a guest workload whose recording exceeds 2 GB). Has anyone run into this issue before?
It is really annoying not to be able to record a heavy workload because the record file might exceed 2 GB and crash PANDA.
Could you provide a backtrace for this (e.g. by running under gdb and using bt)? That will help narrow down where the too-small int is.
Hi Moyix,
Thanks a lot for your quick reply. Here is the backtrace of the qemu process when the crash happened:
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0 0x00007f2c5e4bac13 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
(gdb) bt
#0 0x00007f2c5e4bac13 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#1 0x00007f2c5e4bad72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007f2c5e4b9644 in g_malloc () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x0000000000609d46 in qemu_sendfile (offset=0, len=-1876860136, src=0x30c8880, dst=0x31e97c0) at savevm.c:750
#4 qemu_concat_section (dst=dst@entry=0x31e97c0, src=src@entry=0x30c8880) at savevm.c:1656
#5 0x000000000060b60b in qemu_savevm_state_begin (mon=mon@entry=0x1de4a20, f=f@entry=0x31e97c0, blk_enable=blk_enable@entry=0, shared=shared@entry=0) at savevm.c:1707
#6 0x000000000060b807 in qemu_savevm_state (mon=mon@entry=0x1de4a20, f=f@entry=0x31e97c0) at savevm.c:1846
#7 0x000000000060c16d in do_savevm_rr (mon=0x1de4a20, name=name@entry=0x7ffeefa4b350 "/home/hari/ReplayServer/records/12681149-rr-snp") at savevm.c:2283
#8 0x00000000006e3fa3 in rr_do_begin_record (file_name_full=
Please let me know if there is anything more I can provide. We are looking forward to your reply.
Hmm, it looks like the culprit may be this bit of QEMU code:
https://github.com/moyix/panda/blob/f758bee11ade49c904675ec2bc67cae29ed5b121/qemu/savevm.c#L731-L737
I'll have to think about how to fix this. Possibly we could detect sections larger than 2 GB and split them up into smaller chunks...
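A rough sketch of that chunked approach (using plain stdio as a stand-in for the QEMUFile routines; not the actual savevm.c code). The idea is that no single allocation or length value ever has to hold the full >2 GB section, so nothing overflows a 32-bit int:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define COPY_CHUNK (64 * 1024 * 1024)  /* 64 MB per chunk, well under 2 GB */

static int copy_section_chunked(FILE *src, FILE *dst, int64_t total_len)
{
    uint8_t *buf = malloc(COPY_CHUNK);
    int64_t remaining = total_len;

    if (!buf) {
        return -1;
    }
    while (remaining > 0) {
        size_t want = remaining > COPY_CHUNK ? COPY_CHUNK : (size_t)remaining;
        size_t got = fread(buf, 1, want, src);
        if (got == 0 || fwrite(buf, 1, got, dst) != got) {
            free(buf);
            return -1;  /* short read/write; real code would report the error */
        }
        remaining -= (int64_t)got;
    }
    free(buf);
    return 0;
}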