rr icon indicating copy to clipboard operation
rr copied to clipboard

rr record seems to loop infinitely in 32-bit x86 with certain system calls

Open kdnilsen opened this issue 1 year ago • 7 comments

I don't have a full diagnosis of what is going wrong here. The symptoms I've observed are that attempts to run rr-record on an x86 32-bit build of OpenJDK tip do not return from a call to the mmap system call. top reports that rr continue to run, but trace statements placed before and after the mmap call confirm that control does not return from mmap.

When I run the same 32-bit build outside rr, trace statements confirm that the mmap call is successful. I have also been able to step through (over) the call to mmap service with gdb.

This problem with mmap calls is not universal. I have found that during initialization of the JVM, the first 7 calls to mmap succeed. As far as I have been able to understand, all of the invocations are identical except for the size of the request:

With system call prototype: void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset) The successful calls are parameterized, in order, as:

mmap(nullptr, 4096, PROT_NONE, MAP_PRIVATE | MAP_NORESERVE | MAP_ANONYMOUS, -1, 0); mmap(nullptr, 8192, PROT_NONE, MAP_PRIVATE | MAP_NORESERVE | MAP_ANONYMOUS, -1, 0); mmap(nullptr, 32768, PROT_NONE, MAP_PRIVATE | MAP_NORESERVE | MAP_ANONYMOUS, -1, 0); mmap(nullptr, 251_658_240, PROT_NONE, MAP_PRIVATE | MAP_NORESERVE | MAP_ANONYMOUS, -1, 0); mmap(nullptr, 45_056, PROT_NONE, MAP_PRIVATE | MAP_NORESERVE | MAP_ANONYMOUS, -1, 0); mmap(nullptr, 962_560, PROT_NONE, MAP_PRIVATE | MAP_NORESERVE | MAP_ANONYMOUS, -1, 0); mmap(nullptr, 962_560, PROT_NONE, MAP_PRIVATE | MAP_NORESERVE | MAP_ANONYMOUS, -1, 0);

The eight call, which has the largest requested size so far, is the one that fails to return (when running in rr):

mmap(nullptr, 1_073_741_824, PROT_NONE, MAP_PRIVATE | MAP_NORESERVE | MAP_ANONYMOUS, -1, 0);

Maybe this description is enough to prompt some thought about what might be going wrong. If not, I can provide more details regarding how to reproduce this problem and/or I can gather more details within my environment to help you debug futher.

Thanks.

kdnilsen avatar May 08 '23 00:05 kdnilsen

I should add that when I run top, it confirms that rr is still running and is consuming "100%" of one core's available CPU time.

kdnilsen avatar May 08 '23 00:05 kdnilsen

What kernel version are you using?

Have you determined whether the 8th mmap is for the same amount (0x40000000) both with and without rr? If it is, can you break into gdb on rr itself after it has run away and get a backtrace?

khuey avatar May 08 '23 07:05 khuey

Is this chaos mode? Infinite loop in AddressSpace::chaos_mode_find_free_memory or AddressSpace::find_free_memory is my guess.

rocallahan avatar May 08 '23 10:05 rocallahan

I'm running: No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 22.04.2 LTS Release: 22.04 Codename: jammy

Command line: ~/lib/rr-5.6.0-Linux-x86_64/bin/rr record -h -n --disable-cpuid-features-ext 0xfc230000,0x2c42,0xc --num-cores=8
~/github/shenandoah.expand-old-on-demand/build/linux-x86-server-slowdebug/jdk/bin/java
-XX:ActiveProcessorCount=8 -XX:-UseNUMA
...

(not chaos mode)

My trace statements suggest the 8th mmap is the same size for both runs in and out of rr. (This argument represents the intended size of the heap. I have noticed from analyzing the non-rr run that the value returned from this 8th invocation is not aligned, so it is subsequently unmapped and then they map again with a slightly larger size so that they can truncate to get an aligned address for the heap. But as far as I can tell, we don't even get back from the 8th call when running under rr.)

I need to step away from my desk for a few hours. I'll try gdb on rr as soon as I'm back.

kdnilsen avatar May 08 '23 12:05 kdnilsen

Thanks for quick responses to my issue.

kdnilsen avatar May 08 '23 12:05 kdnilsen

would be good to get a dump of /proc/.../maps for the tracee while rr is looping

rocallahan avatar May 08 '23 20:05 rocallahan

Have you had a chance to try getting that stack trace or looking at /proc/.../maps yet?

khuey avatar May 22 '23 16:05 khuey