
Failed to locate librrpage_32.so

Open marc1uk opened this issue 1 year ago • 23 comments

When running CMake I receive the message:

"Your toolchain doesn't support 32-bit cross-compilation. Install the required packages or pass -Ddisable32bit=ON to cmake."

Since I am on a 64-bit system and am not interested in running 32-bit applications, I pass the argument. But when I try to run rr, I then get:

[FATAL src/AddressSpace.cc:315:map_rr_page() errno: ENOENT] Failed to locate librrpage_32.so
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x28)[0x459ba8]
rr(_ZN2rr15notifying_abortEv+0xe)[0x45c76e]
rr[0x44ddc2]
rr(_ZN2rr12AddressSpace11map_rr_pageERNS_18AutoRemoteSyscallsE+0x697)[0x5a1f17]                                              
rr(_ZN2rr12AddressSpace17post_exec_syscallEPNS_4TaskE+0x75)[0x5a2105]                                                        
rr(_ZN2rr4Task17post_exec_syscallERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2d)[0x484b4d]                     
rr[0x5114ad]
rr[0x4e9837]
rr(_ZN2rr19rec_process_syscallEPNS_10RecordTaskE+0x92)[0x5146e2]                                                             
rr(_ZN2rr13RecordSession21syscall_state_changedEPNS_10RecordTaskEPNS0_9StepStateE+0x862)[0x524e32]                           
rr(_ZN2rr13RecordSession11record_stepEv+0x5a3)[0x51f393]
rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0xd6b)[0x522ceb]        
rr(main+0x157)[0x4447c7]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7fd4312437b3]
rr(_start+0x2e)[0x44499e]
=== End rr backtrace

marc1uk avatar Sep 21 '23 15:09 marc1uk

It looks like you're trying to record a 32-bit process.

rocallahan avatar Sep 22 '23 06:09 rocallahan

It would seem like rr thinks so, but my OS is 64-bit:

$ uname -a
Linux sukap01 4.18.0-240.22.1.el8_3.x86_64 #1 SMP Thu Mar 25 14:36:04 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

and the application I'm running is 64-bit:

$ file main 
main: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=5b72484ed706bd8cbb3d1870ea842f1b1a02a9b6, with debug_info, not stripped

so I'm not sure what's giving it that impression.... :thinking:

marc1uk avatar Sep 22 '23 12:09 marc1uk

Maybe this check in Task.cc is doing the wrong thing:

  SupportedArch a = is_long_mode_segment(registers.cs()) ? x86_64 : x86;

Can you dig into that?

rocallahan avatar Sep 22 '23 12:09 rocallahan

Looking at the relevant lines, I'm afraid I'm not sure what I would need to do to dig into it. I added some basic printouts around that line:

checking supported arch: registers.cs() returns: 51
is_long_mode_segment says: 1
resulting arch is x86_64
checking supported arch: registers.cs() returns: 51
is_long_mode_segment says: 1
resulting arch is x86_64
checking supported arch: registers.cs() returns: 51
is_long_mode_segment says: 1
resulting arch is x86_64

This gets printed for many pages, but eventually, just before the crash, I get:

checking supported arch: registers.cs() returns: 35
is_long_mode_segment says: 0
resulting arch is x86
checking supported arch: registers.cs() returns: 35
is_long_mode_segment says: 0
resulting arch is x86
[FATAL src/AddressSpace.cc:315:map_rr_page() errno: ENOENT] Failed to locate librrpage_32.so
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x28)[0x459ba8]
rr(_ZN2rr15notifying_abortEv+0xe)[0x45c76e]
rr[0x44ddc2]
rr(_ZN2rr12AddressSpace11map_rr_pageERNS_18AutoRemoteSyscallsE+0x697)[0x5a2057]
rr(_ZN2rr12AddressSpace17post_exec_syscallEPNS_4TaskE+0x75)[0x5a2245]
rr(_ZN2rr4Task17post_exec_syscallERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2d)[0x484b4d]                                                                                     
rr[0x5115cd]
rr[0x4e9957]
rr(_ZN2rr19rec_process_syscallEPNS_10RecordTaskE+0x92)[0x514802]
rr(_ZN2rr13RecordSession21syscall_state_changedEPNS_10RecordTaskEPNS0_9StepStateE+0x862)[0x524f52]                                                                                           
rr(_ZN2rr13RecordSession11record_stepEv+0x5a3)[0x51f4b3]
rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0xd6b)[0x522e0b]                                                                        
rr(main+0x157)[0x4447c7]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7f433ddb47b3]
rr(_start+0x2e)[0x44499e]
=== End rr backtrace
Aborted

I'm not sure what to do with this info though.
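For reference, the values printed above are x86 segment selectors: 51 is 0x33 and 35 is 0x23. On x86-64 Linux these are the kernel's user-mode code segments for 64-bit code and for 32-bit (compatibility mode) code respectively, so a change from 51 to 35 is consistent with the traced task having started to run 32-bit code. A minimal sketch that splits a selector into its fields (`decode_cs` is a hypothetical helper name, not part of rr):

```shell
# decode_cs is a hypothetical helper, not an rr function.
# A segment selector packs: descriptor index (bits 3+), table indicator (bit 2),
# and requested privilege level (bits 0-1).
decode_cs() {
    sel=$1
    index=$(( sel >> 3 ))
    ti=$(( (sel >> 2) & 1 ))   # 0 = GDT, 1 = LDT
    rpl=$(( sel & 3 ))         # 3 = user mode
    printf 'cs=0x%x index=%d ti=%d rpl=%d\n' "$sel" "$index" "$ti" "$rpl"
}

decode_cs 51   # prints cs=0x33 index=6 ti=0 rpl=3 (64-bit user code on Linux)
decode_cs 35   # prints cs=0x23 index=4 ti=0 rpl=3 (32-bit user code on Linux)
```

Both selectors are user-mode GDT entries; only the descriptor index differs, which is what distinguishes long mode from compatibility mode here.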

marc1uk avatar Sep 22 '23 13:09 marc1uk

Can you run rr under gdb with gdb --args rr record ... and set a breakpoint at AddressSpace.cc:315?

Then when we hit that breakpoint, you can look at the process tree under rr and find your spawned process and look at its /proc/<pid>/exe to confirm that it's not a 32-bit executable.
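One way to check the executable behind a pid is `file -L /proc/<pid>/exe`. If `file` is not handy, the ELF header can be read directly: byte 4 of the file (EI_CLASS) is 1 for a 32-bit ELF and 2 for a 64-bit ELF. A minimal sketch, assuming `dd`, `od`, and `tr` are available (`elf_class` is a hypothetical helper name):

```shell
# elf_class is a hypothetical helper name, not an rr or binutils command.
elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(dd if="$1" bs=1 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    # Byte 4 (EI_CLASS) is 1 for ELFCLASS32 and 2 for ELFCLASS64.
    class=$(dd if="$1" bs=1 skip=4 count=1 2>/dev/null | od -An -tu1 | tr -d ' \n')
    case "$class" in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}

# For a traced process, point it at /proc/<pid>/exe (replace <pid>).
# Here it is run on the system shell as a harmless example.
elf_class /bin/sh
```

The check has to be made on the pid that rr is stopped on when the error fires, which may be a child of the program given on the command line.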

rocallahan avatar Sep 22 '23 13:09 rocallahan

I'd also check its /proc/<pid>/maps to make sure it's 64-bit and the program you expect.

rocallahan avatar Sep 22 '23 13:09 rocallahan

Perhaps we could improve the error message by printing out which program we're currently trying to record?

Keno avatar Sep 22 '23 13:09 Keno

it didn't seem to like gdb? maybe i'm doing something wrong

moflaher@sukap01:~/SKAnalysis/SKAnalysis$ gdb --args rr record -n configfiles/SpallReduction/ToolChainConfig 
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-12.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from rr...done.
(gdb) b AddressSpace.cc:315
No line 315 in file "AddressSpace.cc".
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (AddressSpace.cc:315) pending.
(gdb) r
Starting program: /disk1/disk02/usr6/moflaher/rr/obj/bin/rr record -n configfiles/SpallReduction/ToolChainConfig
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-127.el8.x86_64
warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
[New Thread 0x7ffff6342700 (LWP 1046138)]

Thread 1 "rr" received signal SIGSEGV, Segmentation fault.
0x000000000045e7da in rr::cpuid_faulting_works() ()
Missing separate debuginfos, use: yum debuginfo-install libgcc-8.5.0-4.el8_5.x86_64 libstdc++-8.5.0-4.el8_5.x86_64 zlib-1.2.11-18.el8_5.x86_64
(gdb) 

I'm not sure what exactly i'd be checking if it worked, tbh.

n.b. i'm setting the -n flag as my system has /proc/sys/kernel/perf_event_paranoid = 2 and i don't have admin privileges to change that. I've dumped /proc/<pid>/maps for the process, but don't really know how to tell whether it's 64-bit. Given that i'm getting the pid from the program process, it seems a tautology to say it represents the program i expect. :thinking: An excerpt (it is very long):

moflaher@sukap01:~/SKAnalysis/SKAnalysis$ cat /proc/1059721/maps                                                                                                                              
00400000-00513000 r-xp 00000000 00:2e 3453581635                         /home/moflaher/SKAnalysis/SKAnalysis/main                                                                            
00713000-00714000 r--p 00113000 00:2e 3453581635                         /home/moflaher/SKAnalysis/SKAnalysis/main                                                                            
00714000-00716000 rw-p 00114000 00:2e 3453581635                         /home/moflaher/SKAnalysis/SKAnalysis/main                                                                            
00716000-1390b000 rw-p 00000000 00:00 0                                                                                                                                                       
13dfe000-14dcc000 rw-p 00000000 00:00 0                                  [heap]                                                                                                               
7fca50000000-7fca50021000 rw-p 00000000 00:00 0                                                  
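
On telling whether a maps listing belongs to a 64-bit process: the kernel prints mapping addresses zero-padded to at least 8 hex digits, so a start address with more than 8 digits (such as 7fca50000000 above) lies at or above 4 GiB, which a 32-bit address space cannot reach. A rough sketch of that check (`maps_has_high_addresses` is a hypothetical helper name; it only describes the one process whose maps file is inspected):

```shell
# maps_has_high_addresses is a hypothetical helper name.
# Prints "yes" if any mapping starts at or above 4 GiB, judged by the number of
# hex digits in the start address, otherwise "no".
maps_has_high_addresses() {
    awk '{ split($1, range, "-"); if (length(range[1]) > 8) found = 1 }
         END { print (found ? "yes" : "no") }' "$1"
}

# Example on the awk process itself; for a traced process use /proc/<pid>/maps.
maps_has_high_addresses /proc/self/maps
```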

marc1uk avatar Sep 22 '23 13:09 marc1uk

I'm not sure about this idea of making sure it's the program I expect. I give rr a full path to my binary, so i'd be pretty surprised if it started debugging something else. The application does link in numerous dependencies. Would that be problematic? It seems fairly commonplace...

marc1uk avatar Sep 22 '23 13:09 marc1uk

OK, that does look fine. I don't really have a clue why rr would start thinking your process is 32-bit.

FWIW maybe you should just try building rr with 32-bit support and see if that helps?

rocallahan avatar Sep 22 '23 14:09 rocallahan

Unfortunately, I built without 32-bit support because CMake said the toolchain didn't support cross-platform compilation, and I'm not an admin on this system, so I can't easily install the necessary dependencies. I could ask our sysadmin for them, but to be honest I'm already trying a new debugger because I'm neck deep in trying to solve issues with my own application. I'm afraid I don't really have the time to spend debugging the debugger! :sweat_smile:

marc1uk avatar Sep 22 '23 14:09 marc1uk

If you are sure the code is running in 64-bit mode, just hardcoding is_long_mode_segment to return true should work?

yuyichao avatar Sep 22 '23 14:09 yuyichao

Alternatively, just download a binary build from https://github.com/JuliaBinaryWrappers/rr_jll.jl/releases/tag/rr-v5.6.0%2B1, which should have 32bit support enabled.

Keno avatar Sep 22 '23 14:09 Keno

It looks like the release page has 64-bit versions, 32-bit versions, aarch versions and 'logs' releases. Since I'm running a 64-bit OS and an (ostensibly) 64-bit application, I grabbed the 64-bit version but get the same complaint about the missing librrpage_32.so (which indeed doesn't seem to be present). I'll see what happens hard-coding it.... :crossed_fingers:

marc1uk avatar Sep 22 '23 14:09 marc1uk

Huh, you're right, the 32bit versions have gone missing in the latest binary release.

Keno avatar Sep 22 '23 14:09 Keno

Alas, hard-coding the checks did not work.

checking supported arch: registers.cs() returns: 35
is_long_mode_segment says: 0
resulting arch is x86
[FATAL src/Task.cc:3204:ptrace_if_stopped() errno: EIO]
 (task 1363912 (rec:1363912) at time 28206)
 -> Assertion `!errno' failed to hold. ptrace(PTRACE_SETREGSET, 1363912, addr=0x1, data=0x7ffe4998cce0) failed with errno 5
Tail of trace dump:
{
  real_time:2091095.051344 global_time:28186, event:`SYSCALL: rt_sigaction' (state:EXITING_SYSCALL) tid:1363912, ticks:205
rax:0x0 rbx:0x11 rcx:0xffffffffffffffff rdx:0x7ffdf762f4c0 rsi:0x7ffdf762f420 rdi:0x11 rbp:0x7ffdf762f570 rsp:0x7ffdf762f420 r8:0x7ffdf762f610 r9:0x0 r10:0x8 r11:0x246 r12:0x7ffdf762f610 r13
:0x55dca44bf6e0 r14:0x55dca476c800 r15:0x60 rip:0x7f82805c6954 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x7f828119c740 gs_base:0x0
  { tid:1363912, addr:0x7ffdf762f4c0, length:0x20 }
}
{
  real_time:2091095.051375 global_time:28187, event:`SYSCALL: rt_sigaction' (state:ENTERING_SYSCALL) tid:1363912, ticks:364
rax:0xffffffffffffffda rbx:0x11 rcx:0xffffffffffffffff rdx:0x7ffdf762f500 rsi:0x7ffdf762f460 rdi:0x11 rbp:0x7ffdf762f5b0 rsp:0x7ffdf762f460 r8:0x7ffdf762f650 r9:0x0 r10:0x8 r11:0x246 r12:0x7
ffdf762f650 r13:0x55dca509d430 r14:0x0 r15:0x60 rip:0x7f82805c6954 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x7f828119c740 gs_base:0x0
}
{
  real_time:2091095.051400 global_time:28188, event:`SYSCALL: rt_sigaction' (state:EXITING_SYSCALL) tid:1363912, ticks:364
rax:0x0 rbx:0x11 rcx:0xffffffffffffffff rdx:0x7ffdf762f500 rsi:0x7ffdf762f460 rdi:0x11 rbp:0x7ffdf762f5b0 rsp:0x7ffdf762f460 r8:0x7ffdf762f650 r9:0x0 r10:0x8 r11:0x246 r12:0x7ffdf762f650 r13
:0x55dca509d430 r14:0x0 r15:0x60 rip:0x7f82805c6954 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x7f828119c740 gs_base:0x0
  { tid:1363912, addr:0x7ffdf762f500, length:0x20 }
}
{
  real_time:2091095.051434 global_time:28189, event:`SYSCALL: rt_sigaction' (state:ENTERING_SYSCALL) tid:1363912, ticks:382
rax:0xffffffffffffffda rbx:0x2 rcx:0xffffffffffffffff rdx:0x7ffdf762f500 rsi:0x7ffdf762f460 rdi:0x2 rbp:0x7ffdf762f5b0 rsp:0x7ffdf762f460 r8:0x7ffdf762f650 r9:0x0 r10:0x8 r11:0x246 r12:0x7ff
df762f650 r13:0x55dca509d430 r14:0x0 r15:0x60 rip:0x7f82805c6954 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x7f828119c740 gs_base:0x0
}
{
  real_time:2091095.051460 global_time:28190, event:`SYSCALL: rt_sigaction' (state:EXITING_SYSCALL) tid:1363912, ticks:382
rax:0x0 rbx:0x2 rcx:0xffffffffffffffff rdx:0x7ffdf762f500 rsi:0x7ffdf762f460 rdi:0x2 rbp:0x7ffdf762f5b0 rsp:0x7ffdf762f460 r8:0x7ffdf762f650 r9:0x0 r10:0x8 r11:0x246 r12:0x7ffdf762f650 r13:0
x55dca509d430 r14:0x0 r15:0x60 rip:0x7f82805c6954 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x7f828119c740 gs_base:0x0
  { tid:1363912, addr:0x7ffdf762f500, length:0x20 }
}
{
  real_time:2091095.051497 global_time:28191, event:`SYSCALL: dup2' (state:ENTERING_SYSCALL) tid:1363912, ticks:391
rax:0xffffffffffffffda rbx:0x0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x1 rdi:0xf rbp:0xffffffff rsp:0x7ffdf762f708 r8:0x7ffdf762f650 r9:0x0 r10:0x8 r11:0x246 r12:0x55dca5090150 r13:0x55dca509d4
30 r14:0x0 r15:0x60 rip:0x7f828067cfdb eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x21 fs_base:0x7f828119c740 gs_base:0x0
}
{
  real_time:2091095.051522 global_time:28192, event:`SYSCALL: dup2' (state:EXITING_SYSCALL) tid:1363912, ticks:391
rax:0x1 rbx:0x0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x1 rdi:0xf rbp:0xffffffff rsp:0x7ffdf762f708 r8:0x7ffdf762f650 r9:0x0 r10:0x8 r11:0x246 r12:0x55dca5090150 r13:0x55dca509d430 r14:0x0 r15:
0x60 rip:0x7f828067cfdb eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x21 fs_base:0x7f828119c740 gs_base:0x0
}
{
  real_time:2091095.051551 global_time:28193, event:`SYSCALL: close' (state:ENTERING_SYSCALL) tid:1363912, ticks:403
rax:0xffffffffffffffda rbx:0xf rcx:0xffffffffffffffff rdx:0x0 rsi:0x1 rdi:0xf rbp:0xffffffff rsp:0x7ffdf762f708 r8:0x7ffdf762f650 r9:0x0 r10:0x8 r11:0x246 r12:0x55dca5090150 r13:0x55dca509d4
30 r14:0x0 r15:0x60 rip:0x7f828067cf28 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3 fs_base:0x7f828119c740 gs_base:0x0

Perhaps this validates that the checks are not returning bogus values, but it remains a mystery what it seems to be loading that's 32-bit. I can see a task id member... would that be something i can turn into a human readable application/library name?

marc1uk avatar Sep 22 '23 14:09 marc1uk

Perhaps we could improve the error message by printing out which program we're currently trying to record?

97091f5acc104ffe480a2c2ee3299d03d1803752

rocallahan avatar Sep 22 '23 21:09 rocallahan

-> Assertion `!errno' failed to hold. ptrace(PTRACE_SETREGSET, 1363912, addr=0x1, data=0x7ffe4998cce0) failed with errno 5

This is exactly what I would expect to see if the tracee is in fact 32-bit but we treat it as 64-bit.

rocallahan avatar Sep 22 '23 21:09 rocallahan

97091f5

You could rerun with a build with this commit and we'll see what the error says now.

rocallahan avatar Sep 25 '23 12:09 rocallahan

That works!

[FATAL src/AddressSpace.cc:315:map_rr_page() errno: ENOENT] Failed to locate librrpage_32.so; needed by /usr/lib/ld-2.28.so (x86)

Hmm.... ld.so eh? :confused: Not as informative as I was hoping for! :sweat_smile: My first impression would be that perhaps this suggests I'm linking against a 32-bit library? But a quick google suggests that wouldn't be possible....

marc1uk avatar Sep 25 '23 13:09 marc1uk

is something running ldd in your program?

yuyichao avatar Sep 25 '23 14:09 yuyichao

Nothing that I'm aware of....
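
One way to find out what the program and its children actually exec, assuming strace is installed and allowed to trace on the system, is to log every execve call and list the distinct program paths. The sketch below only post-processes such a log; `execve_paths` and the log file name are hypothetical:

```shell
# execve_paths is a hypothetical helper name.
# Extracts the program path (first argument) of each execve() line in an strace log.
execve_paths() {
    sed -n 's/.*execve("\([^"]*\)".*/\1/p' "$1" | sort -u
}

# Assumed workflow (not run here): record all execs made by the program and
# its descendants, then list the distinct paths.
# strace -f -e trace=execve -o execve.log ./main <args>
# execve_paths execve.log
```

Any path in that list under a 32-bit library directory, such as the /usr/lib/ld-2.28.so reported above, would point at the process that spawned it.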

marc1uk avatar Sep 25 '23 16:09 marc1uk

Maybe something useful in /proc/pid/cmdline? Attached patch should print the contents in the error message. cmdline.patch.txt
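
For anyone reading the file by hand: the arguments in /proc/<pid>/cmdline are separated by NUL bytes, so it prints as one run-together string unless the separators are translated. A minimal sketch (`cmdline_of` is a hypothetical helper name and is unrelated to the attached patch):

```shell
# cmdline_of is a hypothetical helper name.
cmdline_of() {
    # Turn the NUL separators into spaces and trim the trailing one.
    { tr '\000' ' ' < "$1"; echo; } | sed 's/ *$//'
}

# Example on the current shell; for a traced process use /proc/<pid>/cmdline.
cmdline_of "/proc/$$/cmdline"
```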

bernhardu avatar Sep 25 '23 23:09 bernhardu