[Native Image] Incorrect .debug_frame section when using the `-g` option

Open visheshruparelia opened this issue 8 months ago • 4 comments

Describe the Issue

Hi,

I am trying to generate a native image using the -g option. In the generated .debug file, I am noticing discrepancies between the FDEs and the actual disassembly of the executable.

I have a simple Java application, App.java, which does some work. When I look at the FDEs printed by readelf -wF app and then run disas in GDB on a particular location, the assembly does not match the FDEs in the readelf output.

Reproducer: https://github.com/visheshruparelia/graalvm-native-image-reproducer

More details below.

GraalVM Version

openjdk 24.0.1 2025-04-15
OpenJDK Runtime Environment GraalVM CE 24.0.1+9.1 (build 24.0.1+9-jvmci-b01)
OpenJDK 64-Bit Server VM GraalVM CE 24.0.1+9.1 (build 24.0.1+9-jvmci-b01, mixed mode, sharing)

Operating System and Version

Ubuntu 22.04.5 LTS x86_64

Build Command

native-image -g App

Expected Behavior

The output of readelf -wF app should look like this (note this is not a full output but one FDE entry where I noticed the data is incorrect):

000193b8 000000000000001c 00000000 FDE cie=00000000 pc=00000000001d9670..00000000001d9cf2
   LOC           CFA      ra
00000000001d9670 rsp+8    c-8   // The LOC value may vary across different compilations
00000000001d9674 rsp+16   c-8
00000000001d9677 ........    // Notice LOC 

Actual Behavior

The actual FDE entry reported by readelf:

000193b8 000000000000001c 00000000 FDE cie=00000000 pc=00000000001d9670..00000000001d9cf2
   LOC           CFA      ra
00000000001d9670 rsp+8    c-8   // The LOC value may vary across different compilations
00000000001d9674 rsp+16   c-8
00000000001d9cf1 rsp+8    c-8  // The LOC here seems to be not right...

Steps to Reproduce

  1. Clone the reproducer: https://github.com/visheshruparelia/graalvm-native-image-reproducer
  2. docker build -t repro .
  3. docker run -it repro
  4. javac App.java
  5. native-image -g App
  6. Run gdb app
  7. disas 1939416
  8. The above should give you a disassembly which starts at some address. Note this address, as it may vary across builds. In my case it was: 0x00000000001d9670
  9. Note the assembly and exit GDB.
  10. readelf -wF app
  11. In the output, find the FDE corresponding to the address which was noted in step 8.
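
Step 11 can be tedious to do by eye on a large binary. The following is a minimal Python sketch for locating the FDE that covers a given address in saved `readelf -wF` output. It assumes the text layout shown in this issue (other binutils versions may print it differently) and treats the end of the `pc=` range as exclusive. The name `find_fde` is hypothetical, not part of any tool.

```python
import re

# First three hex columns, then the entry kind, as printed by `readelf -wF`.
HEADER = re.compile(r"^[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+(CIE|FDE)\b")
PC_RANGE = re.compile(r"pc=([0-9a-fA-F]+)\.\.([0-9a-fA-F]+)")
# Table row: LOC column, then the CFA rule (further columns are ignored).
ROW = re.compile(r"^([0-9a-fA-F]{8,16})\s+(\S+)")

# One FDE copied from this issue, used as sample input.
SAMPLE = """\
000193b8 000000000000001c 00000000 FDE cie=00000000 pc=00000000001d9670..00000000001d9cf2
   LOC           CFA      ra
00000000001d9670 rsp+8    c-8
00000000001d9674 rsp+16   c-8
00000000001d9cf1 rsp+8    c-8
"""


def find_fde(readelf_text, address):
    """Return (start, end, rows) for the FDE covering `address`, or None."""
    current = None
    for line in readelf_text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            # A new CIE/FDE begins: check the one just finished.
            if current and current[0] <= address < current[1]:
                return current
            pc = PC_RANGE.search(line)
            if header.group(1) == "FDE" and pc:
                current = (int(pc.group(1), 16), int(pc.group(2), 16), [])
            else:
                current = None  # rows under a CIE are skipped
            continue
        row = ROW.match(line)
        if current and row:
            current[2].append((int(row.group(1), 16), row.group(2)))
    if current and current[0] <= address < current[1]:
        return current
    return None


if __name__ == "__main__":
    start, end, rows = find_fde(SAMPLE, 0x1d9670)
    for loc, cfa in rows:
        print(f"{loc:#x} {cfa}")
```

To use it on a real build, the output of `readelf -wF app` would be saved to a file and passed in as `readelf_text`, with the address noted in step 8.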

Additional Context

The output of step 7 in the "Steps to Reproduce" section above will be (note that the address may vary when you run it on your system):

(gdb) disas 1939416
Dump of assembler code for function _ZN14java.lang.Math3cosEJdd:
   0x00000000001d9670 <+0>:	sub    $0x8,%rsp
   0x00000000001d9674 <+4>:	mov    %rsp,%rbp
   0x00000000001d9677 <+7>:	push   %rbx
   0x00000000001d9678 <+8>:	sub    $0x10,%rsp
   0x00000000001d967c <+12>:	vmovsd %xmm0,0x8(%rsp)
   0x00000000001d9682 <+18>:	mov    0xc(%rsp),%eax
   0x00000000001d9686 <+22>:	vmovq  0x3e6cd2(%rip),%xmm1        # 0x5c0360
   0x00000000001d968e <+30>:	and    $0x7fff0000,%eax
   0x00000000001d9694 <+36>:	sub    $0x30300000,%eax
   0x00000000001d969a <+42>:	cmp    $0x10c50000,%eax
   0x00000000001d96a0 <+48>:	ja     0x1d97e9 
.....
.....

The output of step 10 in the "Steps to Reproduce" section above will be:

000193b8 000000000000001c 00000000 FDE cie=00000000 pc=00000000001d9670..00000000001d9cf2
   LOC           CFA      ra
00000000001d9670 rsp+8    c-8
00000000001d9674 rsp+16   c-8
00000000001d9cf1 rsp+8    c-8    

....
....

Comparing the two outputs, the first two entries match (0x00000000001d9670 and 0x00000000001d9674). The third does not: the mov instruction at 0x00000000001d9674 is 3 bytes long, so the next entry should be at 0x00000000001d9677. That address appears in the disas output, but the readelf output jumps straight to 0x00000000001d9cf1. This suggests that the app.debug file generated by native-image -g contains incorrect frame information.
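
The comparison can also be done mechanically. The sketch below assumes the CFA is tracked relative to rsp, as in the readelf rows, and that a new row takes effect at the address following each instruction that changes rsp. The instruction sizes are taken from the gdb addresses above; the rsp adjustments are derived by hand from the mnemonics. The function names are hypothetical.

```python
# Prologue from the gdb output: (address, size in bytes, bytes subtracted from rsp).
PROLOGUE = [
    (0x1d9670, 4, 8),   # sub $0x8,%rsp
    (0x1d9674, 3, 0),   # mov %rsp,%rbp  (rsp unchanged)
    (0x1d9677, 1, 8),   # push %rbx
    (0x1d9678, 4, 16),  # sub $0x10,%rsp
]

# Rows from the readelf output: LOC -> offset N in "rsp+N".
READELF_ROWS = {0x1d9670: 8, 0x1d9674: 16, 0x1d9cf1: 8}


def expected_rows(instructions, entry_offset=8):
    """Rows an rsp-relative CFA table would need for these instructions.

    entry_offset is the CFA offset on entry (the pushed return address).
    """
    offset = entry_offset
    rows = {instructions[0][0]: offset}
    for address, size, adjust in instructions:
        if adjust:
            offset += adjust
            rows[address + size] = offset  # new rule applies after the instruction
    return rows


def missing_rows(expected, actual):
    """Expected rows that are absent from, or differ in, the actual table."""
    return {loc: off for loc, off in expected.items() if actual.get(loc) != off}


if __name__ == "__main__":
    for loc, off in sorted(missing_rows(expected_rows(PROLOGUE), READELF_ROWS).items()):
        print(f"no matching row at {loc:#x} (expected rsp+{off})")
```

Under this model the mov leaves rsp unchanged, so the first absent row would begin at 0x1d9678, directly after the push at 0x1d9677, followed by another at 0x1d967c. Either way, the readelf table has no rows covering the push and the second sub.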

Build Log Output and Error Messages

No response

visheshruparelia • Jul 14 '25 22:07

Hi @visheshruparelia,

Thank you for reaching out to us! Could you please share the reproducer as a GitHub repo? Using a zip file is against our policy.

Thank you!

selhagani • Jul 14 '25 23:07

Hi @selhagani, thank you for the response.

I have updated the issue description to now contain the repo: https://github.com/visheshruparelia/graalvm-native-image-reproducer

Please let me know if you have any questions about the setup or the bug.

visheshruparelia • Jul 15 '25 08:07

@adinn could you please take a look at this when your time allows?

fniephaus • Jul 15 '25 22:07

@fniephaus The debug frame generation for both AArch64 and x86 is predicated on a very specific expectation regarding the compiled code's frame management policy: that a method employs a fixed-size stack frame, that the stack is incremented by the relevant size in the method prologue, and that it is decremented by the same amount at one or more points that precede a return instruction. The locations of the increment insn and of the decrement plus associated return insns are expected to be notified to the debug code by markers in the compiled code (cf. class DebugInfoProvider.DebugFrameSizeChange).
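
To make that expectation concrete, here is a small Python model of a fixed-frame policy. It is an illustration only, not the GraalVM generator code: `Change`, `cfa_rows` and the marker tuples are hypothetical names, and the marker offsets are chosen to reproduce the three rows in the reported readelf output, which is an inference from that output rather than something confirmed in this thread.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical stand-in for a frame size change marker."""
    EXTEND = "extend"      # frame set up in the prologue
    CONTRACT = "contract"  # frame torn down before a return


def cfa_rows(method_start, frame_size, markers, word=8):
    """Build (address, rsp offset) rows from frame size change markers.

    Assumes one fixed frame size per method: the CFA is rsp+word on entry,
    rsp+word+frame_size after an EXTEND and rsp+word again after a CONTRACT.
    Any other rsp adjustment in the method is invisible to this model.
    """
    rows = [(method_start, word)]
    for offset, change in sorted(markers, key=lambda m: m[0]):
        cfa = word + frame_size if change is Change.EXTEND else word
        rows.append((method_start + offset, cfa))
    return rows


if __name__ == "__main__":
    # Offsets are relative to the method start, 0x1d9670 in the report.
    markers = [(0x4, Change.EXTEND), (0x681, Change.CONTRACT)]
    for address, cfa in cfa_rows(0x1d9670, 8, markers):
        print(f"{address:#018x} rsp+{cfa}")
```

With one EXTEND after the initial sub and one CONTRACT before the final return, the model yields rows at 0x1d9670, 0x1d9674 and 0x1d9cf1, the same three LOC values as the reported FDE. A push or a second sub in the prologue has no marker in this scheme, so it produces no row.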

It would seem from the above report that these assumptions no longer hold for (at least) some compiled methods, i.e. someone has made the compiler smarter and forgotten about adding comparable smarts to the debugger. If that is the case then the compiler changes need to be assessed and any required changes to the frame update notifications and the frame generation strategy need to be incorporated into the generator code, i.e. it may need some smaller or larger level of redesign and reimplementation.

I am mildly surprised that this has not been detected by the debug test code. However, I say only mildly surprised because the test assumes any method is as good as any other when it comes to relying on frame layout info. If the compiler change only applies to certain types of method then the test suite would very likely need to be upgraded explicitly to cover those cases.

I am afraid I am not currently in a position to do that assessment and redesign/update the code and tests. If someone could investigate and summarize any relevant compiler changes then I might be able to advise on how much and what type of work would be involved and clarify how much time I could commit to help fix the problem.

adinn • Jul 16 '25 11:07