Add DWARF support to display currently executed C code (ELF file)

Open jdupak opened this issue 10 months ago • 6 comments

jdupak avatar Apr 18 '24 20:04 jdupak

I think you are referring to

  1. Loading ELF and C source code from the user
  2. Displaying the C source code in an editor tab
  3. Finding the mapping between instructions and C source locations from the DWARF information in the ELF file
  4. Displaying it in a suitable way, such as
    • #124 or
    • https://godbolt.org/
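
Step 3 can be sketched roughly as follows. This is only a minimal illustration, assuming the DWARF line table has already been decoded into rows sorted by address; the table contents, `END_ADDRESS`, and `source_location` are made-up names for the example, not anything from qtrvsim.

```python
import bisect

# Hypothetical, already-decoded line table: (instruction address, file, line),
# sorted by address. Real rows would come from the ELF's DWARF line program.
LINE_TABLE = [
    (0x200, "main.c", 3),
    (0x208, "main.c", 4),
    (0x214, "main.c", 6),
]
END_ADDRESS = 0x220  # hypothetical first address past the covered code

ADDRESSES = [row[0] for row in LINE_TABLE]


def source_location(pc):
    """Return (file, line) for the instruction at pc, or None if not covered."""
    if pc < ADDRESSES[0] or pc >= END_ADDRESS:
        return None
    # A row covers every address from its own up to the next row's address,
    # so pick the last row whose address is <= pc.
    index = bisect.bisect_right(ADDRESSES, pc) - 1
    _, filename, line = LINE_TABLE[index]
    return filename, line
```

With a lookup like this, the simulator would only need the current program counter to decide which source line to highlight.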

Are there any problems?

trdthg avatar Jun 17 '24 04:06 trdthg

Yes, there will be no C source code. You are loading just the ELF, and you need to extract the source code information from the debug info in the binary itself. So you will need to find a library that is not too big (unlike LLVM) and is cross-platform (including Wasm) to read it.

jdupak avatar Jun 17 '24 18:06 jdupak

Thanks, then it will be very hard :|

trdthg avatar Jun 17 '24 18:06 trdthg

I'm going to try to work on this issue and achieve the functionality described above.

I'll put the source code extraction behind an interface (with options in the menu) and provide the following two implementations:

  • extract from ELF
  • directly load locally

It might look like this:

image

Since the latter is easier, I'll try to implement it first.

If I still have time I might work on it, but of course it can be left to others!


For extraction, I tried to find and test some disassemblers (tested on x86), e.g. IDA and Ghidra. IDA disassembles quite well, but it's not open source. Ghidra is open source and works fine, but it's also quite a large project, not easy to use, and doesn't seem to have good support for DWARF 5 and RISC-V. (I'm not going into depth here, just sharing some progress and thoughts.)

trdthg avatar Jun 27 '24 19:06 trdthg

I think you might be going in the wrong direction here. I quickly browsed GitHub, and this is the kind of library we had in mind: https://github.com/GrandChris/elf_analysis. I did not check the library in depth.

Direct loading is not useful, since you need to map the code lines to instructions anyway.


jdupak avatar Jun 27 '24 22:06 jdupak

I am not saying I will give up on reading the ELF; it is necessary and included in my plan. I will certainly look for a suitable library to read ELF files.

  • the third step is to parse the mapping relationship, which of course requires reading ELF (DWARF)

The solution I described only temporarily simplifies the first step: how to get the C code

  • get by loading the source code locally
    • at least it's necessary to parse source_code_filepath from ELF
  • get by parsing ELF, ...
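
The "load the source locally" option could look roughly like the sketch below. It assumes the source path and line number have already been obtained from the debug info; `fetch_source_line` is a hypothetical helper name, and the standard-library `linecache` module is just one convenient way to read a single line.

```python
import linecache


def fetch_source_line(path, line_number):
    """Return the text of one source line, or None if it is unavailable.

    `path` stands for the source file path recorded in the debug info
    (hypothetical input); `line_number` is 1-based, as in DWARF.
    """
    # linecache.getline returns an empty string for a missing file or line.
    text = linecache.getline(path, line_number)
    if not text:
        return None
    return text.rstrip("\n")
```

Returning `None` instead of raising lets the UI fall back to showing only the disassembly when the source file is not present on the user's machine.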

trdthg avatar Jun 28 '24 01:06 trdthg

I discussed the goal of using DWARF to map instruction addresses to source file lines with Jan Hubicka at GNU Tools Cauldron, and he suggested looking at https://www.nongnu.org/libunwind/

ppisa avatar Sep 30 '24 18:09 ppisa

I did some simple tests:

  • libunwind is clearly not enough to recover the C code; it can only extract some function information and registers
  • because we need to handle ELF files, we need to use libunwind-ptrace (this is not a problem)
  • I previously tried using eliben/pyelftools (a Python library with a simpler, easier API) to parse DWARF info; it can provide precise details such as variable names, types, and corresponding line numbers.
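
For reference, reading the address-to-line rows with pyelftools might look roughly like this. It is a sketch modeled on the library's own DWARF line-program examples, not tested against a RISC-V binary here; `rows_from_entries` and `load_rows` are hypothetical helper names, and the `SimpleNamespace` objects at the end are fake stand-ins for real line-program entries so the filtering logic can be tried without an ELF file.

```python
from types import SimpleNamespace


def rows_from_entries(entries):
    """Collect (address, line) pairs from DWARF line-program entries."""
    rows = []
    for entry in entries:
        state = entry.state
        # Entries for opcodes that emit no row carry no state; end_sequence
        # rows only mark the end of a block of code, not a source line.
        if state is None or state.end_sequence:
            continue
        rows.append((state.address, state.line))
    return rows


def load_rows(elf_path):
    """Read all (address, line) rows from an ELF file built with debug info."""
    # Third-party dependency (pip install pyelftools), imported lazily so the
    # helper above can be used without it.
    from elftools.elf.elffile import ELFFile

    rows = []
    with open(elf_path, "rb") as stream:
        elf = ELFFile(stream)
        if not elf.has_dwarf_info():
            return rows
        dwarf = elf.get_dwarf_info()
        for cu in dwarf.iter_CUs():
            line_program = dwarf.line_program_for_CU(cu)
            if line_program is None:
                continue
            rows.extend(rows_from_entries(line_program.get_entries()))
    return sorted(rows)


# Fake entries standing in for what a real line program would yield.
fake_entries = [
    SimpleNamespace(state=None),
    SimpleNamespace(state=SimpleNamespace(address=0x200, line=3, end_sequence=False)),
    SimpleNamespace(state=SimpleNamespace(address=0x208, line=4, end_sequence=False)),
    SimpleNamespace(state=SimpleNamespace(address=0x20C, line=4, end_sequence=True)),
]
demo_rows = rows_from_entries(fake_entries)
```

This only covers the line table; variable names and types would come from walking the DIEs of each compilation unit instead.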

I discussed it with a friend, and they think that, given only the ELF, it is impossible to get back to the C code without decompilation.

But there may be a way to build a map between variable info (name, type, and line number from DWARF) and its real value using libunwind, ptrace, and DWARF.

There is a blog post that describes some similar ideas; I haven't put it into practice yet.

Some reference materials

  • https://blog.tartanllama.xyz/writing-a-linux-debugger-unwinding/
  • https://www.nongnu.org/libunwind/man/unw_init_remote(3).html

Overall, this issue looks like it requires implementing a decompiler.

trdthg avatar Oct 12 '24 01:10 trdthg