Release source code versions of decompile-bench GitHub projects

Open ChrisMcMStone opened this issue 2 months ago • 2 comments

Hello!

Thanks again for your work on this project.

Would it be possible to release the exact versions of the GitHub projects that were compiled into the decompile-bench dataset, for example as git commit hashes or tags? This would make the dataset even more valuable, since source code information beyond the scope of a single function record (interprocedural context) could be incorporated.

ChrisMcMStone avatar Oct 09 '25 12:10 ChrisMcMStone

We don't have that info on record, but we're working on a bigger dataset now. We'll be including these metadata:

  • Source code (including the exact git commit, as you suggested)
  • IDA pseudo-code and assembly code
  • Address
  • Optimization flags
  • Source file location
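
For illustration only, here is a minimal sketch of how one record carrying the metadata listed above might be laid out. The class name `FunctionRecord`, every field name, the `to_json_line` helper, and the JSON Lines serialization are assumptions made for this example; the actual schema of the upcoming dataset has not been published.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FunctionRecord:
    """Hypothetical per-function record; all field names are assumptions."""

    repo_url: str        # project the function came from
    git_commit: str      # exact commit the binary was built from
    source_file: str     # source file location within the repository
    source_code: str     # source of the function
    ida_pseudocode: str  # IDA pseudo-code for the function
    assembly: str        # disassembly of the function
    address: int         # address of the function in the binary
    opt_flags: str       # optimization flags used for the build


def to_json_line(record: FunctionRecord) -> str:
    """Serialize one record as a single JSON line (assumed storage format)."""
    return json.dumps(asdict(record), sort_keys=True)


# Placeholder values only; none of this is real dataset content.
example = FunctionRecord(
    repo_url="https://example.com/org/project.git",
    git_commit="0" * 40,
    source_file="src/util.c",
    source_code="int add(int a, int b) { return a + b; }",
    ida_pseudocode="",
    assembly="",
    address=0x401000,
    opt_flags="-O2",
)
line = to_json_line(example)
```

With `repo_url`, `git_commit`, and `source_file` stored per record, a user could check out the matching revision and recover callers, callees, and type definitions that a single-function record leaves out.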

Please let us know if there is any other metadata you would find useful for us to include.

albertan017 avatar Oct 10 '25 06:10 albertan017

That's great, thanks @albertan017. Good luck with it.

ChrisMcMStone avatar Oct 13 '25 12:10 ChrisMcMStone