Do you take `struct` into consideration?

Open XinyuShe opened this issue 1 year ago • 3 comments

Do you take struct into consideration? And how do you handle the issue of excessively long functions in assembly code?

XinyuShe, Mar 21 '24 08:03

No, currently we only consider a single function.

Gathering data and developing a workable approach for decompiling complex files with multiple functions and structures is quite demanding. Therefore, this initial version of LLM4Decompile is limited to decompilation of individual functions.

Addressing the complexities posed by external functions and struct definitions is a primary focus of our future decompilation efforts, and our team is actively working on strategies for them. While the problem may be ill-posed, a larger and more varied training dataset will allow the model to make statistical guesses about the functions and types that correspond to the missing pieces. We'll report the results asap!

albertan017, Mar 25 '24 03:03

@albertan017 Thanks for your reply! I am also wondering where you found those C file datasets without structs and long functions?

XinyuShe, Mar 25 '24 08:03

We removed those parts from AnghaBench for simplification. The original dataset is available here. But the dataset is only compilable, not linkable: each file compiles on its own but cannot be linked into an executable. Therefore, we are looking for other benchmarks and collecting our own data.

albertan017, Mar 25 '24 08:03
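
For readers wondering what "removing those parts" could look like in practice, here is a minimal sketch of a filter that drops functions mentioning `struct` or exceeding a length limit. It is not the LLM4Decompile preprocessing code, which the thread does not show. The names `keep_function` and `MAX_LINES`, the threshold of 50 lines, and the sample functions are all assumptions made for illustration.

```python
import re

# Hypothetical threshold: the thread does not say what counts as "long".
MAX_LINES = 50

# Matches the C keyword `struct` as a whole word.
_STRUCT_RE = re.compile(r"\bstruct\b")
# Strips /* ... */ and // comments. This is a rough heuristic: it does not
# account for comment markers that appear inside string literals.
_COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def keep_function(source: str, max_lines: int = MAX_LINES) -> bool:
    """Return True if the function text uses no struct and is short enough."""
    code = _COMMENT_RE.sub("", source)
    if _STRUCT_RE.search(code):
        return False
    # Count only non-blank lines of code.
    n_lines = sum(1 for line in code.splitlines() if line.strip())
    return n_lines <= max_lines


# Hypothetical sample functions, keyed by name.
samples = {
    "add": "int add(int a, int b) {\n    return a + b;\n}\n",
    "get_x": "int get_x(struct point *p) {\n    return p->x;\n}\n",
}

kept = [name for name, src in samples.items() if keep_function(src)]
print(kept)
```

A keyword check like this is deliberately crude: it misses struct types hidden behind a `typedef`, so a real pipeline would more likely rely on a C parser.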