albertan017
> Are you training on a single node or multiple nodes out of interest? For the 1B model, we use a single node. For larger models, they are typically trained...
Thanks for your interest! Integration with Ghidra and IDA Pro is definitely on our roadmap. Currently, we are concentrating on training a new large language model designed for binary analysis....
We've only found AnghaBench and Exebench, which cover nearly all available C libraries. If you have specific requirements, you might need to manually compile larger projects like Linux. While it's...
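See the issue below: https://github.com/albertan017/LLM4Decompile/issues/33 There is currently no unified evaluation method, and we are also exploring different approaches, such as GPT-based evaluation: https://github.com/albertan017/LLM4Decompile/blob/main/samples/readability_template.txt If you have other evaluation methods to recommend, we would welcome a discussion~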
For compilable data, you can follow the [compilation script for AnghaBench](https://github.com/albertan017/LLM4Decompile/blob/main/train/compile.py), with small modifications to handle the function source (exebench_data['func_def']) and its dependencies (exebench_data['synth_deps']). For executable data, it's quite...
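As a rough illustration of the modification described above, the sketch below joins `synth_deps` and `func_def` into one C translation unit and tries to compile it to an object file. It is not the repository's `compile.py`: the helper names (`build_source`, `compile_command`, `try_compile`), the `gcc -c` invocation, and the default `O0` level are assumptions made for this example, and it assumes both fields are plain strings.

```python
import subprocess
import tempfile
from pathlib import Path


def build_source(row):
    """Join dependencies and the function definition into one C source string."""
    deps = row.get("synth_deps") or ""  # assumed to be a string, possibly empty
    return deps + "\n" + row["func_def"] + "\n"


def compile_command(src_path, obj_path, opt="O0"):
    """Build the gcc command line (assumed flags) for compiling to an object file."""
    return ["gcc", "-c", f"-{opt}", "-o", str(obj_path), str(src_path)]


def try_compile(row, opt="O0", timeout=30):
    """Return True if the row compiles to an object file, False otherwise."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "func.c"
        obj = Path(tmp) / "func.o"
        src.write_text(build_source(row))
        try:
            result = subprocess.run(
                compile_command(src, obj, opt),
                capture_output=True,
                timeout=timeout,
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # gcc is not installed, or compilation took too long
            return False
        return result.returncode == 0 and obj.exists()
```

Rows for which `try_compile` returns `False` would simply be skipped when building the compilable subset.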
Yes, in theory, it should be effective. However, we encounter difficulties in generating the appropriate assembly for execution. As a result, we adjust the input to the Wrapper and alter...
As highlighted in our paper, we initially eliminate functions that cannot be executed by testing the executability of the original function (i.e., **use the dataset_row['func_def']**, not the 'decompiled_c_func' in step...
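A minimal sketch of this filtering step, assuming each dataset row is a dict with a `func_def` field. The function `filter_executable` and the `runs_ok` callable are hypothetical names for this example; the actual executability check is not shown here and has to be supplied by the caller.

```python
def filter_executable(rows, runs_ok):
    """Keep only rows whose ORIGINAL function passes the executability check.

    runs_ok is a caller-supplied callable (hypothetical) that takes the
    source in row['func_def'] and returns True if it can be executed.
    """
    return [row for row in rows if runs_ok(row["func_def"])]
```

The point of the sketch is only that the check is applied to `func_def`, the original function, before any decompiled output is evaluated.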
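Current LLMs do not have project-level code understanding (an LLM can translate a single passage easily, but clearly starts forgetting when translating a whole chapter), and training and inference costs are extremely high (without optimization, attention computation scales quadratically with input length), so project-level reconstruction is too costly and difficult to train. We prefer to reconstruct functions separately and then integrate them: use each function's own information to reconstruct it, then feed the reconstructed functions together into a stronger model (GPT-o1, Deepseek-R1) to refine. llm4decompile handles individual functions well, while GPT and similar models are good at integrating information at a higher level.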
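To make the cost argument concrete, here is a small back-of-the-envelope sketch. It assumes standard, unoptimized self-attention, whose score computation grows quadratically with sequence length; the function name and the token counts are illustrative only.

```python
def attention_cost_ratio(long_len, short_len):
    """Relative cost of the attention score computation for two sequence
    lengths, assuming unoptimized O(n^2) scaling in sequence length."""
    return (long_len / short_len) ** 2


# Illustrative numbers: a 100,000-token project versus a 1,000-token function.
ratio = attention_cost_ratio(100_000, 1_000)
print(ratio)  # 10000.0
```

Under that assumption, a 100x longer input costs about 10,000x more in the attention score computation, which is one reason to decompile function by function and leave integration to a separate refinement pass.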
The 9B model is based on [Yi-Coder](https://github.com/01-ai/Yi-Coder), while the training script is from [Deepseek-Coder](https://github.com/deepseek-ai/DeepSeek-Coder). We did not test the 9B model with this script; we recommend using llama factory...
2024.5.10 Update: All the evaluations and models are based on executables! Enjoy~ ~~Thanks for your interest in our project! Indeed, we're utilizing object files instead of executables, as our training...