albertan017

Results: 61 comments of albertan017

Thanks for your interest, currently we do not plan to release the framework.

Yes, you can grab the [bins](https://huggingface.co/datasets/LLM4Binary/decompile-bench-bins)—they’re unstripped and include full debug information—but be aware the unzipped size is quite large (around 500 GB).

The LLM4Binary/decompile-ghidra-100k dataset is a sample dataset used for the v2 series models. For training the v2 series, we use a larger dataset consisting of 1 billion tokens (approximately 1.6...

We're using ExeBench with the first 400K functions, which includes AnghaBench. Yes, we compile the benchmark and then decompile the binaries with Ghidra.
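The compile-then-decompile step could be sketched roughly as below. This is a hypothetical illustration, not our exact pipeline: the GCC flags, Ghidra install path, project directory, and post-script name (`DecompileAllFunctions.py`) are all assumptions.

```python
def build_pipeline_cmds(src="func.c", binary="func.bin",
                        ghidra_home="/opt/ghidra", project_dir="/tmp/ghidra_proj"):
    """Sketch: build the two commands of a compile -> Ghidra-decompile pipeline.

    All paths and the post-script name are hypothetical placeholders.
    """
    # Step 1: compile the benchmark function into an object file.
    compile_cmd = ["gcc", "-O0", "-c", src, "-o", binary]

    # Step 2: decompile with Ghidra's headless analyzer; a post-script
    # (here a placeholder name) would dump the decompiled C.
    decompile_cmd = [
        f"{ghidra_home}/support/analyzeHeadless", project_dir, "bench",
        "-import", binary,
        "-postScript", "DecompileAllFunctions.py",  # hypothetical script
        "-deleteProject",
    ]
    return compile_cmd, decompile_cmd
```

In practice each command would be run with `subprocess.run(cmd, check=True)`, once per benchmark function.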

We don't have that info on record, but we're working on a larger dataset now. We'll be including the following metadata: - Source code (including the exact git commit, as you...

Not yet—we plan to add arm64 support by the end of this year.

Please use the [vllm script](https://github.com/albertan017/LLM4Decompile/blob/main/evaluation/run_evaluation_llm4decompile_vllm.py); the other scripts have not been updated. Regarding your error, I believe it is related to the environment rather than the model. You might need to...

Thank you for your interest. The filtering process is fairly straightforward—please see our paper for full details. In brief, we: 1. Exclude any ASM functions not originating from the current...

No — the binaries in decompile-bench are compiled on an x86‑64 (x64) Linux platform with Clang, which is slightly different from 32‑bit x86.

Thanks for your interest! We're working on building a larger and more comprehensive dataset. Please let us know if there is any other data/metadata you would find useful for us...