Explore / benchmark humanify against "JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation"

https://arxiv.org/abs/2506.20170
- JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation
- Deobfuscating JavaScript (JS) code poses a significant challenge in web security, particularly as obfuscation techniques are frequently used to conceal malicious activities within scripts. While Large Language Models (LLMs) have recently shown promise in automating the deobfuscation process, transforming detection and mitigation strategies against these obfuscated threats, a systematic benchmark to quantify their effectiveness and limitations has been notably absent. To address this gap, we present JsDeObsBench, a dedicated benchmark designed to rigorously evaluate the effectiveness of LLMs in the context of JS deobfuscation. We detail our benchmarking methodology, which includes a wide range of obfuscation techniques ranging from basic variable renaming to sophisticated structure transformations, providing a robust framework for assessing LLM performance in real-world scenarios. Our extensive experimental analysis investigates the proficiency of cutting-edge LLMs, e.g., GPT-4o, Mixtral, Llama, and DeepSeek-Coder, revealing superior performance in code simplification despite challenges in maintaining syntax accuracy and execution reliability compared to baseline methods. We further evaluate the deobfuscation of JS malware to exhibit the potential of LLMs in security scenarios. The findings highlight the utility of LLMs in deobfuscation applications and pinpoint crucial areas for further improvement.
- https://www.alphaxiv.org/overview/2506.20170v1

Originally shared by @neoOpus:

@0xdevalias I came across this today and thought it might be of interest to you. I believe it could also be worth reading for anyone else who might come across it as well. https://arxiv.org/pdf/2506.20170

Originally posted by @neoOpus in https://github.com/jehna/humanify/issues/533#issuecomment-3085620813

> I came across this today and thought it might be of interest to you.

@neoOpus Oh awesome; thanks! Probably worth opening a new issue for this sort of thing though; as it's not related to this PR.

Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/533#issuecomment-3085829316

Interesting paper

Ran the benchmark in GitHub Codespaces and it took a few hours to finish. Results are available on https://j4k0xb.github.io/jsdeobf.github.io/ for now (will create a PR later).

[Images: single, combination]

- Execution is almost 100% (some scripts take longer than the default 2s timeout, maybe also due to bugs).
- Decomplexity/Similarity is hard to judge. Maybe webcrack outputs suboptimal results in some cases, or it's because of other transformations for readability (unrelated to this obfuscator) that make it less similar.

- https://github.com/relative/synchrony hasn't been updated in a while. It would probably have achieved better results with an older obfuscator version.
- https://github.com/ben-sb/javascript-deobfuscator is general purpose. idk why the authors chose it over https://github.com/ben-sb/obfuscator-io-deobfuscator, which is tailored towards this obfuscator. I expect the latter to score similarly to webcrack.

Originally posted by @j4k0xb in #189
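
To illustrate the timeout remark above, here is a minimal Python sketch of how an execution check with a time limit might behave. This is not JsDeObsBench's actual harness: `run_with_timeout`, `execution_matches`, and the idea of invoking something like `["node", "script.js"]` are assumptions for illustration only. The point is that a correct but slow script is counted as a failure once it exceeds the limit.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=2.0):
    """Run a command and classify the outcome as 'ok', 'error', or 'timeout'.

    `cmd` is an argument list, e.g. ["node", "script.js"] (hypothetical).
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "error"
    return status, proc.stdout


def execution_matches(original_cmd, deobfuscated_cmd, timeout_s=2.0):
    """True only if both runs finish in time, exit cleanly, and print the same output."""
    status_a, out_a = run_with_timeout(original_cmd, timeout_s)
    status_b, out_b = run_with_timeout(deobfuscated_cmd, timeout_s)
    return status_a == "ok" and status_b == "ok" and out_a == out_b
```

With a check like this, raising `timeout_s` is the simplest way to tell slow scripts apart from genuinely broken ones.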

Jul 20 '25 00:07 0xdevalias

I also opened the following upstream issue on JsDeObsBench, to consider including identifier naming in a future version of the benchmark:

https://github.com/Ch3nYe/JsDeObsBench/issues/2

The following idea / note came from discussion on https://github.com/j4k0xb/webcrack/issues/189:

..snip..

I'm not sure how to implement a metric for it, but it would be interesting if a future version of the benchmark could take variable / identifier naming into account. That would help evaluate tools that aim to restore meaningful variable names to de-minified / de-obfuscated code, and would allow tools like humanify (which adds identifier renaming on top of webcrack's general deobfuscation) to be compared / ranked.

Originally posted by @0xdevalias in https://github.com/Ch3nYe/JsDeObsBench/issues/2
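
As a rough starting point for what such a naming metric could look like, here is a minimal Python sketch. It is purely hypothetical and not part of JsDeObsBench, webcrack, or humanify: it assumes the original and recovered identifiers have already been paired up (for example by declaration order), splits each name into sub-words, and scores the overlap with Jaccard similarity. The names `subtokens`, `name_similarity`, and `naming_score` are made up for this example.

```python
import re


def subtokens(identifier):
    """Split an identifier into lowercase parts (camelCase, snake_case, digits)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return {p.lower() for p in parts}


def name_similarity(original, recovered):
    """Jaccard overlap between the sub-words of two identifier names."""
    a, b = subtokens(original), subtokens(recovered)
    if not a and not b:
        # Names with no letters or digits (e.g. "$"): fall back to exact match.
        return 1.0 if original == recovered else 0.0
    return len(a & b) / len(a | b)


def naming_score(pairs):
    """Mean similarity over (original, recovered) identifier pairs."""
    if not pairs:
        return 0.0
    return sum(name_similarity(o, r) for o, r in pairs) / len(pairs)


if __name__ == "__main__":
    pairs = [("userName", "user_name"), ("count", "a")]
    print(naming_score(pairs))
```

A lexical overlap like this would penalise good synonyms (`total` vs `sum`), so an embedding-based or LLM-judged similarity might be a better fit; this is only meant to show the shape of the comparison.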

Jul 20 '25 01:07 0xdevalias


See Also