LLM4Decompile

I wonder if you could share some experience on collecting datasets

Open · Pisces032 opened this issue 1 year ago · 1 comment

I'm trying to fine-tune it with PEFT. I have found some datasets, but they are either too small or depend on too many headers that have to be installed, and the install commands for those headers differ greatly. Do you have any advice on how to find suitable datasets like AnghaBench? Thank you so much!

Pisces032, Sep 01 '24 03:09

We've only found AnghaBench and ExeBench, which cover nearly all available C libraries. If you have specific requirements, you may need to compile larger projects such as Linux yourself. It's time-consuming, but it can help improve the model further, and that's what we're doing now.
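As a rough illustration of compiling a project yourself while sidestepping the header problem raised above, here is a minimal Python sketch. It is not the pipeline used for LLM4Decompile. Everything in it is an assumption for illustration: the header allowlist, the choice of `gcc` with `-O0` to `-O3`, and the helper names `included_headers`, `uses_only_standard_headers`, `compile_command`, and `build_pairs`. It keeps only `.c` files whose includes are all on the allowlist, then records which of them compile at each optimization level.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Assumption: a small allowlist of standard C headers; extend it as needed.
STANDARD_HEADERS = {
    "assert.h", "ctype.h", "errno.h", "limits.h", "math.h", "stdbool.h",
    "stddef.h", "stdint.h", "stdio.h", "stdlib.h", "string.h",
}

# Matches both #include <x.h> and #include "x.h" at the start of a line.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

# Assumption: one binary per optimization level.
OPT_LEVELS = ("O0", "O1", "O2", "O3")


def included_headers(source: str) -> set[str]:
    """Return every header named in an #include directive."""
    return set(INCLUDE_RE.findall(source))


def uses_only_standard_headers(source: str) -> bool:
    """True if the file needs nothing beyond the allowlisted headers."""
    return included_headers(source) <= STANDARD_HEADERS


def compile_command(src: Path, obj: Path, opt: str) -> list[str]:
    """Build the gcc invocation that compiles src to an object file."""
    return ["gcc", f"-{opt}", "-c", str(src), "-o", str(obj)]


def build_pairs(src_dir: Path, out_dir: Path) -> list[tuple[Path, Path, str]]:
    """Compile eligible files; return (source, object, opt) for each success."""
    if shutil.which("gcc") is None:
        raise RuntimeError("gcc not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    pairs = []
    for src in sorted(src_dir.rglob("*.c")):
        if not uses_only_standard_headers(src.read_text(errors="ignore")):
            continue  # skip files that would need extra packages installed
        for opt in OPT_LEVELS:
            # Note: files sharing a stem in different folders would collide here.
            obj = out_dir / f"{src.stem}_{opt}.o"
            result = subprocess.run(compile_command(src, obj, opt),
                                    capture_output=True)
            if result.returncode == 0:
                pairs.append((src, obj, opt))
    return pairs
```

Calling `build_pairs(Path("some_project"), Path("objs"))` (both paths hypothetical) would give a list of source/object pairs that could then be disassembled to form training examples. Filtering on includes trades coverage for convenience: it drops much of a large project but avoids installing per-library headers.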

albertan017, Sep 21 '24 10:09