PyRIT Insecure Code Scorer

Is your feature request related to a problem? Please describe.

Garak and CyberSec have insecure code generation detectors. As I understand it, that means they have a scorer LLM or some sort of static analysis that looks for insecure code spit out by a target. We probably want to do that also in PyRIT

Describe the solution you'd like

I'm not sure how the above detect insecure code so some of this is guesswork.

Before implementation it would be good to have a brief analysis of how these platforms detect insecure code, and implement (with credit) similar scorers into PyRIT. This issue may take a bit of investigation, and ideally comment here or in the PR with what you find. If it's tough to implement in PyRIT as a scorer, we want to know that too so we can make it easier at a framework level!

It probably will be a float_scale scorer. And it may be as simple as a likert scorer with insecure code generation template

Oct 30 '24 05:10 rlundeen2

Hello @rlundeen2,

The Insecure Code Detector (ICD) from CyberSec Meta uses static analysis with regex and semgrep analyzers to detect insecure coding practices across multiple languages.

Regarding Garak, I wasn’t able to find specific information on its detection methods within a reasonable timeframe.

For more details on ICD, here’s the link to the source: CyberSec ICD and insecure_code_detector.py.

I’ll take a detailed look on Sunday. If you’re interested in a LLM-based scorer as a starting point for insecure code evaluation, I can set that up easily using a float_scale or Likert-style approach to rate code outputs.

While I haven’t yet worked with static analysis tools, I’m ready to dive into that if we want to go deeper into rule-based detection.

Let me know if you'd like me to proceed with the LLM based scorer, or if there’s another direction you recommend.

Oct 30 '24 16:10 KutalVolkan

Thank you for the investigation!

I would say both are things we'd potentially want. But I think an LLM based scorer will be a lot easier to implement and have a much higher bang for the buck. I would probably do the following:

Implement a basic LLM scorer. Probably float_scale. See if people use it
We can open a follow-up issue to add static analysis to our LLM scorer. I think that would be cool, but it's more work. Still, we might be able to make it relatively easy by just running semgrep (or something) on code snippets extracted by the LLM.

Nov 02 '24 02:11 rlundeen2