noseyparker
Missing purposely placed secrets in test file
Describe the bug I've been evaluating a number of solutions similar to noseyparker, and the tool seems to be missing some secrets that I would have expected it to find.
To Reproduce I've created a sample .txt file with some values that may be considered secrets:
cat /tmp/foo/file.txt
var secret="secret123";
var jwt="secret123";
var apitoken="secret123";
var AWS_ACCESS_KEY_ID="ASIAY34FZKBOKMUTVV7A";
var password="secret123";
var db_password="secret123";
./noseyparker scan /tmp/foo/
The resulting report contains only the following two lines:
var password="secret123";
var db_password="secret123";
Expected behavior I would have expected more matches, at least the AWS_ACCESS_KEY_ID.
Actual behavior Noseyparker finds two of the six purposely placed secrets in a test file.
Screenshots NA
Output of noseyparker --version
./noseyparker --version
noseyparker 0.23.0
Additional context Debian 12 host, noseyparker is run from the provided pre-built binary.
@BreakfastSerial thanks for the detailed report.
A bunch of the examples you posted above are difficult to match precisely with regular expressions (the detection mechanism used by open-source Nosey Parker). To avoid inundating users with false positives, I haven't written generic rules to detect things like the above.
In a small crafted test case it is hard to see what the FP rate would be like, but if you run at larger scale you'll see it. I run Nosey Parker over ~2TB of inputs every time I add new rules. GitHub regex code search is also invaluable. For example, you can search for /apitoken\s*=\s*"[^"]{5,60}"/, which gives lots of noise.
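To make the false-positive problem concrete at small scale, here is a minimal Python sketch that applies the regex from the search above to a handful of lines. The sample lines are invented for illustration, and the script only approximates what a generic rule would flag; it is not how Nosey Parker evaluates its rules.

```python
import re

# The generic pattern from the GitHub code search example above.
GENERIC_APITOKEN = re.compile(r'apitoken\s*=\s*"[^"]{5,60}"')

# Hypothetical sample lines: one planted secret and three harmless values.
SAMPLES = [
    'var apitoken="secret123";',
    'apitoken = "<your token here>"',
    'apitoken="${API_TOKEN}"',
    'apitoken = "changeme"',
]


def find_matches(lines):
    """Return the lines that the generic pattern flags."""
    return [line for line in lines if GENERIC_APITOKEN.search(line)]


if __name__ == "__main__":
    for line in find_matches(SAMPLES):
        print(line)
```

Every sample line is flagged, although only the first resembles a planted secret; placeholders and variable references look the same to a pattern this loose.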
Note that if you run with noseyparker scan --ruleset all, some additional rules are enabled, including one that will detect the AWS key ID. That rule is not enabled in the default set since a more specific rule (AWS API Credentials) that detects both the ID and secret supplants it. In general, AWS key IDs on their own are not exploitable; you need the secret as well.
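By contrast, a key ID has a fixed shape that a narrow pattern can match. The following Python sketch assumes the commonly seen format of an access key ID (a prefix such as AKIA or ASIA followed by 16 uppercase letters or digits). The pattern is an approximation written for this example and is not the rule that Nosey Parker ships.

```python
import re

# Assumed shape: "AKIA" or "ASIA" followed by 16 uppercase letters or digits.
# This is an illustrative approximation, not Nosey Parker's actual rule.
AWS_KEY_ID = re.compile(r'\b(?:AKIA|ASIA)[A-Z0-9]{16}\b')


def find_key_ids(text):
    """Return all substrings of text that look like AWS access key IDs."""
    return AWS_KEY_ID.findall(text)


if __name__ == "__main__":
    line = 'var AWS_ACCESS_KEY_ID="ASIAY34FZKBOKMUTVV7A";'
    print(find_key_ids(line))
```

A match here only shows that a key ID is present; as noted above, the ID is not exploitable without the corresponding secret.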
Also, if you'd like to see examples of what Nosey Parker's rules will detect, you can run noseyparker rules list --format json and view the examples in there. Or look in the source tree to view them in YAML form.
Thank you for your answer; your reasoning speaks to the quality of the project.
In our use-case, we only cover a relatively small set of data, so noise would be less of an issue, but in the big picture it will affect the results immensely.
I'm looking at some of the rulesets and wonder if it would make sense for us to formulate our own. Maybe I missed it, but is there a guide on how to define rulesets for noseyparker?
Thank you for your time and effort!
@BreakfastSerial, there's not a great writeup on creating rulesets right now. But there is an example you can riff from in the description of https://github.com/praetorian-inc/noseyparker/issues/246. (Better documentation is coming eventually; see #245)