Avoid verifying the same key multiple times in a session

Open bugbaba opened this issue 1 year ago • 12 comments

Hello Team,

Description

Verify a key only once even if it is found multiple times in the same or different files.

For example, in the screenshot below, the same (revoked) GitLab key is verified twice because it appears twice in the file.

image

Preferred Solution

To avoid wasting resources on re-verification and hitting detectors with the same keys multiple times, it would be ideal to check whether a given key has already been verified.

Maybe add a check before the if verify block to confirm that the key has not been verified previously?

-- Best Regards, @bugbaba

bugbaba avatar Dec 25 '23 16:12 bugbaba

This would benefit both trufflehog and any endpoints it calls.

I was actually playing around with adding caches to detectors so that known verifications only happen once. Work smarter, not harder. :)

rgmz avatar Dec 25 '23 16:12 rgmz

@ahrav has an experimental implementation of this in #2276.

rgmz avatar Jan 30 '24 13:01 rgmz

I agree. Using a cache reduces duplicate external API calls for credential verification, which improves performance and API stewardship. However, storing plaintext credentials is a potential security risk if the cache is exposed, even though the chance of an in-memory cache being compromised is low. One option that mitigates exposure while retaining detection speed is to hash credentials with a fast algorithm such as xxHash before caching them. This keeps plaintext credentials out of the cache while still allowing cache hits on matching hashes. Overall, this balances security, performance, and responsible API usage by preventing duplicate verification calls for the same credentials.
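A minimal sketch of the hash-before-caching idea, under some loud assumptions: the type hashedCache and its get/put methods are hypothetical names, not trufflehog APIs, and SHA-256 from the Go standard library stands in for xxHash so the example runs without third-party packages. xxHash is a non-cryptographic hash, so the choice of algorithm affects how much protection the digests actually give.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// hashedCache is a hypothetical session-scoped cache keyed by a digest of the
// credential, so the plaintext secret is never stored in the map.
type hashedCache struct {
	mu      sync.Mutex
	results map[[sha256.Size]byte]bool // digest -> verification result
}

func newHashedCache() *hashedCache {
	return &hashedCache{results: make(map[[sha256.Size]byte]bool)}
}

// get returns the cached result and whether the credential was seen before.
func (c *hashedCache) get(secret string) (verified, seen bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	verified, seen = c.results[sha256.Sum256([]byte(secret))]
	return verified, seen
}

// put records the verification result under the credential's digest.
func (c *hashedCache) put(secret string, verified bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.results[sha256.Sum256([]byte(secret))] = verified
}

func main() {
	cache := newHashedCache()
	cache.put("example-token-1", false) // e.g. a revoked key

	_, seen := cache.get("example-token-1")
	fmt.Println(seen) // true: same credential, same digest
	_, seen = cache.get("example-token-2")
	fmt.Println(seen) // false: never cached
}
```

The mutex is there because detectors may run concurrently; whether that holds for a given integration point is also an assumption.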

ahrav avatar Jan 30 '24 16:01 ahrav

Also, I think we shouldn't add ExtraData as metadata in the cache as it mostly contains sensitive info extracted from valid responses.

We just need to check if the given match exists in the cache or not. So maybe we can ignore all metadata if there are no plans to use metadata from cache.

if match in cachelist {
  ignore   // already verified in this session
} else {
  verification logic
  add match to cachelist
}
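The pseudocode above could be sketched in Go roughly as follows. Everything here is hypothetical and for illustration only: onceVerifier, check, and the fake verification function are not trufflehog identifiers. The cache keeps just a boolean per match and no response metadata, in line with the point about ExtraData. Unlike the pseudocode, a cache hit returns the earlier result rather than ignoring the match outright. For brevity the map is keyed by the raw match string and is not safe for concurrent use; keying by a digest, as suggested earlier in the thread, would avoid holding plaintext.

```go
package main

import "fmt"

// verifier is a stand-in for a detector's real verification call.
type verifier func(match string) bool

// onceVerifier runs verify at most once per distinct match in a session.
// Only the boolean outcome is cached; no response metadata is stored.
type onceVerifier struct {
	verify verifier
	seen   map[string]bool // match -> earlier verification result
}

func (o *onceVerifier) check(match string) bool {
	if result, ok := o.seen[match]; ok {
		return result // cache hit: skip the external call
	}
	result := o.verify(match)
	o.seen[match] = result
	return result
}

func main() {
	calls := 0
	// Fake verifier that counts how often it is invoked.
	fake := func(match string) bool {
		calls++
		return match == "valid-key"
	}
	v := &onceVerifier{verify: fake, seen: map[string]bool{}}

	// The revoked key appears twice, as in the report above.
	for _, m := range []string{"revoked-key", "revoked-key", "valid-key"} {
		v.check(m)
	}
	fmt.Println(calls) // 2: the duplicate was served from the cache
}
```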

bugbaba avatar Jan 31 '24 04:01 bugbaba