trufflehog icon indicating copy to clipboard operation
trufflehog copied to clipboard

[Testing] Avoid duplicate verification requests which happen concurrently

Open kashifkhan0771 opened this issue 5 months ago • 10 comments

Description:

Details: https://github.com/trufflesecurity/trufflehog/issues/2262

Note: This is the first example implementation, if approved we will implement this in the most used detectors.

Checklist:

  • [x] Tests passing (make test-community)?
  • [x] Lint passing (make lint this requires golangci-lint)?

kashifkhan0771 avatar Jul 14 '25 10:07 kashifkhan0771

Hi @kashifkhan0771,

Just gave it a quick test, its calling verify function twice if same key is found in two separate files.

image

bugbaba avatar Jul 14 '25 13:07 bugbaba

Just gave it a quick test, its calling verify function twice if same key is found in two separate files.

I believe this is related to the issue you mentioned here

kashifkhan0771 avatar Jul 14 '25 13:07 kashifkhan0771

This is a cool caching implementation. I did a quick read of the related discussions (thank you very much for the links to additional context) and think that prior to us trying this idea, we should probably investigate improvements to the existing verification cache (e.g. making it worker-aware/thread-safe) before adding a second cache implementation.

Something like this very well might be where we end up, but first trying to improve what we already have could help reduce potential future complexity.

trufflesteeeve avatar Jul 14 '25 15:07 trufflesteeeve

Just gave it a quick test, its calling verify function twice if same key is found in two separate files.

@bugbaba For me it was using cached result for duplicate findings from different files with previous changes as well. However, as you mentioned, I implemented per-secret locking. This ensures that each secret is locked during verification until it's either verified or cached.

Please give it a try and let me know how it goes.

Also, I'm converting this PR back to a draft based on @trufflesteeeve comment. I agree that we should first focus on fixing the existing caching implementation. This PR will remain here for reference or in case we need it in the future.

kashifkhan0771 avatar Jul 15 '25 06:07 kashifkhan0771

@kashifkhan0771 I’m not sure if you’ve considered using Go’s singleflight package to consolidate concurrent requests to the same endpoint into one. We could build a shared abstraction that all detectors use: whenever multiple goroutines ask for the same data at the same time, singleflight coalesces them into a single API call and fans out the response.

This won’t eliminate every duplicate request, requests separated by cache-miss delays can still slip through, but it’s essentially “free” to layer on and can cut down a lot of in-flight calls. For more background, see VictoriaMetrics’ blog on singleflight.

ahrav avatar Jul 15 '25 07:07 ahrav

@kashifkhan0771 I’m not sure if you’ve considered using Go’s singleflight package to consolidate concurrent requests to the same endpoint into one. We could build a shared abstraction that all detectors use: whenever multiple goroutines ask for the same data at the same time, singleflight coalesces them into a single API call and fans out the response.

This won’t eliminate every duplicate request, requests separated by cache-miss delays can still slip through, but it’s essentially “free” to layer on and can cut down a lot of in-flight calls. For more background, see VictoriaMetrics’ blog on singleflight.

Hands down, you always bring great insights 🙌🏻 I really appreciate it. I hadn’t come across singleflight before, but I did have a similar idea in mind about collapsing multiple requests into one. I didn’t realize Go had a package for this, so thanks a lot for pointing it out.

I’ll read through the documentation and blog post to get a better understanding of how to use it effectively. For now, I’ll keep this PR in draft and start incorporating some changes. Let’s see how things go.

kashifkhan0771 avatar Jul 15 '25 07:07 kashifkhan0771

I have implemented singleflight for the GitHub detector for now. Please review and give it a try when you get a chance. I'm also thinking through how we can generalize this for all detectors, as @ahrav suggested.

kashifkhan0771 avatar Jul 15 '25 12:07 kashifkhan0771

@kashifkhan0771 I’m not sure if you’ve considered using Go’s singleflight package to consolidate concurrent requests to the same endpoint into one. We could build a shared abstraction that all detectors use: whenever multiple goroutines ask for the same data at the same time, singleflight coalesces them into a single API call and fans out the response.

This won’t eliminate every duplicate request, requests separated by cache-miss delays can still slip through, but it’s essentially “free” to layer on and can cut down a lot of in-flight calls. For more background, see VictoriaMetrics’ blog on singleflight.

Great read @ahrav. Thanks for sharing

amanfcp avatar Jul 17 '25 07:07 amanfcp

@ahrav I switched to using only singleflight for the verification logic. I chose a much simpler detector which includes response body handling. I believe this approach should effectively filter out duplicate verification requests happening concurrently. What do you think?

kashifkhan0771 avatar Aug 15 '25 11:08 kashifkhan0771

@bugbaba, could you give it another try now?

kashifkhan0771 avatar Oct 27 '25 05:10 kashifkhan0771