
Enable/Disable archive scanning from commandline

Open 0x736E opened this issue 4 months ago • 10 comments

Please review the Community Note before submitting

Description

Users should be able to enable or disable scanning file archives from the commandline.

There are situations where it is not desirable to scan file archives at all. At present, the only way to effectively disable this behaviour is to set archive limits that no archive can satisfy (e.g. --archive-max-size=1B).

Additionally, #2506 demonstrates that file archive scanning is inconsistently enabled depending on the data source. As it currently stands, file archive scanning is defined internally and is not configurable, meaning that some data sources (git) will produce results in scans but others will not. Users should be able to configure this behaviour.

Preferred Solution

A new commandline flag such as --no-archive which disables the default behaviour of scanning file archives.
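As a rough illustration of the requested behaviour only, the sketch below shows how a boolean flag could gate archive handling. It is not trufflehog's implementation: the --no-archive flag does not exist yet, the helper names (isArchive, shouldScan) are hypothetical, and the extension check is a simplified stand-in for however archives are really detected.

```go
package main

import (
	"flag"
	"fmt"
	"path/filepath"
	"strings"
)

// archiveExts is an illustrative, incomplete list of archive extensions.
// Assumption: a real scanner would likely detect archives more robustly.
var archiveExts = map[string]bool{
	".zip": true, ".tar": true, ".gz": true, ".tgz": true, ".7z": true,
}

// isArchive reports whether path looks like an archive, judged by extension only.
func isArchive(path string) bool {
	return archiveExts[strings.ToLower(filepath.Ext(path))]
}

// shouldScan reports whether a file should be scanned, given the
// hypothetical --no-archive setting.
func shouldScan(path string, noArchive bool) bool {
	return !(noArchive && isArchive(path))
}

func main() {
	// Hypothetical flag: archives are scanned by default, skipped when set.
	noArchive := flag.Bool("no-archive", false, "skip file archives entirely")
	flag.Parse()

	for _, p := range flag.Args() {
		if shouldScan(p, *noArchive) {
			fmt.Println("scan:", p)
		} else {
			fmt.Println("skip:", p)
		}
	}
}
```

The point of a single switch like this is that it would apply the same way regardless of data source, rather than relying on size or timeout limits to suppress archive scanning indirectly.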

Additional Context

N/A

References

  • #2257
  • #2506

0x736E avatar Feb 24 '24 20:02 0x736E

Related: #2257

rgmz avatar Feb 24 '24 20:02 rgmz

Related: #2506

0x736E avatar Feb 24 '24 21:02 0x736E

Yes please. A few versions ago archive scanning changed, and it now consistently breaks all my scans. I tried the workaround of only scanning archives with a max size of 1B, but the scans still don't succeed. I can't use the latest version of trufflehog until I can fully skip scanning all archives.

clonsdale-canva avatar Mar 04 '24 23:03 clonsdale-canva

Can you please elaborate how your scans are still failing?

Perhaps also try setting the max timeout to 1ms:

--archive-timeout=1ms

0x736E avatar Mar 06 '24 20:03 0x736E

I haven't fully debugged the issue, since I'm running on GitHub Actions and it simply says it disconnects. Either the timeout or the max size is ineffective. It could be chewing up too many resources processing the archives.

clonsdale-canva avatar Mar 07 '24 01:03 clonsdale-canva

@clonsdale-canva It sounds like you are encountering another issue in addition to the archive scanning issues.

From my testing, archive scanning does not affect whether the scan succeeds or not, only the scope of what is scanned and the subsequent output as a result.

0x736E avatar Mar 09 '24 07:03 0x736E

We are working on plans to centralize archive handling which will make it easy to toggle on/off for all sources.

dustin-decker avatar Mar 09 '24 16:03 dustin-decker

@0x736E Spent some time debugging and you're correct. I am running trufflehog on many repositories at once in a highly parallel system. Some change in v3.64.0 caused it to tip over the edge with resource exhaustion, which I mistakenly attributed to archive processing.

clonsdale-canva avatar Mar 13 '24 00:03 clonsdale-canva

I would love a flag to just skip archives altogether.

brendan-wiz avatar May 09 '24 15:05 brendan-wiz