
Native support for scanning docker images (transparent nested .tar unpacking)

Open hlein opened this issue 3 years ago • 0 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

It would be nice if trufflehog could scan nested .tar files intelligently, such as those found in docker image tarballs.

Problem to be Addressed

When scanning a docker image tarball (such as one saved with docker save ...), trufflehog currently prints only the top-level .tar filename for every hit. That does not show which component inside the image contains the hit, or what the file's path would be inside a container launched from the image.

Description of the Preferred Solution

Ideally, trufflehog would keep track of where it is when looking inside tar archives, including nested ones, since docker images are typically .tar files that contain one .tar per layer. It would then print that context on a hit, maybe something like:

File: foo.tar:b0d4d7051229875a2bfd9809c631c9899748f0e1fc6f408a446048dc6b60ca20:etc/secrets

Maybe this could be generalized, making trufflehog filesystem smarter. Or it might need a dedicated mode, trufflehog archive or something. Uncompressed .tar is one thing; I expect compressed archives would be more painful.
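To illustrate the kind of record-keeping meant here, a rough sketch in plain shell is below. It is not how trufflehog works internally; it only lists the files inside each nested layer using the outer:layer:path form suggested above. The function name list_nested_tar is made up for this example, and it assumes the older docker save layout in which every layer is stored as <layer-id>/layer.tar, uncompressed.

```shell
#!/bin/sh
# list_nested_tar OUTER.tar  (hypothetical helper, for illustration only)
# Prints "OUTER.tar:LAYER_ID:PATH" for every non-directory entry found in
# each nested <layer-id>/layer.tar. Assumes uncompressed layers.
list_nested_tar() {
  outer=$1
  # Find the nested layer archives inside the outer tarball.
  tar -tf "$outer" | grep '/layer\.tar$' | while IFS= read -r layer; do
    layer_id=${layer%/layer.tar}
    # Stream the nested archive to stdout and list it without unpacking to disk.
    tar -xOf "$outer" "$layer" | tar -tf - | while IFS= read -r path; do
      case $path in */) continue ;; esac   # skip directory entries
      printf '%s:%s:%s\n' "$(basename "$outer")" "$layer_id" "$path"
    done
  done
}
```

Usage would be something like list_nested_tar foo.tar. A real implementation would scan the file contents rather than just list names, and would also have to handle compressed layers and deeper nesting.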

Additional Context

There is a FUSE filesystem for mounting archives, ratarmount (https://github.com/mxmlnkn/ratarmount), which also supports recursive/nested archives and transparently presents archive files as subdirectories.

So for example:

mkdir -p some_container
ratarmount -c -r -o ro,allow_other some_container.tar some_container
trufflehog filesystem --directory=some_container 2>&1 | tee "trufflehog_some_container.out"

Found unverified result 🐷🔑❓
Detector Type: URI
Raw result: http://user:host@foo:3128
File: some_container/e60a0dfc08a94dabb221d8a28c6fdbeaa7cab0c146d35e8eff8e50bc2e4c194b/layer.tar/usr/lib/python2.7/site-packages/urlgrabber/grabber.py

Found unverified result 🐷🔑❓
Detector Type: URI
Raw result: http://username:password@host.com:80/path
File: some_container/96e436883f4940841fc9f1f7e935bada3859d2ffb0e5455952438d844f8e9c26/layer.tar/usr/lib/python2.7/site-packages/pip/_vendor/urllib3/util/url.py

Found unverified result 🐷🔑❓
Detector Type: PrivateKey
Raw result: -----BEGIN PRIVATE KEY-----
MIICd[snip]
-----END PRIVATE KEY-----
File: some_container/b0d4d7051229875a2bfd9809c631c9899748f0e1fc6f408a446048dc6b60ca20/layer.tar/usr/share/doc/perl-IO-Socket-SSL/example/simulate_proxy.pl
...
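The File: lines from the mounted scan can be rewritten into the tarball:layer:path form proposed earlier with a small filter. This is only a sketch: to_image_path is a name invented here, and it assumes the mount directory is a single relative path component named after the tarball (minus .tar) and that layers appear as <layer-id>/layer.tar, as in the output above.

```shell
#!/bin/sh
# to_image_path (hypothetical helper): rewrite
#   File: <mountdir>/<layer-id>/layer.tar/<path>
# into
#   File: <mountdir>.tar:<layer-id>:<path>
# Lines that do not match are passed through unchanged.
to_image_path() {
  sed -E 's#^File: ([^/]+)/([^/]+)/layer\.tar/(.*)$#File: \1.tar:\2:\3#'
}
```

For example, piping the saved output through it: to_image_path < trufflehog_some_container.out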

Or for a large collection of them:

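# Mount every image tarball read-only. This loop was run as root; the
# allow_other option otherwise needs user_allow_other in /etc/fuse.conf.
for A in *.tar ; do
  D="${A%.tar}"            # mount point: tarball name without the .tar suffix
  mkdir -p "$D"
  ratarmount -r -o ro,allow_other "$A" "$D"
done

# Scan each mounted image as a regular user, skipping any image that
# already has a non-empty output file from a previous run.
for A in *.tar ; do
  D="${A%.tar}"
  test -s "trufflehog_${D}.out" && continue
  echo "$D"
  trufflehog filesystem --directory="$D" 2>"trufflehog_${D}.err" | tee "trufflehog_${D}.out"
done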

If adding native nested-archive support does not seem worthwhile or desirable, then perhaps just polish this example and document it somewhere.

hlein, Jul 27 '22 07:07