Native support for scanning docker images (transparent nested .tar unpacking)
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Description
It would be nice if trufflehog could scan nested .tar files and report where inside them each hit was found, as needed for e.g. docker images.
Problem to be Addressed
When scanning a docker image tarball (such as one saved with docker save ...), trufflehog currently just prints the top-level .tar filename for every hit. This gives little insight into which component inside the image contains the hit, or which file path it would have inside a container launched from the image.
Description of the Preferred Solution
Ideally, trufflehog would keep track of where it is when looking inside tar archives, and would do so recursively, because docker images are typically nested .tar files made up of multiple layers. It would then print that context on a hit, maybe something like:
File: foo.tar:b0d4d7051229875a2bfd9809c631c9899748f0e1fc6f408a446048dc6b60ca20:etc/secrets
Maybe this could be a generalized feature that makes trufflehog filesystem smarter. Or it might have to be a dedicated mode, trufflehog archive or something. Uncompressed .tar is one thing; I expect compressed archives would be more painful.
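To make the record-keeping idea concrete, here is a rough shell sketch of the traversal, not a proposal for how trufflehog should implement it. The function name list_nested_tar is hypothetical. It assumes uncompressed archives, nested members whose names end in .tar (as in the legacy docker save layout with <id>/layer.tar), and a tar that supports -O (extract to stdout), as GNU tar does. It only lists files with their archive context; it does not scan them.

```sh
# Hypothetical helper: print every file inside a (possibly nested)
# uncompressed tar as "outer.tar:inner/layer.tar:path/in/layer".
# $1 = archive on disk, $2 = label to print as context for its members.
list_nested_tar() {
    local archive="$1" label="$2" member tmp
    tar -tf "$archive" | while IFS= read -r member; do
        case "$member" in
            */) ;;                      # directory entry, nothing to report
            *.tar)
                # Nested archive: extract it to a temp file and recurse,
                # extending the label with the member name.
                tmp=$(mktemp)
                tar -xOf "$archive" "$member" > "$tmp"
                list_nested_tar "$tmp" "$label:$member"
                rm -f "$tmp"
                ;;
            *) printf '%s:%s\n' "$label" "$member" ;;
        esac
    done
}
```

Usage would be something like list_nested_tar foo.tar foo.tar. Note that this prints the full member name of each layer (ending in /layer.tar) rather than just the layer ID shown in the File: line above.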
Additional Context
There is a FUSE filesystem for mounting archives, https://github.com/mxmlnkn/ratarmount, which also supports recursive/nested archives and transparently turns archive files into subdirectories.
So for example:
mkdir -p some_container
ratarmount -c -r -o ro,allow_other some_container.tar some_container
trufflehog filesystem --directory=some_container 2>&1 | tee "trufflehog_some_container.out"
Found unverified result 🐷🔑❓
Detector Type: URI
Raw result: http://user:host@foo:3128
File: some_container/e60a0dfc08a94dabb221d8a28c6fdbeaa7cab0c146d35e8eff8e50bc2e4c194b/layer.tar/usr/lib/python2.7/site-packages/urlgrabber/grabber.py
Found unverified result 🐷🔑❓
Detector Type: URI
Raw result: http://username:password@host.com:80/path
File: some_container/96e436883f4940841fc9f1f7e935bada3859d2ffb0e5455952438d844f8e9c26/layer.tar/usr/lib/python2.7/site-packages/pip/_vendor/urllib3/util/url.py
Found unverified result 🐷🔑❓
Detector Type: PrivateKey
Raw result: -----BEGIN PRIVATE KEY-----
MIICd[snip]
-----END PRIVATE KEY-----
File: some_container/b0d4d7051229875a2bfd9809c631c9899748f0e1fc6f408a446048dc6b60ca20/layer.tar/usr/share/doc/perl-IO-Socket-SSL/example/simulate_proxy.pl
...
Or for a large collection of them:
# Run as root: mount each image tarball on a directory of the same name
for A in *.tar ; do
D="${A%.tar}" ;
mkdir -p "$D" ;
ratarmount -r -o ro,allow_other "$A" "$D" ;
done
# Run as a regular user: scan each mounted image, skipping any that already have output
for A in *.tar ; do
D="${A%.tar}" ;
test -s "trufflehog_${D}.out" && continue ;
echo "$D" ;
trufflehog filesystem --directory="$D" 2>"trufflehog_${D}.err" | tee "trufflehog_${D}.out"
done
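Once the scans are finished, the mounts should be released again. A minimal sketch of a matching cleanup step follows; the function name unmount_all is hypothetical. It assumes each NAME.tar was mounted on ./NAME as in the mount loop above, that mountpoint (from util-linux) is available, and that the FUSE helper is called fusermount (on some systems it is fusermount3).

```sh
# Hypothetical cleanup helper: unmount every directory that the mount loop
# created, i.e. ./NAME for each NAME.tar in the current directory.
unmount_all() {
    local A D
    for A in *.tar ; do
        D="${A%.tar}"
        # Skip anything that is not currently a mount point.
        mountpoint -q "$D" || continue
        fusermount -u "$D"
    done
}
```

Run unmount_all from the directory that holds the tarballs, as the same user that created the mounts.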
If adding native nested-archive support does not seem worth it/desirable, then perhaps just polish/improve this example and document it somewhere.
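As one possible bit of polish, the File: lines produced by the mounted-directory approach can be rewritten into the archive:layer:path form suggested earlier. This is only a sketch: the function name to_archive_paths is hypothetical, and it assumes the mount directory is the tarball name without .tar and that each layer appears as <hex id>/layer.tar, as in the output shown above.

```sh
# Hypothetical filter: rewrite
#   File: NAME/<hex layer id>/layer.tar/some/path
# into
#   File: NAME.tar:<hex layer id>:some/path
# Lines that do not match are passed through unchanged.
to_archive_paths() {
    sed -E 's#^File: ([^/]+)/([0-9a-f]+)/layer\.tar/#File: \1.tar:\2:#'
}
```

For example, it could be appended to the scan pipeline: trufflehog filesystem --directory="$D" 2>"trufflehog_${D}.err" | to_archive_paths | tee "trufflehog_${D}.out"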