Classify base64 encoded tokens as belonging to particular tools/services

justineyster opened this issue 6 years ago.

Raising this as a parallel issue to one I opened today in the IBM fork.

Context

While I was working on #156, @jribm pointed out that, under our current approach, base64-encoded strings won't be classified as belonging to a particular tool. They may be caught by the base64 entropy scanner, but without an association to a particular tool they cannot be verified.

Examples:

    X-JFrog-Art-Api: <some-base64-encoded-string>
    artifactory:_password: <some-base64-encoded-string>

In both examples, the surrounding context (the header name or the config key) indicates that the value belongs to a particular service. However, the string itself won't match the Artifactory detector, because the encoded string doesn't follow the expected token format.
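To make the problem concrete, here is a minimal Python sketch. It assumes, for illustration only, that an Artifactory API key looks like `AKC` followed by alphanumerics; `ARTIFACTORY_TOKEN`, `try_b64decode`, and the sample key are hypothetical and not taken from the detect-secrets codebase. The raw key matches the pattern, its base64 encoding does not, and decoding first recovers the match.

```python
import base64
import binascii
import re
from typing import Optional

# Hypothetical stand-in for an Artifactory API-key pattern
# (assumption: keys start with "AKC" followed by 10+ alphanumerics).
ARTIFACTORY_TOKEN = re.compile(r"AKC[a-zA-Z0-9]{10,}")


def try_b64decode(value: str) -> Optional[str]:
    """Return the UTF-8 text behind a strict base64 string, or None if it isn't one."""
    try:
        return base64.b64decode(value, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


fake_key = "AKC" + "a1B2c3D4e5F6g7H8"  # made-up key, not a real credential
encoded = base64.b64encode(fake_key.encode()).decode("ascii")

print(bool(ARTIFACTORY_TOKEN.search(fake_key)))  # → True  (raw key matches)
print(bool(ARTIFACTORY_TOKEN.search(encoded)))   # → False (encoded form evades the pattern)
decoded = try_b64decode(encoded)
print(decoded is not None and bool(ARTIFACTORY_TOKEN.search(decoded)))  # → True
```

Using `validate=True` makes decoding reject strings with characters outside the base64 alphabet, which is one way to avoid treating ordinary identifiers as encoded secrets; it is a choice made for this sketch, not something decided in the thread.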

Given this, we could design a two-step approach in which we base64-decode suspicious strings and check whether the decoded value matches a particular tool's pattern. I can imagine at least two ways of doing this:

1. Search for indicators of a particular tool's token (like the X-JFrog-Art-Api authentication header in the examples above), decode the suspicious string near that indicator, and test it against the regex for that service.
2. Base64-decode strings that are caught by the base64 entropy scanner and test the decoded string against all of the other secret detectors.
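Approach 1 could look roughly like the following Python sketch. Everything in it is hypothetical: `INDICATORS`, `SERVICE_PATTERNS`, and `classify_by_indicator` are illustrative names rather than detect-secrets APIs, and the Artifactory token format (`AKC` followed by alphanumerics) is an assumption. The sketch finds a known indicator, captures the value that follows it, base64-decodes that value, and tests the decoded text against the service's regex.

```python
import base64
import binascii
import re
from typing import Optional

# Hypothetical indicator patterns: each captures the value that follows
# a context string tied to one service (a header name or config key).
INDICATORS = {
    "artifactory": [
        re.compile(r"X-JFrog-Art-Api:\s*([A-Za-z0-9+/=]+)"),
        re.compile(r"artifactory:_password:\s*([A-Za-z0-9+/=]+)"),
    ],
}

# Hypothetical per-service token formats (assumed, for illustration).
SERVICE_PATTERNS = {
    "artifactory": re.compile(r"^AKC[a-zA-Z0-9]{10,}$"),
}


def try_b64decode(value: str) -> Optional[str]:
    """Return UTF-8 text for a strict base64 string, or None if decoding fails."""
    try:
        return base64.b64decode(value, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


def classify_by_indicator(line: str) -> Optional[str]:
    """Return the service whose indicator precedes a value that decodes to its token format."""
    for service, indicator_patterns in INDICATORS.items():
        for indicator in indicator_patterns:
            match = indicator.search(line)
            if not match:
                continue
            decoded = try_b64decode(match.group(1))
            if decoded is not None and SERVICE_PATTERNS[service].match(decoded):
                return service
    return None


encoded = base64.b64encode(b"AKCa1B2c3D4e5F6g7H8").decode("ascii")  # made-up key
print(classify_by_indicator(f"X-JFrog-Art-Api: {encoded}"))   # → artifactory
print(classify_by_indicator("X-JFrog-Art-Api: not-a-token"))  # → None
```

Keying the decode step on an indicator keeps the number of decode attempts small and ties each hit to a single service, at the cost of maintaining an indicator list per tool.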

Subtasks & step(s)

[x] Raise a parallel issue in https://github.com/Yelp/detect-secrets to gather feedback from upstream community.
[ ] Decide on general approach for decoding and testing suspicious strings.
[ ] Implement solution and merge in our codebase and upstream.

Success criteria

[ ] base64 encoded tokens will be classified as belonging to a particular service.

Apr 09 '19 17:04 justineyster

I'm kind of ambivalent; which approach do you prefer?

I think approach 1 could be implemented with something similar to the keyword detector.

For approach 2, what would your example base64 strings decode to?
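As a rough illustration of approach 2, the Python sketch below takes each string the base64 entropy scanner flagged, decodes it, and runs every other detector over the decoded text. `DETECTORS`, `try_b64decode`, and `classify_decoded` are hypothetical stand-ins for the real detector plugins, and both patterns (an `AKC…` Artifactory key and a 40-character lowercase-hex token) are assumptions for illustration.

```python
import base64
import binascii
import re
from typing import Dict, List, Optional

# Hypothetical detector regexes standing in for the other secret detectors.
DETECTORS = {
    "artifactory": re.compile(r"AKC[a-zA-Z0-9]{10,}"),  # assumed key format
    "github": re.compile(r"\b[0-9a-f]{40}\b"),          # assumed 40-hex-char token
}


def try_b64decode(value: str) -> Optional[str]:
    """Decode strict base64 to UTF-8 text; return None on failure."""
    try:
        return base64.b64decode(value, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


def classify_decoded(flagged: List[str]) -> Dict[str, List[str]]:
    """Map each entropy-flagged string to the detectors its decoded form matches."""
    results: Dict[str, List[str]] = {}
    for candidate in flagged:
        decoded = try_b64decode(candidate)
        if decoded is None:
            continue  # not decodable text; it stays a plain high-entropy hit
        matches = [name for name, pattern in DETECTORS.items() if pattern.search(decoded)]
        if matches:
            results[candidate] = matches
    return results


key = base64.b64encode(b"AKCa1B2c3D4e5F6g7H8").decode("ascii")  # made-up key
print(classify_decoded([key, "not base64!"]))  # maps the encoded key to ['artifactory']
```

Skipping strings that don't decode to UTF-8 text keeps binary blobs away from text-based detectors; unlike approach 1, every flagged string is tried against every detector, so a hit may be attributed to more than one service.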

May 15 '19 21:05 KevinHock

Noting for posterity, now that we have verification: GitHub API tokens are 40 characters long and can easily be verified via the oauth/scopes endpoint. I'm having a hard time finding the exact link to that API, but I can confirm I hit it yesterday.

Jun 24 '19 21:06 KevinHock

We're going to close this issue as it hasn't received any update in a very long time. Feel free to re-open it if you think it's still relevant.

May 09 '24 17:05 lorenzodb1