HuggingFace token detector not working properly (wrong number of characters)
Discussed in https://github.com/aquasecurity/trivy/discussions/6784
Originally posted by asankov May 27, 2024
Description
I am playing around with the secret detector in https://github.com/aquasecurity/trivy/blob/main/pkg/fanal/secret/ and I noticed that it does not detect Hugging Face tokens.
Looking at the HF regex, it expects 39 symbols after hf_. However, my HF token has only 34 symbols after the prefix.
Example HF token: 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 (already revoked)
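The mismatch can be reproduced outside Trivy with a minimal Go sketch. The pattern below is an assumption that mirrors the rule as described above (exactly 39 characters after hf_); it is not copied from Trivy's source, and the names `strictPattern`, `revokedToken`, and `suffixLen` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// strictPattern is an assumption: it mirrors the rule as described in this
// issue (exactly 39 alphanumeric characters after "hf_"). It is not a copy
// of the rule in Trivy's source.
var strictPattern = regexp.MustCompile(`hf_[A-Za-z0-9]{39}`)

// revokedToken is the already revoked example token from this issue.
const revokedToken = "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"

// suffixLen returns the number of characters after the "hf_" prefix.
func suffixLen(token string) int {
	return len(strings.TrimPrefix(token, "hf_"))
}

func main() {
	// The token has 34 characters after the prefix, so a rule that
	// requires 39 cannot match it.
	fmt.Println("characters after hf_:", suffixLen(revokedToken))
	fmt.Println("matched by 39-character rule:", strictPattern.MatchString(revokedToken))
}
```

Running this should report 34 characters and no match, which is consistent with the scanner returning no findings.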
Desired Behavior
Detect a HF token.
Actual Behavior
Not detecting a HF token.
Reproduction Steps
1. Create a Hugging Face account at https://huggingface.co/
2. Generate an API token at https://huggingface.co/settings/tokens
3. Provide that token as input to the `secret.Scanner`
4. Assert that it returns no findings
Target
Filesystem
Scanner
Secret
Output Format
None
Mode
Standalone
Debug Output
$ trivy fs hf --debug
2024-05-27T13:40:23+03:00 DEBUG Parsed severities severities=[UNKNOWN LOW MEDIUM HIGH CRITICAL]
2024-05-27T13:40:23+03:00 DEBUG Ignore statuses statuses=[]
2024-05-27T13:40:23+03:00 DEBUG Cache dir dir="/Users/asankov/Library/Caches/trivy"
2024-05-27T13:40:23+03:00 DEBUG DB update was skipped because the local DB is the latest
2024-05-27T13:40:23+03:00 DEBUG DB info schema=2 updated_at=2024-05-27T06:12:09.854561954Z next_update=2024-05-27T12:12:09.854561794Z downloaded_at=2024-05-27T10:39:59.156462Z
2024-05-27T13:40:23+03:00 INFO Vulnerability scanning is enabled
2024-05-27T13:40:23+03:00 DEBUG Vulnerability type type=[os library]
2024-05-27T13:40:23+03:00 INFO Secret scanning is enabled
2024-05-27T13:40:23+03:00 INFO If your scanning is slow, please try '--scanners vuln' to disable secret scanning
2024-05-27T13:40:23+03:00 INFO Please see also https://aquasecurity.github.io/trivy/v0.51/docs/scanner/secret/#recommendation for faster secret detection
2024-05-27T13:40:23+03:00 DEBUG Enabling misconfiguration scanners scanners=[azure-arm cloudformation dockerfile helm kubernetes terraform terraformplan-json terraformplan-snapshot]
2024-05-27T13:40:23+03:00 DEBUG [secret] No secret config detected config_path="trivy-secret.yaml"
2024-05-27T13:40:23+03:00 DEBUG [nuget] The nuget packages directory couldn't be found. License search disabled
2024-05-27T13:40:23+03:00 DEBUG OS is not detected.
2024-05-27T13:40:23+03:00 DEBUG Detected OS: unknown
2024-05-27T13:40:23+03:00 INFO Number of language-specific files num=0
Operating System
macOS
Version
Version: 0.51.4
Vulnerability DB:
Version: 2
UpdatedAt: 2024-05-27 06:12:09.854561954 +0000 UTC
NextUpdate: 2024-05-27 12:12:09.854561794 +0000 UTC
DownloadedAt: 2024-05-27 10:39:59.156462 +0000 UTC
Checklist
- [X] Run `trivy image --reset`
- [X] Read the troubleshooting
Would it make sense to create a topic about token format on the forum? https://discuss.huggingface.co/
I asked about this in HuggingChat:
After further investigation, I found that the Hugging Face token can indeed have fewer characters.
According to the Hugging Face documentation, the token can be either 34 or 40 characters long, depending on the type of token.
So, to confirm, the schema for the Hugging Face token is:
hf_<34 or 40 alphanumeric characters>
So the token format is hf_<34 or 40 alphanumeric characters>.
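If that format holds, a relaxed rule could look roughly like the Go sketch below. This is only an illustration: the "34 or 40" lengths come from the HuggingChat answer quoted above and have not been confirmed against official documentation, and `relaxedPattern`, `looksLikeHFToken`, and `sampleToken` are hypothetical names, not part of Trivy.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// relaxedPattern is a hypothetical rule, not Trivy's actual one. It accepts
// "hf_" followed by exactly 34 or exactly 40 alphanumeric characters. The
// word boundaries keep it from matching part of a longer alphanumeric run.
var relaxedPattern = regexp.MustCompile(`\bhf_(?:[A-Za-z0-9]{34}|[A-Za-z0-9]{40})\b`)

// looksLikeHFToken reports whether s contains a token in the assumed format.
func looksLikeHFToken(s string) bool {
	return relaxedPattern.MatchString(s)
}

// sampleToken builds a fake token with n characters after the prefix.
func sampleToken(n int) string {
	return "hf_" + strings.Repeat("a", n)
}

func main() {
	for _, n := range []int{34, 39, 40} {
		fmt.Printf("%d characters after hf_: match=%v\n", n, looksLikeHFToken(sampleToken(n)))
	}
}
```

With these inputs the 34- and 40-character samples match and the 39-character sample does not. A looser `{34,40}` range would be an alternative if intermediate lengths turn out to exist.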