
HuggingFace token detector not working properly (wrong number of characters)

Open · DmitriyLewen opened this issue 1 year ago · 1 comment

Discussed in https://github.com/aquasecurity/trivy/discussions/6784

Originally posted by asankov May 27, 2024

Description

I am playing around with the secret detector in https://github.com/aquasecurity/trivy/blob/main/pkg/fanal/secret/ and noticed that it does not detect Hugging Face tokens.

Looking at the HF regex, it expects 39 characters after hf_. However, my HF token has only 34 characters after hf_.

Example HF token: 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 (already revoked)
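A minimal Go sketch of the mismatch, using only the standard library. The pattern is an assumption reconstructed from the description above (hf_ followed by 39 alphanumeric characters); it is not copied from Trivy's built-in rules. The sketch counts the characters after hf_ in the revoked example token and checks the token against that pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical pattern modeled on the report: "hf_" followed by
// 39 alphanumeric characters. This is NOT Trivy's actual rule definition.
var fixedLength39 = regexp.MustCompile(`hf_[A-Za-z0-9]{39}`)

// The revoked example token from the report.
const exampleToken = "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"

func main() {
	// Number of characters that follow the "hf_" prefix.
	suffix := len(exampleToken) - len("hf_")
	fmt.Println("characters after hf_:", suffix)

	// A 34-character suffix cannot satisfy a 39-character quantifier.
	fmt.Println("matched by 39-char pattern:", fixedLength39.MatchString(exampleToken))
}
```

Running it prints `characters after hf_: 34` and `matched by 39-char pattern: false`, which is consistent with the scanner reporting no findings.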

Desired Behavior

Detect a HF token.

Actual Behavior

Not detecting a HF token.

Reproduction Steps

1. Create a Hugging Face account at https://huggingface.co/
2. Generate an API token at https://huggingface.co/settings/tokens
3. Provide that token as input to the `secret.Scanner`
4. Observe that it returns no findings

Target

Filesystem

Scanner

Secret

Output Format

None

Mode

Standalone

Debug Output

$ trivy fs hf --debug
2024-05-27T13:40:23+03:00	DEBUG	Parsed severities	severities=[UNKNOWN LOW MEDIUM HIGH CRITICAL]
2024-05-27T13:40:23+03:00	DEBUG	Ignore statuses	statuses=[]
2024-05-27T13:40:23+03:00	DEBUG	Cache dir	dir="/Users/asankov/Library/Caches/trivy"
2024-05-27T13:40:23+03:00	DEBUG	DB update was skipped because the local DB is the latest
2024-05-27T13:40:23+03:00	DEBUG	DB info	schema=2 updated_at=2024-05-27T06:12:09.854561954Z next_update=2024-05-27T12:12:09.854561794Z downloaded_at=2024-05-27T10:39:59.156462Z
2024-05-27T13:40:23+03:00	INFO	Vulnerability scanning is enabled
2024-05-27T13:40:23+03:00	DEBUG	Vulnerability type	type=[os library]
2024-05-27T13:40:23+03:00	INFO	Secret scanning is enabled
2024-05-27T13:40:23+03:00	INFO	If your scanning is slow, please try '--scanners vuln' to disable secret scanning
2024-05-27T13:40:23+03:00	INFO	Please see also https://aquasecurity.github.io/trivy/v0.51/docs/scanner/secret/#recommendation for faster secret detection
2024-05-27T13:40:23+03:00	DEBUG	Enabling misconfiguration scanners	scanners=[azure-arm cloudformation dockerfile helm kubernetes terraform terraformplan-json terraformplan-snapshot]
2024-05-27T13:40:23+03:00	DEBUG	[secret] No secret config detected	config_path="trivy-secret.yaml"
2024-05-27T13:40:23+03:00	DEBUG	[nuget] The nuget packages directory couldn't be found. License search disabled
2024-05-27T13:40:23+03:00	DEBUG	OS is not detected.
2024-05-27T13:40:23+03:00	DEBUG	Detected OS: unknown
2024-05-27T13:40:23+03:00	INFO	Number of language-specific files	num=0

Operating System

macOS

Version

Version: 0.51.4
Vulnerability DB:
  Version: 2
  UpdatedAt: 2024-05-27 06:12:09.854561954 +0000 UTC
  NextUpdate: 2024-05-27 12:12:09.854561794 +0000 UTC
  DownloadedAt: 2024-05-27 10:39:59.156462 +0000 UTC


DmitriyLewen, May 30 '24 06:05

Would it make sense to open a topic about the token format on the Hugging Face forum? https://discuss.huggingface.co/

nikpivkin, May 30 '24 12:05

I asked about this in HuggingChat:

After further investigation, I found that the Hugging Face token can indeed have fewer characters.
According to the Hugging Face documentation, the token can be either 34 or 40 characters long, depending on the type of token.
So, to confirm, the schema for the Hugging Face token is:
hf_<34 or 40 alphanumeric characters>

So the token format is hf_<34 or 40 alphanumeric characters>.
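Taking that format as an assumption (it comes from a HuggingChat answer), a detector pattern could accept both lengths. The Go sketch below uses a hypothetical pattern and helper (`hfTokenPattern`, `findHFTokens`); it is not Trivy's rule and is only meant to show one way to express "34 or 40, nothing in between" with the standard `regexp` package.

```go
package main

import (
	"fmt"
	"regexp"
)

// hfTokenPattern is a hypothetical pattern for the format discussed above:
// "hf_" followed by exactly 34 or exactly 40 ASCII letters or digits.
// The \b anchors on both sides stop it from matching a 34-character
// prefix of a longer alphanumeric run (e.g. a 36-character suffix).
var hfTokenPattern = regexp.MustCompile(`\bhf_(?:[A-Za-z0-9]{40}|[A-Za-z0-9]{34})\b`)

// findHFTokens returns every substring of text that matches hfTokenPattern.
func findHFTokens(text string) []string {
	return hfTokenPattern.FindAllString(text, -1)
}

func main() {
	input := "HF_TOKEN=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\n"
	fmt.Println(findHFTokens(input))
}
```

With the revoked example token from the report this prints `[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]`. A bounded range such as `{34,40}` would be simpler but would also accept lengths 35 to 39, which the stated format does not allow.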

DmitriyLewen, Jul 24 '24 08:07