Match optimization bug if keyword isn't a prefix

Status: Open · opened by rgmz 1 month ago · 9 comments

TruffleHog Version

https://github.com/trufflesecurity/trufflehog/commit/4e21590cbe895b0796acec8d3204f9f9013d9d5e

Description

The optimizations introduced in #2812 don't work as expected when a detector's keyword is not a prefix of the secret and multiple secrets are in the same chunk.

Given the following file (saved as `example.txt` in the steps below), you'd expect two matches: `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` and `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`. However, only `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` is detected.

```python
#!/usr/bin/env python3

from dnslib import *
import argparse 
import socket 
from GeoInfo import GeoInfo
from geopy.distance import great_circle
import geoip2.webservice
from concurrent.futures import ThreadPoolExecutor

parser = argparse.ArgumentParser(description='Run a DNS server')
parser.add_argument('-p', '--port', type=int, required=True,help='port number to listen on')
parser.add_argument('-n', '--name', required=True, help='name of the DNS server')
args = parser.parse_args()
MM_ACCOUNT_ID = 851210
MM_ACCOUNT_ID2 = 852363
MM_API_KEY = "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"
MM_API_KEY2 = "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"
MM_HOST = "geolite.info"
ACCOUNT_1 = (MM_ACCOUNT_ID2, MM_API_KEY2)
ACCOUNT_2 = (MM_ACCOUNT_ID, MM_API_KEY)
DEFAULT_ACCOUNT = ACCOUNT_1 # Prepare two set of credentials for the API service
```
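The truncation can be reproduced outside TruffleHog with a minimal Python sketch (Python only because the fixture is Python; the real engine is Go). It assumes the keyword is `_mmk`, assumes a key format of 6 alphanumerics, `_`, 29 alphanumerics, then `_mmk` (the hypothetical `KEY_RE`), and models the optimization as a hypothetical `slice_from_keyword` that keeps only the data from the first keyword hit onward. None of these are taken from the MaxMind v2 detector or from #2812; they only illustrate why a suffix keyword loses the secret in front of it.

```python
import re

# Hypothetical key pattern (assumption, not the detector's real regex):
# 6 alphanumerics, "_", 29 alphanumerics, then the "_mmk" suffix.
KEY_RE = re.compile(r"[A-Za-z0-9]{6}_[A-Za-z0-9]{29}_mmk")
KEYWORD = "_mmk"  # assumed keyword; it ends the key rather than starting it

CHUNK = '''MM_API_KEY = "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"
MM_API_KEY2 = "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"
'''


def slice_from_keyword(data: str, keyword: str) -> str:
    """Hypothetical optimization: keep only the data from the first keyword hit onward."""
    idx = data.find(keyword)
    return data[idx:] if idx != -1 else ""


def find_keys(data: str) -> list:
    """Return every substring matching the assumed key pattern."""
    return KEY_RE.findall(data)


full = find_keys(CHUNK)
sliced = find_keys(slice_from_keyword(CHUNK, KEYWORD))
print("entire chunk:", full)
print("sliced at keyword:", sliced)
```

Here `full` holds both keys, while `sliced` holds only `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`: the first `_mmk` is the tail of the first key, so everything before it is discarded.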

Steps to Reproduce

Both scans below were run with the following change, which prints the data the detector receives:

```diff
# pkg/detectors/maxmindlicense/v2/maxmindlicense_v2.go
 func (s Scanner) FromData(ctx context.Context, verify bool, data []byte) (results []detectors.Result, err error) {
 	dataStr := string(data)
+     fmt.Printf("Data is: %s\n", dataStr)
```

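Scan using the new matching optimization. The data passed to the detector begins at `_mmk"`, the tail of the first key, so `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` never reaches the detector: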

```console
$ ./trufflehog filesystem example.txt
🐷🔑🐷  TruffleHog. Unearth your secrets. 🐷🔑🐷

2024-06-09T11:40:39-04:00       info-0  trufflehog      running source  {"source_manager_worker_id": "NpZk4", "with_units": true}
Data is: _mmk"
MM_API_KEY2 = "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"
MM_HOST = "geolite.info"
ACCOUNT_1 = (MM_ACCOUNT_ID2, MM_API_KEY2)
ACCOUNT_2 = (MM_ACCOUNT_ID, MM_API_KEY)
DEFAULT_ACCOUNT = ACCOUNT_1 # Prepare two set of credentials for the API service


Data is: _mmk"
MM_API_KEY2 = "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"
MM_HOST = "geolite.info"
ACCOUNT_1 = (MM_ACCOUNT_ID2, MM_API_KEY2)
ACCOUNT_2 = (MM_ACCOUNT_ID, MM_API_KEY)
DEFAULT_ACCOUNT = ACCOUNT_1 # Prepare two set of credentials for the API service


Found unverified result 🐷🔑❓
Detector Type: MaxMindLicense
Decoder Type: PLAIN
Raw result: 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
Rotation_guide: https://howtorotate.com/docs/tutorials/maxmind/
Version: 2
File: example.txt
Line: 17
```

Scan the entire chunk with `--scan-entire-chunk`. The detector now receives the whole file and both keys are found:

```console
$ ./trufflehog filesystem example.txt --scan-entire-chunk
🐷🔑🐷  TruffleHog. Unearth your secrets. 🐷🔑🐷

2024-06-09T11:44:27-04:00       info-0  trufflehog      running source  {"source_manager_worker_id": "tK7mw", "with_units": true}
Data is: #!/usr/bin/env python3

from dnslib import *
import argparse
import socket
from GeoInfo import GeoInfo
from geopy.distance import great_circle
import geoip2.webservice
from concurrent.futures import ThreadPoolExecutor

parser = argparse.ArgumentParser(description='Run a DNS server')
parser.add_argument('-p', '--port', type=int, required=True,help='port number to listen on')
parser.add_argument('-n', '--name', required=True, help='name of the DNS server')
args = parser.parse_args()
MM_ACCOUNT_ID = 851210
MM_ACCOUNT_ID2 = 852363
MM_API_KEY = "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"
MM_API_KEY2 = "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"
MM_HOST = "geolite.info"
ACCOUNT_1 = (MM_ACCOUNT_ID2, MM_API_KEY2)
ACCOUNT_2 = (MM_ACCOUNT_ID, MM_API_KEY)
DEFAULT_ACCOUNT = ACCOUNT_1 # Prepare two set of credentials for the API service


Data is: #!/usr/bin/env python3

from dnslib import *
import argparse
import socket
from GeoInfo import GeoInfo
from geopy.distance import great_circle
import geoip2.webservice
from concurrent.futures import ThreadPoolExecutor

parser = argparse.ArgumentParser(description='Run a DNS server')
parser.add_argument('-p', '--port', type=int, required=True,help='port number to listen on')
parser.add_argument('-n', '--name', required=True, help='name of the DNS server')
args = parser.parse_args()
MM_ACCOUNT_ID = 851210
MM_ACCOUNT_ID2 = 852363
MM_API_KEY = "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"
MM_API_KEY2 = "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"
MM_HOST = "geolite.info"
ACCOUNT_1 = (MM_ACCOUNT_ID2, MM_API_KEY2)
ACCOUNT_2 = (MM_ACCOUNT_ID, MM_API_KEY)
DEFAULT_ACCOUNT = ACCOUNT_1 # Prepare two set of credentials for the API service


Found unverified result 🐷🔑❓
Detector Type: MaxMindLicense
Decoder Type: PLAIN
Raw result: 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
Rotation_guide: https://howtorotate.com/docs/tutorials/maxmind/
Version: 2
File: example.txt
Line: 16

Found unverified result 🐷🔑❓
Detector Type: MaxMindLicense
Decoder Type: PLAIN
Raw result: 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
Rotation_guide: https://howtorotate.com/docs/tutorials/maxmind/
Version: 2
File: example.txt
Line: 17
```
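Conversely, a slice that starts far enough before the keyword to cover a whole key keeps both matches. The sketch below is a hypothetical illustration, not TruffleHog's code or a confirmed fix: it assumes the keyword is `_mmk`, assumes the same hypothetical key pattern (6 alphanumerics, `_`, 29 alphanumerics, `_mmk`), and uses an assumed bound `MAX_KEY_LEN = 40`, chosen only because both sample keys are 40 characters long. `slice_with_lookbehind` is likewise a hypothetical helper.

```python
import re

# Hypothetical key pattern and keyword (assumptions, not the detector's real values).
KEY_RE = re.compile(r"[A-Za-z0-9]{6}_[A-Za-z0-9]{29}_mmk")
KEYWORD = "_mmk"
MAX_KEY_LEN = 40  # assumed upper bound on key length; both sample keys are 40 chars

CHUNK = '''MM_API_KEY = "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"
MM_API_KEY2 = "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"
'''


def slice_with_lookbehind(data: str, keyword: str, lookbehind: int) -> str:
    """Hypothetical variant: start `lookbehind` chars before the first keyword hit,
    so a secret that *ends* with the keyword is still fully included."""
    idx = data.find(keyword)
    if idx == -1:
        return ""
    return data[max(0, idx - lookbehind):]


window = slice_with_lookbehind(CHUNK, KEYWORD, MAX_KEY_LEN)
found = KEY_RE.findall(window)
print(found)
```

The only change from slicing at the keyword itself is the `max(0, idx - lookbehind)` start offset; a real detector would presumably need a bound that fits its own secret format.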

Environment

N/A

Additional Context

N/A

References

N/A

rgmz, Jun 09 '24 15:06