trufflehog icon indicating copy to clipboard operation
trufflehog copied to clipboard

Scan the contents of SQLite DB file

Open bugbaba opened this issue 1 year ago • 9 comments

Hello Team,

Description

Scan contents of the SQLite DB file for secrets inside it.

Preferred Solution

The SQLite DB files can be detected using the determineMimeType function at https://github.com/trufflesecurity/trufflehog/blob/main/pkg/handlers/archive.go#L416 and then dump the contents of the DB into a text file which can be scanned for keys inside it.

So that case like below could be detected image

Additional Context

Contents can be dump using the below command sqlite3 db_filename .dump > tmp_dumpfile

Discussion on how to do the same in golang https://github.com/mattn/go-sqlite3/issues/535

-- Regards, @bugbaba

bugbaba avatar Nov 04 '23 06:11 bugbaba

@bugbaba thanks for opening this issue!

Scanning sqlite dbs is a really great idea. Ideally we would have a decoder be able to detect when a filesystem scan encountered a sqlite db. Adding new decoders is pretty easy. We're happy to accept PRs for adding a sqlite decoder.

zricethezav avatar Nov 06 '23 22:11 zricethezav

@bugbaba what is the output if you run strings db_filename? Is it substantially different from sqlite3 db_filename .dump > tmp_dumpfile?

I've seen results from several SQLite databases, so I'm curious what the difference would be.

rgmz avatar Nov 14 '23 00:11 rgmz

With strings random characters get added along with the contents of the DB

For example, the dump file for this test db is

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS "user_info" (
	"id"	INTEGER,
	"username"	TEXT,
	"api_key"	TEXT,
	PRIMARY KEY("id")
);
INSERT INTO user_info VALUES(1,'abc','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');
INSERT INTO user_info VALUES(2,'xyz','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');
COMMIT;

Whereas when we run strings on the file we get

SQLite format 3
Otableuser_infouser_info
CREATE TABLE "user_info" (
	"id"	INTEGER,
	"username"	TEXT,
	"api_key"	TEXT,
	PRIMARY KEY("id")
Gxyz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e
Gabc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

Now to the naked eye, the API key is surely visible in this output too. But as we can see random character G is added at the start of the line and e is added at the end first row entry.

So for example the github key detector regex trufflehog is using won't be able to detect it due to use of boundaries \b in the regex.

We can see the same in the screenshot from regex101.com image

bugbaba avatar Nov 14 '23 04:11 bugbaba

Small correction, this would be implemented as a handler. Handlers can work on whole files, decoders might only be processing at a small chunk of a file.

https://github.com/trufflesecurity/trufflehog/blob/main/pkg/handlers/handlers.go

Unpacking sqlite db files with a handler would be a welcome addition.

dustin-decker avatar Nov 20 '23 22:11 dustin-decker

Now to the naked eye, the API key is surely visible in this output too. But as we can see random character G is added at the start of the line and e is added at the end first row entry.

I just discovered a GitHub access token inside a binary file and it reminded me of this issue. Random junk is surely a problem that causes false negatives with other binary files.

I wonder if it would be feasible to create "fuzzy" patterns for binary files to try and pick up what would otherwise be high-confidence patterns. For example, \bghp_[a-fA-F0-9]{32}\b for text chunks but just ghp_[a-fA-F0-9]{32} for binary chunks, which could be ruled in or out with validation. (Obviously this wouldn't be a silver bullet and might cause other problems.)

rgmz avatar Nov 24 '23 01:11 rgmz

This can be done for a nongeneric regex that has a unique prefix like ghp_ and a specific length. But for others, it's surely going to increase the number of false positives.

bugbaba avatar Nov 24 '23 05:11 bugbaba

I think when the handler unpacks the SQLi data, it should include the column name nearby to help with detection. This is because for some detectors that require keywords close by to help with detection if they don't have fixed prefix.

dxa4481 avatar Nov 24 '23 18:11 dxa4481

I think when the handler unpacks the SQLi data, it should include the column name nearby to help with detection.

Based on some cursory testing, we may be able to create a dump using .mode that fulfills this. I have only tested this with individual selects, so I can't say whether there's an easy way to convert that into a dump.

sqlite> .mode json customers
sqlite> select * from customers;
[{"CustomerId":1,"FirstName":"John","LastName":"Doe","Company":"Business Enterprises","Address":"123 Fake St","City":"Springfield","State":"CO","Country":"USA","PostalCode":"12227-000","Phone":"1234567890","Fax":null,"Email":"[email protected]","SupportRepId":3},
...

The line and insert modes might also work as it seems to specify the column.

Edit: unfortunately, .dump doesn't seem to respect .mode and will always default to INSERT INTO .... Leveraging .mode might require parsing all views/tables and doing select statements "by hand".

rgmz avatar Dec 10 '23 23:12 rgmz