tartufo ignores Python "pickle" content that can contain secrets

Open pmevzek-godaddy opened this issue 3 years ago • 0 comments

This is just an observation, neither a bug report nor a feature request, but it may be worth addressing.

In short: some Python content serialized with the pickle module is not flagged for secrets, as the session below shows.

$ git init test
$ cd test
$ git commit --allow-empty --allow-empty-message -m 'Start'
[main (root-commit) 94b6886] Start
$ cat secret.txt
Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O
$ cat secret.py
a = 'Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O'
$ python -c "import pickle; a = 'Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O'; fh = open('secret.pickle0', 'wb'); pickle.dump(a, fh, 0)"
$ python -c "import pickle; a = 'Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O'; fh = open('secret.pickle1', 'wb'); pickle.dump(a, fh, 1)"
$ python -c "import pickle; a = 'Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O'; fh = open('secret.pickle2', 'wb'); pickle.dump(a, fh, 2)"
$ python -c "import pickle; a = 'Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O'; fh = open('secret.pickle3', 'wb'); pickle.dump(a, fh, 3)"
$ python -c "import pickle; a = 'Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O'; fh = open('secret.pickle4', 'wb'); pickle.dump(a, fh, 4)"
$ python -c "import pickle; a = 'Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O'; fh = open('secret.pickle5', 'wb'); pickle.dump(a, fh, 5)"
$ hexdump -C secret.pickle0
00000000  56 5a 32 59 4a 56 6a 36  79 4f 71 42 5a 2f 44 33  |VZ2YJVj6yOqBZ/D3|
00000010  61 59 4c 63 75 72 7a 54  45 63 63 62 34 55 51 41  |aYLcurzTEccb4UQA|
00000020  63 71 50 76 77 57 4a 37  4f 0a 70 30 0a 2e        |cqPvwWJ7O.p0..|
0000002e
$ hexdump -C secret.pickle1
00000000  58 28 00 00 00 5a 32 59  4a 56 6a 36 79 4f 71 42  |X(...Z2YJVj6yOqB|
00000010  5a 2f 44 33 61 59 4c 63  75 72 7a 54 45 63 63 62  |Z/D3aYLcurzTEccb|
00000020  34 55 51 41 63 71 50 76  77 57 4a 37 4f 71 00 2e  |4UQAcqPvwWJ7Oq..|
00000030
$ hexdump -C secret.pickle2
00000000  80 02 58 28 00 00 00 5a  32 59 4a 56 6a 36 79 4f  |..X(...Z2YJVj6yO|
00000010  71 42 5a 2f 44 33 61 59  4c 63 75 72 7a 54 45 63  |qBZ/D3aYLcurzTEc|
00000020  63 62 34 55 51 41 63 71  50 76 77 57 4a 37 4f 71  |cb4UQAcqPvwWJ7Oq|
00000030  00 2e                                             |..|
00000032
$ hexdump -C secret.pickle3
00000000  80 03 58 28 00 00 00 5a  32 59 4a 56 6a 36 79 4f  |..X(...Z2YJVj6yO|
00000010  71 42 5a 2f 44 33 61 59  4c 63 75 72 7a 54 45 63  |qBZ/D3aYLcurzTEc|
00000020  63 62 34 55 51 41 63 71  50 76 77 57 4a 37 4f 71  |cb4UQAcqPvwWJ7Oq|
00000030  00 2e                                             |..|
00000032
$ hexdump -C secret.pickle4
00000000  80 04 95 2c 00 00 00 00  00 00 00 8c 28 5a 32 59  |...,........(Z2Y|
00000010  4a 56 6a 36 79 4f 71 42  5a 2f 44 33 61 59 4c 63  |JVj6yOqBZ/D3aYLc|
00000020  75 72 7a 54 45 63 63 62  34 55 51 41 63 71 50 76  |urzTEccb4UQAcqPv|
00000030  77 57 4a 37 4f 94 2e                              |wWJ7O..|
00000037
$ hexdump -C secret.pickle5
00000000  80 05 95 2c 00 00 00 00  00 00 00 8c 28 5a 32 59  |...,........(Z2Y|
00000010  4a 56 6a 36 79 4f 71 42  5a 2f 44 33 61 59 4c 63  |JVj6yOqBZ/D3aYLc|
00000020  75 72 7a 54 45 63 63 62  34 55 51 41 63 71 50 76  |urzTEccb4UQAcqPv|
00000030  77 57 4a 37 4f 94 2e                              |wWJ7O..|
00000037
$ git add *
$ git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   secret.pickle0
	new file:   secret.pickle1
	new file:   secret.pickle2
	new file:   secret.pickle3
	new file:   secret.pickle4
	new file:   secret.pickle5
	new file:   secret.py
	new file:   secret.txt
$ tartufo pre-commit
~~~~~~~~~~~~~~~~~~~~~
Reason: High Entropy
Filepath: secret.pickle0
Signature: d7b1bd8586fe6413f2db104e6f21a0973a29c8824f1853577865c08f39edaff5
@@ -0,0 +1,3 @@
+VZ2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O
+p0
+.
\ No newline at end of file

~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~
Reason: High Entropy
Filepath: secret.py
Signature: bd5f1cb3840ff61899f6e0243d79184b615f9f3b5f578b08b68955ea4a23add3
@@ -0,0 +1 @@
+a = 'Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O'

~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~
Reason: High Entropy
Filepath: secret.txt
Signature: 3ce497a4bb0cc2d088750d6d4fb27a90736e720f3e2c8cf6d0c459ebec76ca64
@@ -0,0 +1 @@
+Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O

~~~~~~~~~~~~~~~~~~~~~

TL;DR: tartufo finds a secret saved in Python code, in a text file, or in a pickle written with protocol 0 (which is pure ASCII). It does not pick up the secret in any of the other pickle protocols (1 through 5), where the content is binary but the secret string is still clearly visible in the file as is.
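
The claim that the string survives verbatim in every protocol can be checked without hexdump. The following is a minimal sketch, not part of tartufo; the helper name `secret_visible_by_protocol` is made up for illustration. It pickles the same test string with every protocol the running interpreter supports and checks whether the raw ASCII bytes appear in the output.

```python
import pickle

# The test string used in the session above (not a real secret).
SECRET = "Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O"


def secret_visible_by_protocol(secret):
    """Map each supported pickle protocol to whether the secret's
    ASCII bytes appear verbatim in the serialized output."""
    needle = secret.encode("ascii")
    return {
        protocol: needle in pickle.dumps(secret, protocol=protocol)
        for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
    }


if __name__ == "__main__":
    for protocol, visible in secret_visible_by_protocol(SECRET).items():
        print(f"protocol {protocol}: secret visible = {visible}")
```

This only covers plain ASCII strings like the one above; strings containing backslashes, newlines, or non-ASCII characters are encoded differently by some protocols and may not appear byte for byte.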

Why am I writing this? I discovered it by pure luck. For some RDAP unit tests I maintain, I store the expected output using pickle: it is a Python object linked to other Python objects, and I want to keep those details, so I can't trivially use JSON here. I noticed that the various high-entropy strings in those files are not picked up by tartufo. (They were only public data in this case: RDAP is public, but it uses ROIDs, opaque identifiers that can be considered high entropy.)
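
For anyone who wants to check their own pickle fixtures in the meantime, here is a rough sketch of a `strings`-style scan. It is not how tartufo works internally; the function names, the 20-byte minimum run length, and the 4.0 bits-per-character threshold are all arbitrary assumptions chosen for illustration. It pulls runs of printable ASCII out of binary data and keeps the ones with high Shannon entropy.

```python
import math
import pickle
import re
from collections import Counter

# Assumption: only runs of 20 or more printable ASCII bytes are scored.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{20,}")


def shannon_entropy(text):
    """Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def high_entropy_runs(data, threshold=4.0):
    """Return the printable ASCII runs in `data` whose entropy exceeds
    `threshold` (an illustrative value, not a tartufo default)."""
    runs = (m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data))
    return [run for run in runs if shannon_entropy(run) > threshold]


if __name__ == "__main__":
    secret = "Z2YJVj6yOqBZ/D3aYLcurzTEccb4UQAcqPvwWJ7O"
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        data = pickle.dumps(secret, protocol=protocol)
        print(protocol, high_entropy_runs(data))
```

Note that a reported run can include neighbouring pickle bytes that happen to be printable (for example the length byte 0x28, which is "("), so the match contains the secret rather than equalling it exactly.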

pmevzek-godaddy · Mar 24 '21 00:03