2ms
History - Group by Secret
Today, when a secret is found in several versions of a document's history, each version returns a separate result (all under the same ID). We would like to group these results and add data about the versions in which the secret was found.
The added data is:
- Count of occurrences in the history (how many versions include the secret)
- First and last version (version identifier and date)
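The grouping described above could look roughly like the sketch below. It is only an illustration: the `Finding` and `GroupedSecret` shapes and the `group_by_secret` function are hypothetical names, not existing 2ms types, and it assumes versions can be ordered by date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Finding:
    """One detection of a secret in one version (hypothetical shape)."""
    secret_id: str
    version: str
    version_date: date


@dataclass
class GroupedSecret:
    """One result per secret, carrying the added history data."""
    secret_id: str
    occurrences: int      # number of distinct versions containing the secret
    first_version: str
    first_date: date
    last_version: str
    last_date: date


def group_by_secret(findings: Iterable[Finding]) -> List[GroupedSecret]:
    # Key by version so a secret that appears twice in one version counts once.
    by_id: Dict[str, Dict[str, Finding]] = {}
    for finding in findings:
        by_id.setdefault(finding.secret_id, {})[finding.version] = finding

    grouped = []
    for secret_id, versions in by_id.items():
        # Assumption: the version date defines the order of the history.
        ordered = sorted(versions.values(), key=lambda f: f.version_date)
        first, last = ordered[0], ordered[-1]
        grouped.append(GroupedSecret(
            secret_id, len(ordered),
            first.version, first.version_date,
            last.version, last.version_date,
        ))
    return grouped
```

Counting distinct versions rather than raw findings matches "how many versions are included"; if raw hits are wanted instead, the inner dictionary can be replaced by a list.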
Technical Details
Working with versions in general
Today we extract the content from the source and pass each piece of content on to the detector engine. Considering the history means that multiple contents are related, and I can currently see two options for handling that:
- Change our approach and treat a document with all its history versions as one connected block to scan.
- Post-process the results, where different versions are combined under the same result ID, and extract the version info from there.
I think we should choose the first option. From an engineering perspective, it lets us declare and control the versions (and their order) ourselves, rather than sending them to the other side of the process and treating its output as the source of truth.
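A minimal sketch of the first option, under stated assumptions: `Document`, `Version`, `Detector` and `scan_document` are hypothetical names used for illustration, and the detector is modelled as a plain function from content to secret IDs, which is not necessarily how the real engine is wired.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Version:
    identifier: str
    content: str


@dataclass(frozen=True)
class Document:
    doc_id: str
    # Oldest first. The producer (the plugin) owns this order,
    # so the scan never has to infer it from the results.
    versions: Tuple[Version, ...]


# Hypothetical detector signature: content in, secret IDs out.
Detector = Callable[[str], List[str]]


def scan_document(doc: Document, detect: Detector) -> Dict[str, List[str]]:
    """Map each secret ID to the version identifiers where it was found."""
    found_in: Dict[str, List[str]] = {}
    for version in doc.versions:
        # dict.fromkeys de-duplicates while keeping detection order.
        for secret_id in dict.fromkeys(detect(version.content)):
            found_in.setdefault(secret_id, []).append(version.identifier)
    return found_in
```

Because the versions arrive as one ordered block, the first and last version of each secret are simply the first and last entries of its list.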
The --history flag
All the discussion above is relevant when the --history flag is enabled. What should we do when --history is omitted?
Plugins
Assuming we now work with a batch of versions of the same document, this will be challenging for some plugins.
Git
With Git, we do not read and scan each whole version; we take only the diff. How should we treat a version that deletes a line or a whole file? How will the Detector know that the secret was removed?
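One way to explore this question, offered only as a sketch: a unified diff already marks removed lines with a leading "-", so a plugin could hand added and removed lines to the detector separately. The `split_diff` helper below is hypothetical and says nothing about how the current Git plugin parses diffs.

```python
from typing import List, Tuple


def split_diff(diff_text: str) -> Tuple[List[str], List[str]]:
    """Split a unified diff into (added_lines, removed_lines)."""
    added: List[str] = []
    removed: List[str] = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            # A new file starts; its "---"/"+++" headers are not content.
            in_hunk = False
        elif line.startswith("@@"):
            # Hunk header: the lines that follow are content lines.
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added.append(line[1:])
        elif in_hunk and line.startswith("-"):
            removed.append(line[1:])
    return added, removed
```

A secret detected only in the removed lines of a commit would then mark the version where it disappeared. A deleted file shows up as a diff whose hunks contain only removed lines.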
Action Items
- [ ] I will start with Confluence, changing the process to scan a batch of versions.