Excavate tries to de-duplicate yara matches based on description
Issue: https://github.com/blacklanternsecurity/bbot/issues/1937
This MR fixes a bug where the excavate module de-duplicates findings by description when YARA FINDINGs are emitted.
Example:
rule find_string {
    strings:
        $str1 = "Example String"
    condition:
        $str1
}
This will generate a FINDING emit with the following:
[FINDING] {"description": "Custom Yara Rule [find_string] Matched via identifier [str1]", "host": "example.com", "path": "/", "url": "https://example.com/"} httpx->excavate
However, if another site matches the same string, no FINDING is emitted for it. The emit is suppressed because the description is identical to the first match, so nothing in it distinguishes the two.
The MR fixes this by appending the specific URL where the match was found to the description. Example below:
[FINDING] {"description": "Custom Yara Rule [find_string] Matched via identifier [str1] on https://example.com/", "host": "example.com", "path": "/", "url": "https://example.com/"} httpx->excavate
This makes the description unique enough that a FINDING on a different site won't be suppressed, while duplicate matches on the same string on the same site still won't fire twice.
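To make the effect concrete, here is a minimal sketch of description-keyed suppression. It is a toy model, not bbot's actual de-duplication code: the helper names `describe` and `emit_findings` are hypothetical, and it simply assumes that a FINDING is dropped when its description has been seen before.

```python
def describe(rule, identifier, url=None):
    """Build a FINDING description; optionally append the URL (the change in this MR)."""
    base = f"Custom Yara Rule [{rule}] Matched via identifier [{identifier}]"
    return f"{base} on {url}" if url else base


def emit_findings(findings):
    """Toy model (assumption): suppress any finding whose description was already seen."""
    seen = set()
    emitted = []
    for finding in findings:
        key = finding["description"]
        if key in seen:
            continue  # suppressed as a duplicate
        seen.add(key)
        emitted.append(finding)
    return emitted


urls = ["https://example.com/", "https://example.org/", "https://example.com/"]

# Old behavior: identical descriptions, so only the first match survives.
old = emit_findings(
    [{"description": describe("find_string", "str1"), "url": u} for u in urls]
)

# New behavior: the URL is part of the description, so each distinct URL survives once.
new = emit_findings(
    [{"description": describe("find_string", "str1", u), "url": u} for u in urls]
)
```

Under this model, `old` holds one finding and `new` holds two: one per distinct URL, with the repeated match on https://example.com/ still suppressed.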
Codecov Report
Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
Project coverage is 93%. Comparing base (65ac448) to head (6b32b0b). Report is 41 commits behind head on dev.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| bbot/modules/internal/excavate.py | 0% | 1 Missing :warning: |
Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #1938   +/-   ##
=====================================
+ Coverage    93%     93%    +1%
=====================================
  Files       361     361
  Lines     27773   27774     +1
=====================================
+ Hits      25588   25591     +3
+ Misses     2185    2183     -2
It seems like this would cause an error when processing events where .data isn't a dictionary, like RAW_TEXT.
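One way to avoid that error is to read the URL only when the event data is actually a dictionary. The sketch below is illustrative and is not taken from excavate.py: `build_description` and its `event_data` parameter are hypothetical names, and the only assumption about bbot is the one stated above, that some events (such as RAW_TEXT) carry non-dictionary data.

```python
def build_description(rule, identifier, event_data):
    """Append the URL to the description only when event_data is a dict that has one."""
    base = f"Custom Yara Rule [{rule}] Matched via identifier [{identifier}]"
    # Hypothetical guard: string data (e.g. raw text) has no "url" key to look up.
    url = event_data.get("url") if isinstance(event_data, dict) else None
    return f"{base} on {url}" if url else base


with_url = build_description("find_string", "str1", {"url": "https://example.com/"})
raw_text = build_description("find_string", "str1", "some raw text body")
```

With dictionary data the URL is appended; with a plain string the description falls back to the original form instead of raising.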
Also, before we merge this, we need to understand why FINDINGs with different hosts are being deduped. If that is the behavior we're seeing, then that's definitely strange and probably the result of a deeper bug.
@aconite33 I will work on tracking this down. Do you have an exact yara rule + bbot command that can reproduce the bug?
Superseded by https://github.com/blacklanternsecurity/bbot/pull/1969; thanks @aconite33 for noticing this one