
[BUG] domain Rule Matches Too Broadly

Open LoneOdDaeth opened this issue 1 year ago • 0 comments

Describe the bug

The domain rule in the YARA ruleset matches unintended strings that are not actual domains. This leads to false positives when scanning files that contain generic words, filenames, or localhost-like addresses.

To Reproduce

Steps to reproduce the behavior:

1. Run a YARA scan with the domain rule enabled.
2. Scan a file that contains common words, filenames, or IP addresses.
3. Observe that many non-domain strings are detected.

Example false positives:

- test-123
- file.txt
- localhost
- random_text

All these strings are incorrectly flagged as domains.

Expected behavior

The domain rule should only match valid domains, such as example.com, sub.example.net, or test-site.org. It should not match:

- Plain text words
- Filenames like file.txt
- Localhost or internal references

Additional context

The issue is caused by the overly broad regex pattern:

$domain_regex = /([\w.-]+)/ wide ascii

This matches any run of one or more word characters (letters, digits, underscore), dots, or hyphens, so virtually every token in a scanned file is reported as a domain, leading to many false positives.
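The following minimal sketch shows the problem with the current pattern. As an assumption, it uses Python's standard re module in place of YARA's regex engine. The constructs involved (a character class containing \w, '.', and '-') behave the same way in both for ASCII input. It applies the pattern with search semantics, as a scan does, to the example strings from this report, and only the ascii half of the wide ascii modifier is modeled. The helper first_match is hypothetical and exists only for this illustration.

```python
import re

# Pattern from the current rule, compiled as a bytes regex so that \w is
# ASCII-only ([A-Za-z0-9_]), which approximates YARA scanning raw bytes.
BROAD = re.compile(rb"([\w.-]+)")

# Example false positives from this report.
samples = [b"test-123", b"file.txt", b"localhost", b"random_text"]


def first_match(pattern, data):
    """Return the first match found anywhere in data (search semantics), or None."""
    m = pattern.search(data)
    return m.group(0).decode() if m else None


for s in samples:
    # Each sample consists only of word characters, dots, and hyphens,
    # so the whole string is returned as a "domain".
    print(s.decode(), "->", first_match(BROAD, s))
```

Every sample is matched in full (for example, "random_text -> random_text"), because nothing in the pattern requires a dot or a suffix.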

Suggested Fix: Update the regex to a stricter pattern that requires a dot followed by a TLD-like suffix of 2 to 6 letters:

$domain_regex = /([a-zA-Z0-9-]+\.[a-zA-Z]{2,6})/ wide ascii

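This narrows matches to a name, a dot, and a 2 to 6 letter suffix. The dot must be escaped as \. for this to work. An unescaped . matches any single character other than a newline, so the pattern would still match test-123, localhost, and random_text. Because the suffix is only checked for shape, filenames such as file.txt can still match.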
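A quick check shows why the dot in the suggested pattern has to be escaped. This is a sketch under stated assumptions, not a run of the actual rule. It uses Python's re module as a stand-in for YARA's engine, models only the ascii case, and introduces a hypothetical helper hits. When the dot is unescaped, the greedy [a-zA-Z0-9-]+ can back off until "any character + two letters" fits (for example "t" + "e" + "st" inside test-123). With \. the pattern requires a literal dot. The sample lists reuse the examples from this report.

```python
import re

# Suggested pattern with the dot left unescaped: "." matches any character
# except a newline, so no literal dot is required.
UNESCAPED = re.compile(rb"([a-zA-Z0-9-]+.[a-zA-Z]{2,6})")
# Same pattern with the dot escaped, so only a literal "." is accepted.
ESCAPED = re.compile(rb"([a-zA-Z0-9-]+\.[a-zA-Z]{2,6})")

false_positives = [b"test-123", b"file.txt", b"localhost", b"random_text"]
domains = [b"example.com", b"sub.example.net", b"test-site.org"]


def hits(pattern, samples):
    """Return the samples in which the pattern matches anywhere (search semantics)."""
    return [s.decode() for s in samples if pattern.search(s)]


print(hits(UNESCAPED, false_positives))  # ['test-123', 'file.txt', 'localhost', 'random_text']
print(hits(ESCAPED, false_positives))    # ['file.txt']
print(hits(ESCAPED, domains))            # ['example.com', 'sub.example.net', 'test-site.org']
```

With the dot escaped, test-123, localhost, and random_text are rejected and the valid domains still match. file.txt continues to match because txt has the same shape as a TLD, so excluding filenames would need an extra check, such as comparing the suffix against a list of known TLDs.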

LoneOdDaeth, Feb 20 '25 13:02