Reduce noise with more reliability
- Generate and periodically update a list of IP addresses extracted from https://raw.githubusercontent.com/stamparm/maltrail/master/trails/static/mass_scanner.txt and filter those addresses out. We already check the reputation reported by the T-POTs, but we cannot depend on external configuration; this should be managed directly in GreedyBear for more reliability.
- Use reverse DNS queries as a second filter to remove the noise that survives the previous one. Keep an internal DB of records if doing a DNS query for every new addition turns out to be too expensive. We should also keep a static list of known domains to filter on.
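To make the second point more concrete, here is a minimal sketch of a reverse DNS filter with an in-memory cache and a static domain list. Everything in it is an assumption for illustration: the class name `ReverseDnsFilter`, the `KNOWN_SCANNER_DOMAINS` entries (taken from services named in this thread, without checking that their PTR records actually use those domains), and the dict used in place of a real internal DB. It is not how GreedyBear does it.

```python
import socket

# Illustrative only: domains of services mentioned in this thread.
# Whether their PTR records really end in these domains is an assumption.
KNOWN_SCANNER_DOMAINS = ("onyphe.net", "modat.io", "groupib.com")


def default_resolver(ip):
    """Return the PTR hostname for an IP, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


class ReverseDnsFilter:
    """Hypothetical helper: drops IPs whose PTR record matches a known domain."""

    def __init__(self, domains=KNOWN_SCANNER_DOMAINS, resolver=default_resolver):
        self.domains = tuple(d.lower().rstrip(".") for d in domains)
        self.resolver = resolver
        self.cache = {}  # ip -> hostname or None; stands in for the internal DB

    def hostname(self, ip):
        # Only query DNS the first time an IP is seen.
        if ip not in self.cache:
            self.cache[ip] = self.resolver(ip)
        return self.cache[ip]

    def is_known_scanner(self, ip):
        host = self.hostname(ip)
        if not host:
            return False
        host = host.lower().rstrip(".")
        # Match the domain itself or any subdomain, never a bare suffix.
        return any(host == d or host.endswith("." + d) for d in self.domains)

    def filter(self, ips):
        """Return the IPs that do NOT resolve to a known scanner domain."""
        return [ip for ip in ips if not self.is_known_scanner(ip)]
```

Passing the resolver as a parameter keeps the DNS call replaceable, so the lookup can be stubbed in tests or swapped for a DB-backed one. Note that PTR records are controlled by whoever owns the IP range, so a match is a hint rather than proof.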
First step has been implemented here: https://github.com/intelowlproject/GreedyBear/pull/541
After the first extraction of those mass scanners, the number of filtered IPs went from ~240 to more than 2.5k, out of ~800k IPs.
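For readers who do not want to open the PR, the idea of this first step can be sketched roughly as follows. This is not the code from the PR. It assumes the maltrail file holds one IP address per line with `#` starting a comment, and the function names are hypothetical.

```python
import ipaddress
from urllib.request import urlopen

MASS_SCANNER_URL = (
    "https://raw.githubusercontent.com/stamparm/maltrail/"
    "master/trails/static/mass_scanner.txt"
)


def parse_mass_scanners(text):
    """Extract valid IP addresses from the list, skipping comments and junk."""
    scanners = set()
    for line in text.splitlines():
        # Assumption: '#' starts a comment, either full-line or trailing.
        candidate = line.split("#", 1)[0].strip()
        if not candidate:
            continue
        try:
            scanners.add(str(ipaddress.ip_address(candidate)))
        except ValueError:
            continue  # not a plain IP address, ignore the line
    return scanners


def fetch_mass_scanners(url=MASS_SCANNER_URL, timeout=30):
    """Download the list and parse it; meant to be run periodically."""
    with urlopen(url, timeout=timeout) as response:
        return parse_mass_scanners(response.read().decode("utf-8", errors="replace"))


def drop_mass_scanners(ips, scanners):
    """Keep only the IPs that are not known mass scanners."""
    return [ip for ip in ips if ip not in scanners]
```

A periodic job would call `fetch_mass_scanners()` and store the result locally, so the filter keeps working from the last good copy when the download fails.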
@regulartim The mass scanners filter is now working properly :)
I also noticed that some IP addresses are not listed there but still belong to other legitimate services like modat.io or groupib.com. I would rather not add reverse DNS queries to the collection routine, to avoid the risk of making it slower or more failure-prone, and I am not sure whether they could conflict with the machine learning training routine.
So, for now, I am trying to collect the other scanners that are not yet categorized and to push that list to the original source that both the T-POTs and GreedyBear collect from. See this first PR: https://github.com/stamparm/maltrail/pull/19326.
If you ever see other IP addresses in your T-POT deployments that belong to a known service, we could incrementally add them there. I don't expect many to be missing, but this could still help filter the noise.
About that: in the "likely to recur" list, I found basically all the onyphe.net IP addresses (http://dixon.probe.onyphe.net/ip-ranges.txt). I am filtering them too.
I also found others from Censys and Shodan, and I am trying to filter them out too.
UPDATE: I already got 2 PRs merged there! :)
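Since published lists like the onyphe.net one describe ranges rather than single addresses, matching against them could look like the sketch below. It assumes one network or address per line with `#` comments, which I have not verified for that file, and the function names are hypothetical.

```python
import ipaddress


def parse_networks(text):
    """Parse one CIDR range (or single address) per line into network objects."""
    networks = []
    for line in text.splitlines():
        candidate = line.split("#", 1)[0].strip()
        if not candidate:
            continue
        try:
            # strict=False accepts entries with host bits set, e.g. 192.0.2.5/24
            networks.append(ipaddress.ip_network(candidate, strict=False))
        except ValueError:
            continue  # skip anything that is not an address or range
    return networks


def in_any_network(ip, networks):
    """True if the IP falls inside at least one of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def drop_ips_in_networks(ips, networks):
    """Keep only the IPs that are outside all of the given networks."""
    return [ip for ip in ips if not in_any_network(ip, networks)]
```

The linear scan over the networks is fine for a few hundred ranges. A much larger list would call for a prefix tree or sorted intervals.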
Cool! :)
I think that every IP address near the top of the "likely to recur" list has quite a high probability of being a mass scanner. Based on that, and with a little bit of work, I think we can contribute lots of IP addresses to the maltrail list.