Handling of "orphan" indicators

Open alexcpsec opened this issue 10 years ago • 10 comments

Today, indicators that for some reason do not match our "IPv4" or "FQDN" validation just stay there without a type. An example:

$ cat harvest.csv | grep -v FQDN | grep -v IPv4
"entity","type","direction","source","notes","date"
"2001:41d0:8:dcd4::1","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2002:5f18:8f82::5f18:8f82","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2002:c3d3:9a9f::c3d3:9a9f","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a00:1210:fffe:145::1","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a00:1210:fffe:72::1","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a01:238:20a:202:1000::25","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a01:540:2:bd5d:d849:1e69:7736:be41","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a03:7380:140:3:a90f:3bd1:d8d9:3485","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a03:7380:140:3:b86c:62e8:3e0e:a0fb","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a03:7380:2380:0:501b:91a5:76ff:8fa8","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2a03:7380:2380:0:95db:5adb:685d:a0f0","","inbound","http://www.blocklist.de/lists/apache.txt","","2014-09-04"
"2001:41d0:1:c9b2::1","","inbound","http://www.blocklist.de/lists/bots.txt","","2014-09-04"
"2a01:430:17:1::ffff:376","","inbound","http://www.blocklist.de/lists/bots.txt","","2014-09-04"
"Export","","inbound","http://virbl.org/download/virbl.dnsbl.bit.nl.txt","","2014-09-04"
"ckaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","","outbound","http://www.nothink.org/blacklist/blacklist_malware_dns.txt","","2014-09-04"

We are not interested (for now) in IPv6, and the other stuff seems like parsing errors.

I believe we should filter out the indicators that do not match a specific type.
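
A minimal sketch of that filter, assuming the harvest output keeps the six-column CSV layout shown above and that an untyped indicator is one whose `type` field is empty. The function name `drop_untyped` and the sample rows are hypothetical and not part of combine.

```python
import csv
import io


def drop_untyped(rows):
    """Yield only the rows whose 'type' column is non-empty."""
    for row in rows:
        if row["type"].strip():
            yield row


# Illustrative data in the same column layout as harvest.csv.
sample = io.StringIO(
    '"entity","type","direction","source","notes","date"\n'
    '"192.0.2.10","IPv4","inbound","http://example.com/a.txt","","2014-09-04"\n'
    '"2001:41d0:8:dcd4::1","","inbound","http://example.com/a.txt","","2014-09-04"\n'
    '"Export","","inbound","http://example.com/b.txt","","2014-09-04"\n'
)

kept = list(drop_untyped(csv.DictReader(sample)))
for row in kept:
    print(row["entity"], row["type"])
```

Only the typed row survives here; the IPv6 address and the stray `Export` token are dropped because nothing assigned them a type.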

alexcpsec avatar Sep 04 '14 22:09 alexcpsec

IPv6, definitely we can just tag and ignore for now.

The Export indicator from http://virbl.org/download/virbl.dnsbl.bit.nl.txt is actually a bug.

Interestingly, http://www.nothink.org/blacklist/blacklist_malware_dns.txt actually does list ckaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa. We can obviously filter that out, but it's interesting that they let some bad data through.

krmaxwell avatar Sep 04 '14 22:09 krmaxwell

I was thinking of just filtering out everything that the IPv4 and FQDN validation does not recognize.

alexcpsec avatar Sep 04 '14 22:09 alexcpsec

For the bad data, sure, we just filter it out. IPv6 is something we should add as a future enhancement because that's eventually going to be relevant, particularly as a research question.

krmaxwell avatar Sep 04 '14 22:09 krmaxwell

Sure, but then it becomes a handler here when we are ready :) : https://github.com/mlsecproject/combine/blob/master/thresher.py#L9-L19

alexcpsec avatar Sep 04 '14 22:09 alexcpsec

Exactly. We add the proper regex now in thresher, but winnower can filter it out (more specifically, only pass types it knows about). Or maybe just have IPv6 output as an option in combine.cfg?

krmaxwell avatar Sep 04 '14 22:09 krmaxwell

Well, if you have a good regex for IPv6 validation, we could just add that right away.

I think the "right" answer is for combine.cfg to have a "list of indicator types I want outputted" in the winnower section, which defaults to ("IPv4", "FQDN"). Ideally you should be able to override that (or select only a few others) from the command line.
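
A rough sketch of how that lookup could work, using only the standard library. The section name `Winnower`, the option name `output_types`, the `--types` flag, and the function names are all assumptions for illustration; none of them are confirmed to exist in combine.

```python
import argparse
import configparser

# Default from the proposal above.
DEFAULT_TYPES = ("IPv4", "FQDN")


def parse_types(value):
    """Split a comma-separated string into a tuple of type names."""
    return tuple(t.strip() for t in value.split(",") if t.strip())


def output_types(config_text, argv):
    """Resolve enabled types: command line first, then config, then default."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag name.
    parser.add_argument("--types", help="comma-separated indicator types to output")
    args = parser.parse_args(argv)
    if args.types:
        return parse_types(args.types)

    config = configparser.ConfigParser()
    config.read_string(config_text)
    # Hypothetical section and option names; fallback covers a missing section too.
    raw = config.get("Winnower", "output_types", fallback="")
    return parse_types(raw) or DEFAULT_TYPES


cfg = "[Winnower]\noutput_types = IPv4, FQDN, IPv6\n"
print(output_types(cfg, []))
print(output_types(cfg, ["--types", "IPv4"]))
print(output_types("", []))
```

The three calls show the config value being used, the command line overriding it, and the default applying when neither is set.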

alexcpsec avatar Sep 04 '14 22:09 alexcpsec

I think that's the right way to go. And I'll just use something from http://stackoverflow.com/questions/53497/regular-expression-that-matches-valid-ipv6-addresses ;)
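
For comparison, here is a sketch that skips the regex and leans on Python's standard `ipaddress` module instead. This is a different technique from the Stack Overflow regex mentioned above, and `tag_ip` is a hypothetical helper, not a function in thresher.py.

```python
import ipaddress


def tag_ip(entity):
    """Return 'IPv4', 'IPv6', or None if the string is not an IP address."""
    try:
        addr = ipaddress.ip_address(entity)
    except ValueError:
        # Not parseable as either address family.
        return None
    return "IPv6" if addr.version == 6 else "IPv4"


# Two of the untyped entities from the report, plus an illustrative IPv4 address.
for entity in ("2001:41d0:8:dcd4::1", "2a01:430:17:1::ffff:376", "192.0.2.10", "Export"):
    print(entity, tag_ip(entity))
```

With something like this, the IPv6 entries from the report would get a type of their own, while `Export` and similar parsing debris would still come back untyped.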

krmaxwell avatar Sep 04 '14 23:09 krmaxwell

Curious - does anybody have a use case for consuming IPv6 indicators right now? I see a lot more of these in the feeds, though I haven't investigated them yet.

krmaxwell avatar Jan 14 '15 20:01 krmaxwell

I'd just drop them for now. That was my original suggestion.

alexcpsec avatar Jan 14 '15 20:01 alexcpsec

That is in fact what we do. Just thinking about when we should start doing something with them.

krmaxwell avatar Jan 14 '15 20:01 krmaxwell