recon-ng

Ignoring domains?

Open 0x646e78 opened this issue 5 years ago • 8 comments

I'm wondering if there's a mechanism to blacklist domains and hosts so they aren't added to the db?

For example, I don't want google.com or any of its subdomains or hosts added, but these are often picked up in various domain recon activities.

If not, happy to help get this in if it seems useful to others.

0x646e78 avatar May 07 '20 15:05 0x646e78

I'm wondering if something like a 'scope' db could be introduced, which could set scoping bounds on the recon tasks. Something like the following:

  value          db          type
  %gmail.com     contacts    blacklist
  example.com    domains     whitelist

But then also: do we add a table column, wrap these up into an abstraction, or do away with the db column and just have the table column? That last option might actually be best. I'll think it through a bit more and try some things out.

This could be hooked into the insert function in core/framework.py here

I think it'd be useful to do a LIKE, thus allowing say %mx%google.com in the value field for domains, but this could also get confusing for people and perhaps introduce a bug or two along the way.
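A rough sketch of the LIKE idea, assuming a `scope` table with the `value`, `db` and `type` columns proposed above. The `in_scope` helper is hypothetical and is not recon-ng's insert function; it only shows the kind of check that could run before a row is written, and it handles blacklist rows only.

```python
import sqlite3


def in_scope(conn, table, value):
    """Return False if any blacklist pattern for `table` matches `value`.

    Hypothetical helper: the stored pattern is the right-hand side of LIKE,
    so '%' (any run of characters) and '_' (one character) act as wildcards.
    """
    row = conn.execute(
        "SELECT 1 FROM scope "
        "WHERE db = ? AND type = 'blacklist' AND ? LIKE value LIMIT 1",
        (table, value),
    ).fetchone()
    return row is None


# In-memory database standing in for a workspace db (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scope (value TEXT, db TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO scope VALUES (?, ?, ?)",
    [
        ("%gmail.com", "contacts", "blacklist"),
        ("%mx%google.com", "domains", "blacklist"),
    ],
)

print(in_scope(conn, "contacts", "bob@gmail.com"))           # False
print(in_scope(conn, "contacts", "bob@example.com"))         # True
print(in_scope(conn, "domains", "alt1.aspmx.l.google.com"))  # False
```

Note that SQLite's LIKE is case-insensitive for ASCII by default, and that `_` in a stored value is also a wildcard, which is one of the ways this could get confusing.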

Keen on any thoughts about this @lanmaster53, I'll work on it if it sounds worthwhile.

0x646e78 avatar May 17 '20 03:05 0x646e78

I've made a first attempt on a branch here: https://github.com/0x646e78/recon-ng/tree/scoping_table

So far it matches on regex, and looks something like this:

[recon-ng][sc][certificate_transparency] > show scope

  +-----------------------------------------------------------------------+
  | rowid |       value       | column |   action  | notes |    module    |
  +-----------------------------------------------------------------------+
  | 1     | .*mx.*google\.com | host   | blacklist |       | user_defined |
  | 2     | .*googlemail\.com | host   | blacklist |       | user_defined |
  +-----------------------------------------------------------------------+

[*] 2 rows returned
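For illustration only, here is a minimal sketch of how rows like these could be applied. It is not the code on the branch: the `SCOPE_RULES` list and `is_blacklisted` function are hypothetical names, and matching against the whole value with `re.fullmatch` is an assumption.

```python
import re

# Hypothetical in-memory copy of the scope rows above: (value, column, action).
SCOPE_RULES = [
    (r".*mx.*google\.com", "host", "blacklist"),
    (r".*googlemail\.com", "host", "blacklist"),
]


def is_blacklisted(column, value, rules=SCOPE_RULES):
    """Return True if any blacklist regex for `column` matches the whole value."""
    return any(
        col == column and action == "blacklist" and re.fullmatch(pattern, value)
        for pattern, col, action in rules
    )


print(is_blacklisted("host", "aspmx.l.google.com"))  # True
print(is_blacklisted("host", "www.example.com"))     # False
```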

I'll open a WIP PR once I'm a bit further along. Any suggestions would be great.

0x646e78 avatar May 23 '20 12:05 0x646e78

For some inspiration on how this is handled in other projects, feel free to have a look at the autonoscope feature in sn0int: https://sn0int.readthedocs.io/en/stable/autonoscope.html

We have a hierarchical system that allows blacklist/whitelist rules for domains, ips and urls. We're basically doing this with "tree"-style matching. This allows setting up layered rules like:

  • default is accept
  • ignore everything . [blacklist]
  • except if it's .com [whitelist]
  • except if it's example.com [blacklist]
  • except if it's a.b.c.d.example.com [whitelist]

The most specific matching rule wins. To avoid having to exclude all kinds of special characters we don't support wildcards though. It also avoids the problem that .*googlemail\.com would match notgooglemail.com. I think there are advantages/disadvantages in both of them, just wanted to share some other approaches.
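The layered rules above can be sketched roughly as follows. This is not sn0int's implementation, just a small Python illustration of "most specific rule wins" for domains; the `RULES` dict and `decide` function are made-up names, and rules are matched on whole DNS labels only.

```python
# Hypothetical rule set mirroring the list above; "." is the root rule.
RULES = {
    ".": "blacklist",
    "com": "whitelist",
    "example.com": "blacklist",
    "a.b.c.d.example.com": "whitelist",
}


def decide(domain, rules=RULES, default="whitelist"):
    """Return the action of the most specific rule that covers `domain`."""
    labels = domain.lower().rstrip(".").split(".")
    # Walk from the full name towards the TLD, so longer suffixes win.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            return rules[candidate]
    # Fall back to the root rule, then to the default (accept).
    return rules.get(".", default)


print(decide("example.org"))          # blacklist (only the root rule applies)
print(decide("foo.com"))              # whitelist
print(decide("www.example.com"))      # blacklist
print(decide("a.b.c.d.example.com"))  # whitelist
print(decide("notexample.com"))       # whitelist (covered by com, not example.com)
```

Because matching works on whole labels, `notexample.com` is never confused with `example.com`.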

kpcyrd avatar May 23 '20 15:05 kpcyrd

@kpcyrd that's a pretty nice approach, I will certainly take inspiration from it. sn0int looks good too, and being Rust-based is cool; I'll take it for a spin. I realised last night after making this comment that I'd left the literal dot out of the regexes too, hence that googlemail match ;)
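One way to read the missing-dot point, as a sketch: requiring a literal dot before the domain (or nothing at all) keeps the pattern from matching lookalike names. The `loose` and `strict` patterns are illustrative, not taken from the branch.

```python
import re

# Pattern as posted: any prefix at all may precede "googlemail.com".
loose = re.compile(r".*googlemail\.com")

# Variant with a literal dot: the prefix, if present, must end in ".".
strict = re.compile(r"(?:.*\.)?googlemail\.com")

print(bool(loose.fullmatch("notgooglemail.com")))     # True (unintended)
print(bool(strict.fullmatch("notgooglemail.com")))    # False
print(bool(strict.fullmatch("mail.googlemail.com")))  # True
print(bool(strict.fullmatch("googlemail.com")))       # True
```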

0x646e78 avatar May 24 '20 00:05 0x646e78

We've been kicking around the idea of a validation system for all harvested data as well (https://github.com/lanmaster53/recon-ng/issues/34). So, for instance, any time Recon-ng tries to write harvested data to the ip_address column of the hosts table, it will validate that it is actually an IP address. Modules return some unexpected stuff when resources change, etc., and that can make a real mess of the database. I mention this because the scoping system would tie in closely with the validation one. Something to think about. Regardless, I'd like to add both of these capabilities.
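As a sketch of what the ip_address check might look like, using Python's standard `ipaddress` module. The `is_valid_ip` name is hypothetical and this is not existing Recon-ng code.

```python
import ipaddress


def is_valid_ip(value):
    """Return True if `value` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


print(is_valid_ip("192.0.2.10"))   # True
print(is_valid_ip("2001:db8::1"))  # True
print(is_valid_ip("not-an-ip"))    # False
```

A check like this could sit next to the scope check, so a value is both validated and scoped before it is written.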

lanmaster53 avatar Jun 08 '20 03:06 lanmaster53

Ahhh cool. I was thinking of that sort of thing too. Good to know. I've progressed down the regex path for domains, I appreciate the sn0int approach but I also really like the flexibility of regex matches, and the options afforded that way.

0x646e78 avatar Jun 09 '20 06:06 0x646e78

Feel free to hop in the Slack and collaborate with us on a solution. There's at least one other person who I believe was actively working on a solution. I had worked on some code as well, but I'm just so busy at the moment. Perhaps I'll drop my stuff in a new branch and everyone can start working on that. Thoughts? Interested?

lanmaster53 avatar Jun 09 '20 14:06 lanmaster53

Well, the last few months have been tumultuous for me, but I have a bit of breathing space to look at this again now.

I've been running with the scoping functionality I built in May, and it's been really useful: https://github.com/0x646e78/recon-ng/blob/6b2659762567838889510b92c82e2256ccb9990d/recon/core/framework.py#L666

I'll bring up a discussion in Slack in the coming days to see if I can get something together that'll work for people.

0x646e78 avatar Oct 09 '20 12:10 0x646e78