Ignoring domains?
I'm wondering if there's a mechanism to blacklist domains and hosts so they aren't added to the db?
For example, I don't want google.com or any of its subdomains or hosts added, but these are often picked up during various domain recon activities.
If not, happy to help get this in if it seems useful to others.
I'm wondering if something like a 'scope' db could be introduced to set scoping bounds on the recon tasks. Something like the following:
| value | db | type |
|---|---|---|
| %gmail.com | contacts | blacklist |
| example.com | domains | whitelist |
But then, do we add a table column, wrap these up into an abstraction, or do away with the db column and just have a table column? That last option might actually be best. I'll think it through a bit more and try some things out.
This could be hooked into the insert function in core/framework.py here
I think it'd be useful to use a SQL LIKE match, which would allow values such as %mx%google.com in the value field for domains, but this could also confuse people and perhaps introduce a bug or two along the way.
Keen to hear any thoughts on this, @lanmaster53; I'll work on it if it sounds worthwhile.
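A minimal sketch of the LIKE idea using Python's sqlite3 module. The `scope` table layout (with `tbl` standing in for the proposed `db` column), `is_blacklisted()` and `insert_domain()` are hypothetical illustrations, not recon-ng's actual schema or its insert function in core/framework.py:

```python
import sqlite3

# Hypothetical schema for illustration only; not recon-ng's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scope (pattern TEXT, tbl TEXT, action TEXT);
CREATE TABLE domains (domain TEXT);
""")
conn.executemany(
    "INSERT INTO scope (pattern, tbl, action) VALUES (?, ?, ?)",
    [("google.com", "domains", "blacklist"),      # the apex domain itself
     ("%.google.com", "domains", "blacklist")],   # any subdomain of it
)


def is_blacklisted(conn, table, value):
    """Return True if any blacklist LIKE pattern for `table` matches `value`."""
    # The candidate value is the left operand; the stored pattern is on the right.
    row = conn.execute(
        "SELECT 1 FROM scope WHERE tbl = ? AND action = 'blacklist' "
        "AND ? LIKE pattern LIMIT 1",
        (table, value),
    ).fetchone()
    return row is not None


def insert_domain(conn, domain):
    """Hypothetical insert wrapper: skip the row if a blacklist rule matches."""
    if is_blacklisted(conn, "domains", domain):
        return 0
    conn.execute("INSERT INTO domains (domain) VALUES (?)", (domain,))
    return 1
```

SQLite's LIKE is case-insensitive for ASCII by default, which suits hostnames, but `_` is also a single-character wildcard, which is one of the confusing edge cases a LIKE-based scope would have to document.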
I've made a first attempt on a branch here: https://github.com/0x646e78/recon-ng/tree/scoping_table
So far it matches on regex and looks something like this:
```
[recon-ng][sc][certificate_transparency] > show scope
+-----------------------------------------------------------------------+
| rowid | value             | column | action    | notes | module       |
+-----------------------------------------------------------------------+
| 1     | .*mx.*google\.com | host   | blacklist |       | user_defined |
| 2     | .*googlemail\.com | host   | blacklist |       | user_defined |
+-----------------------------------------------------------------------+
[*] 2 rows returned
```
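For comparison, a rough sketch of how regex rules like the two in the `show scope` output could be applied before an insert. It assumes each pattern must match the whole value (`re.fullmatch`, case-insensitive); `SCOPE_RULES` and `in_scope()` are hypothetical and may not reflect how the branch actually does it:

```python
import re

# Hypothetical in-memory copy of the two rules from `show scope`:
# (regex, column, action). The branch may store and apply these differently.
SCOPE_RULES = [
    (r".*mx.*google\.com", "host", "blacklist"),
    (r".*googlemail\.com", "host", "blacklist"),
]


def in_scope(column, value, rules=SCOPE_RULES):
    """Return False if `value` fully matches a blacklist regex for `column`."""
    for pattern, rule_column, action in rules:
        if rule_column != column or action != "blacklist":
            continue
        # Assumption: a pattern must match the whole value, ignoring case.
        if re.fullmatch(pattern, value, re.IGNORECASE):
            return False
    return True


print(in_scope("host", "mx1.google.com"))   # False
print(in_scope("host", "www.example.com"))  # True
```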
I'll open a WIP PR once I'm a bit further along; any suggestions would be great.
For some inspiration on how this is handled in other projects, feel free to have a look at the autonoscope feature in sn0int: https://sn0int.readthedocs.io/en/stable/autonoscope.html
We have a hierarchical system that allows blacklist/whitelist rules for domains, ips and urls. We're basically doing this with "tree"-style matching. This allows setting up layered rules like:
- default is accept
- ignore everything: `.` [blacklist]
- except if it's `.com` [whitelist]
- except if it's `example.com` [blacklist]
- except if it's `a.b.c.d.example.com` [whitelist]
The most specific matching rule wins. To avoid having to escape all kinds of special characters, we don't support wildcards, though. This also avoids the problem that `.*googlemail\.com` would match `notgooglemail.com`. Both approaches have advantages and disadvantages; I just wanted to share another one.
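A sketch of the most-specific-wins idea in Python, assuming rules are keyed by whole domain suffixes (`.com` from the list written as `com`, and `.` as the catch-all root). `is_allowed()` is a hypothetical name and this is not sn0int's actual implementation:

```python
def is_allowed(domain, rules, default=True):
    """Return the action of the most specific matching rule.

    `rules` maps a domain suffix to True (whitelist) or False (blacklist).
    The key "." is a catch-all root rule; `default` applies if nothing matches.
    """
    labels = domain.lower().rstrip(".").split(".")
    # Walk suffixes from most to least specific: a.b.example.com, b.example.com, ...
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in rules:
            return rules[suffix]
    return rules.get(".", default)


# The layered rules from the list (".com" written as "com").
RULES = {".": False, "com": True, "example.com": False, "a.b.c.d.example.com": True}

for name in ["foo.org", "google.com", "www.example.com", "x.a.b.c.d.example.com"]:
    print(name, is_allowed(name, RULES))
```

Because it compares whole labels rather than substrings, `notgooglemail.com` can never hit a `googlemail.com` rule.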
@kpcyrd that's a pretty nice approach; I'll certainly take inspiration from it. sn0int looks good too, and Rust-based is cool; I'll take it for a spin. I realised last night, after making this comment, that I'd also left the literal dot out of the regexes, hence that googlemail match ;)
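One way to restore the missing label boundary, sketched with Python's `re` module: allow an optional prefix that must end in a literal dot, so one pattern covers both the bare domain and its subdomains. The `LOOSE`/`STRICT` names and the `matches()` helper are just for illustration:

```python
import re

# Unanchored prefix: any characters may precede the name, so unrelated domains match.
LOOSE = re.compile(r".*googlemail\.com", re.IGNORECASE)
# The optional prefix must end in a literal dot: googlemail.com and its subdomains only.
STRICT = re.compile(r"(?:.*\.)?googlemail\.com", re.IGNORECASE)


def matches(pattern, host):
    """Return True if `pattern` matches the whole hostname."""
    return pattern.fullmatch(host) is not None


for host in ["googlemail.com", "mx.googlemail.com", "notgooglemail.com"]:
    print(host, matches(LOOSE, host), matches(STRICT, host))
```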
We've been kicking around the idea of a validation system for all harvested data as well (https://github.com/lanmaster53/recon-ng/issues/34). For instance, any time Recon-ng tries to write harvested data to the ip_address column of the hosts table, it would validate that the value is actually an IP address. Modules return some unexpected stuff when resources change, etc., and can make a real mess of the database. I mention this because this system would tie in closely with that one. Something to think about. Regardless, I'd like to add both of these capabilities.
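A minimal sketch of what a per-column validator for the ip_address case might look like, using Python's standard `ipaddress` module. The `VALIDATORS` mapping and `validate()` helper are assumptions for illustration, not an existing recon-ng API:

```python
import ipaddress


def is_valid_ip(value):
    """Return True if `value` is a string that parses as an IPv4 or IPv6 address."""
    if not isinstance(value, str):
        return False
    try:
        ipaddress.ip_address(value.strip())
    except ValueError:
        return False
    return True


# Hypothetical mapping of (table, column) to a validator function.
VALIDATORS = {("hosts", "ip_address"): is_valid_ip}


def validate(table, column, value):
    """Return True if no validator is registered or the registered one accepts `value`."""
    check = VALIDATORS.get((table, column))
    return check is None or check(value)


print(validate("hosts", "ip_address", "192.0.2.1"))   # True
print(validate("hosts", "ip_address", "not an ip"))   # False
```

A scope check and a validator could then run one after the other in the same place before each insert, which is where the two systems would tie together.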
Ahhh cool. I was thinking of that sort of thing too. Good to know. I've progressed down the regex path for domains, I appreciate the sn0int approach but I also really like the flexibility of regex matches, and the options afforded that way.
Feel free to hop in the slack and collaborate with us on a solution. There's at least one other person that I believe was actively working on a solution. I had worked on some code as well, but I'm just so busy at the moment. Perhaps I'll drop my stuff in a new branch and everyone can start working on that. Thoughts? Interested?
Well the last few months have been tumultuous for me, but have a bit of breathing space to look at this again now.
I've been running the scoping functionality I built in May, and it's been really useful: https://github.com/0x646e78/recon-ng/blob/6b2659762567838889510b92c82e2256ccb9990d/recon/core/framework.py#L666
I'll bring up a discussion in Slack in the coming days to see if I can put together something that'll work for people.