
Mantis skips discovery phase for TLDs that are reserved for government entities

Open 0xbharath opened this issue 1 year ago • 5 comments

Describe the bug

Mantis seems to skip the discovery phase for TLDs reserved for country/government entities.

To Reproduce

mantis onboard -o gov:my -t gov.my

[2024-09-04 15:36:36,773] --> INFO: MANTIS Workflow - STARTED
[2024-09-04 15:36:36,773] --> INFO: Executing workname workflowName='default' schedule='daily between 00:00 and 04:00' cmd=[] scanNewOnly=False workflowConfig=[Module(moduleName='discovery', tools=['Subfinder', 'Amass'], order=1), Module(moduleName='prerecon', tools=['FindCDN', 'Naabu'], order=2), Module(moduleName='activehostscan', tools=['HTTPX_Active', 'HTTPX'], order=3), Module(moduleName='activerecon', tools=['Wafw00f'], order=4), Module(moduleName='scan', tools=['DNSTwister', 'Nuclei', 'Corsy'], order=5), Module(moduleName='secretscanner', tools=['SecretScanner'], order=6)]
[2024-09-04 15:36:36,793] --> INFO: Inserting user input into database

0it [00:00, ?it/s]

PRERECON: 100%|

ACTIVEHOSTSCAN: 100%|

System (please complete the following information):

Docker based setup on Ubuntu 24.04.

Additional context

This seems to be caused by the library used to categorize the provided input.

0xbharath avatar Sep 04 '24 15:09 0xbharath

The issue seems to be in the usage of the tldextract library in the file mantis/utils/asset_type.py.

>>> import tldextract
>>> tldextract.extract("example.com").registered_domain
'example.com'
>>> tldextract.extract("nic.in").registered_domain
''

tldextract uses the public suffix list for parsing TLDs: https://publicsuffix.org/list/public_suffix_list.dat
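To illustrate why a bare suffix yields an empty registered domain, here is a minimal sketch of longest-suffix matching. It is not tldextract's implementation: the `SUFFIXES` set is a tiny hypothetical stand-in for the real list, the function name is made up for this example, and the wildcard and exception rules of the real list are ignored.

```python
# Hypothetical, tiny stand-in for the public suffix list (not the real data).
SUFFIXES = {"com", "in", "nic.in", "my", "gov.my"}


def registered_domain(host: str, suffixes: set[str] = SUFFIXES) -> str:
    """Return the longest matching suffix plus one label to its left.

    Returns '' when the input is itself a suffix or no suffix matches.
    """
    labels = host.lower().strip(".").split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                # The whole input is a suffix, so no registrable label is left.
                return ""
            return ".".join(labels[i - 1:])
    return ""


print(repr(registered_domain("example.com")))
print(repr(registered_domain("gov.my")))
print(repr(registered_domain("subdomain.example.nic.in")))
```

Under these assumptions, `gov.my` and `nic.in` match a suffix entry in full, so there is no label left to form a registered domain.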

0xbharath avatar Sep 04 '24 16:09 0xbharath

Shouldn't this issue be fixed at the source?

dmdhrumilmistry avatar Oct 03 '24 15:10 dmdhrumilmistry

Ideally, yes. It would be tricky to get the library to incorporate these changes. We are trying to see if we can find a workaround or use a different library to fix this issue.
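One possible shape for such a workaround, as a rough sketch only: classify the input from the separate subdomain/domain/suffix parts rather than from the registered domain alone, so a bare suffix such as gov.my is still recognized. The `Parts` tuple mirrors the field names of tldextract's result, but it, the `classify` function, and the category names are all hypothetical and not taken from the Mantis code.

```python
from typing import NamedTuple


class Parts(NamedTuple):
    """Hypothetical container mirroring subdomain/domain/suffix fields."""
    subdomain: str
    domain: str
    suffix: str


def classify(parts: Parts) -> str:
    """Map parsed parts to a hypothetical asset category."""
    if parts.domain and parts.suffix:
        # A registrable label exists; distinguish apex from subdomain.
        return "subdomain" if parts.subdomain else "domain"
    if parts.suffix:
        # Input is a bare suffix (e.g. gov.my); do not discard it.
        return "public_suffix"
    return "unknown"


print(classify(Parts("", "", "gov.my")))
print(classify(Parts("", "example", "nic.in")))
print(classify(Parts("subdomain", "example", "nic.in")))
```

Whether a bare suffix should then go through discovery like a normal domain is a separate design decision; the sketch only shows how to avoid dropping it.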

0xbharath avatar Oct 05 '24 07:10 0xbharath

After thinking about it, I don't think there's anything wrong with the library. nic.in is meant to be used as a TLD, so if you use the library to extract the registered domain from a string consisting only of a TLD, it should return an empty string.

>>> import tldextract
# querying str with TLD only
>>> tldextract.extract("com").registered_domain
''
>>> tldextract.extract("nic.in").registered_domain
''

# querying str with labels + tld
>>> tldextract.extract("example.com").registered_domain
'example.com'
>>> tldextract.extract("subdomain.example.com").registered_domain
'example.com'
>>> tldextract.extract("example.nic.in").registered_domain # works since it has label + TLD
'example.nic.in'
>>> tldextract.extract("subdomain.example.nic.in").registered_domain
'example.nic.in'

dmdhrumilmistry avatar Oct 05 '24 07:10 dmdhrumilmistry

@0xbharath can you provide a real-world example? I'll take a look into this.

dmdhrumilmistry avatar Oct 21 '24 07:10 dmdhrumilmistry