[feature] Set Up WHOIS Data Retrieval and Storage for Devices
-
In order to fetch the required WHOIS details, we first need to set up a MaxMind account. This gives us access to the free databases and the keys required for the web service.
There are two ways to fetch the data using geoip2:
- Through the web service (requires internet access for each fetch)
- Through manually downloading the database (requires scripts for updating it)
We need to finalize which approach we are opting for.
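To make the trade-off concrete, here is a rough, illustrative sketch of what each approach looks like with the geoip2 library (the credentials and database path are placeholders, and the imports are deferred into the functions so neither mode is required up front):

```python
def lookup_via_webservice(ip, account_id, license_key):
    """Web service approach: one HTTPS request to MaxMind per lookup."""
    import geoip2.webservice

    # 'geolite.info' is the host serving the free GeoLite2 data
    client = geoip2.webservice.Client(account_id, license_key, 'geolite.info')
    return client.city(ip)


def lookup_via_database(ip, db_path):
    """Database approach: offline lookup against a downloaded .mmdb file."""
    import geoip2.database

    # works without internet, but the file needs periodic refreshing
    with geoip2.database.Reader(db_path) as reader:
        return reader.city(ip)
```

Both calls return the same City response object, so the rest of the code can stay agnostic of whichever approach we pick.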
-
After finalizing the approach, we can move forward with creating the WHOIS model with the required fields. These are the fields that can be fetched:
```python
class AbstractWHOISInfo(TimeStampedEditableModel):
    """
    Abstract model to store WHOIS information
    for a device.
    """

    device = models.OneToOneField(
        get_model_name('config', 'Device'),
        on_delete=models.CASCADE,
        related_name='whois_info',
        help_text=_('Device to which this WHOIS info belongs'),
    )
    last_public_ip = models.GenericIPAddressField(
        db_index=True,
        help_text=_(
            'indicates the IP address logged from '
            'the last request coming from the device'
        ),
    )
    organization_name = models.CharField(
        max_length=200,
        blank=True,
        help_text=_('Organization name'),
    )
    country = models.CharField(
        max_length=4,
        blank=True,
        help_text=_('Country code'),
    )
    asn = models.CharField(
        max_length=100,
        blank=True,
        help_text=_('Autonomous System Number'),
    )
    timezone = models.CharField(
        max_length=100,
        blank=True,
        help_text=_('Time zone'),
    )
    address = models.CharField(
        max_length=255,
        blank=True,
        help_text=_('Address'),
    )
    cidr = models.CharField(
        max_length=20,
        blank=True,
        help_text=_('CIDR'),
    )

    class Meta:
        abstract = True
```
- Moving ahead with the celery task implementation, I am planning on using the core geoip2 library to get the required details like organization name, ASN, and CIDR. Though Django provides a wrapper for geoip2, it exposes only a limited set of fields. Sample implementation of the celery task:
```python
import logging

import geoip2.errors
import geoip2.webservice
from celery import shared_task
from django.conf import settings
from swapper import load_model

logger = logging.getLogger(__name__)


@shared_task
def fetch_whois_details(device_pk, ip):
    """
    Fetches the WHOIS details of the given IP address
    and creates/updates the device's WHOIS information.
    Also creates/updates the device's fuzzy location.
    """
    WHOISInfo = load_model('config', 'WHOISInfo')
    Location = load_model('geo', 'Location')
    DeviceLocation = load_model('geo', 'DeviceLocation')
    Device = load_model('config', 'Device')
    try:
        # 'geolite.info' is available for free
        ip_client = geoip2.webservice.Client(
            settings.GEOIP_ACCOUNT_ID, settings.GEOIP_LICENSE_KEY, 'geolite.info'
        )
        device = Device.objects.get(pk=device_pk)
        data = ip_client.city(ip)
        # Format the address from the geoip2 response, skipping
        # fields it may not contain (e.g. the city name can be None)
        address = ', '.join(
            str(part)
            for part in (
                data.city.name,
                data.country.name,
                data.continent.name,
                data.postal.code,
            )
            if part
        )
        # Create/update the WHOIS information for the device
        WHOISInfo.objects.update_or_create(
            device_id=device_pk,
            defaults={
                'organization_name': data.traits.autonomous_system_organization,
                'asn': data.traits.autonomous_system_number,
                # the country field stores the ISO code (max_length=4)
                'country': data.country.iso_code,
                'timezone': data.location.time_zone,
                'address': address,
                'cidr': data.traits.network,
                'last_public_ip': ip,
            },
        )
    except (Device.DoesNotExist, geoip2.errors.GeoIP2Error) as error:
        # logging and returning is one possible way to handle failures
        logger.warning('WHOIS lookup failed for %s: %s', ip, error)
```
The celery task will run when the `last_ip` field changes. We can use `_changed_checked_fields` to track the change like this:
```python
def trigger_whois_lookup(self):
    """Trigger a WHOIS lookup if the last IP has changed and is a public IP."""
    from ipaddress import ip_address

    from .. import tasks

    if self._initial_last_ip == models.DEFERRED:
        return
    # Trigger the WHOIS lookup if the info does not exist yet,
    # or if the last IP has changed and is a public IP
    if (
        # 'whois_info' matches the related_name defined on the model
        not hasattr(self, 'whois_info') or self.last_ip != self._initial_last_ip
    ) and ip_address(self.last_ip).is_global:
        tasks.fetch_whois_details.delay(self.pk, self.last_ip)
        self._initial_last_ip = self.last_ip
```
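The guard above leans on `ipaddress.ip_address(...).is_global` from the standard library; as a standalone illustration of which values pass that check (the `is_public_ip` helper below is hypothetical, not part of the model):

```python
from ipaddress import ip_address


def is_public_ip(value):
    """Return True only for globally routable (public) addresses."""
    try:
        return ip_address(value).is_global
    except ValueError:
        # not a valid IPv4/IPv6 address
        return False


# private, loopback and malformed values must not trigger a WHOIS lookup
assert is_public_ip('8.8.8.8') is True
assert is_public_ip('192.168.1.10') is False
assert is_public_ip('127.0.0.1') is False
assert is_public_ip('not-an-ip') is False
```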
- We also need to set up an independent celery task that updates the database periodically, ensuring the latest details on each lookup. The frequency of this task depends on the provider we choose.
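As a sketch, this periodic refresh could be scheduled via celery beat; the task name below is an assumption (no such task exists yet), and the daily interval would be tuned to the chosen provider (MaxMind publishes updates twice a week, DB-IP monthly):

```python
from datetime import timedelta

# hypothetical beat entry for the database-refresh task
CELERY_BEAT_SCHEDULE = {
    'update_whois_database': {
        'task': 'openwisp_controller.config.tasks.update_whois_database',
        'schedule': timedelta(days=1),
    },
}
```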
Let me know your thoughts!
- Through manually downloading the database (requires scripts for updating)
I believe Django also advises downloading the data. How frequently would we need to update the database?
MaxMind updates its databases twice a week, every Tuesday and Friday. DB-IP databases are updated monthly.
NOTE:
- DB-IP databases do not require account creation, while MaxMind databases do
- Updates are a bit simpler with MaxMind, since they are done via an ACCESS_KEY
- Both offer libraries for automatic updates via an ACCESS_KEY, which requires account creation, but DB-IP offers this only with a paid account
I'm also in favor of proceeding by downloading the database, with the added flexibility to configure and switch the database source from the app settings when needed.
More information regarding the services provided by MaxMind:
Web services
- GeoLite2 web services are free but limited to 1,000 requests per day
- GeoIP2 web services are paid, starting at $20 (10,000 queries, amounting to $0.002 per query)
- Neither service requires any additional licensing
More info can be found here:
https://www.maxmind.com/en/geoip-api-web-services#buy-now
https://geoip2.readthedocs.io/en/latest/#sync-web-service-example
MaxMind's web service client repository: https://github.com/maxmind/GeoIP2-python
Databases
- GeoLite2 databases are free to download but still require accepting a license agreement. The total size can vary between 50 and 70 MB.
- GeoIP2 databases are paid, with a starting price of $134 (we require the City database, as the fields we need are present only in it)
NOTE: GeoLite2 databases are limited to 30 downloads per day.
More references:
https://dev.maxmind.com/geoip/docs/databases/city-and-country/#database-sizes
https://www.maxmind.com/en/site-license-overview
https://www.maxmind.com/en/geoip-databases
Account Creation
The following link can be used for account creation: https://www.maxmind.com/en/geolite2/signup?utm_source=kb&utm_medium=kb-link&utm_campaign=kb-create-account
After successful creation, the user can:
- go to Manage License Keys or use https://www.maxmind.com/en/accounts/<<account_id>>/license-key to create a license key, which can be used for the web services, or
- go to Download Databases to download the GeoLite2 City and ASN databases.
More information regarding the services provided by DB-IP:
Databases
Free databases are updated monthly, with a total size of 100-130 MB.
More on database pricing: https://db-ip.com/db/ip-to-location-isp
Web service
Requires paid account creation, with prices starting at €11.49 (~$13) for the Core API, which includes the fields we require.
More info can be found here: https://db-ip.com/api/
Hi @DragnEmperor, as discussed in the call today, although databases are the preferred solution, for now we will proceed with the APIs, as they will take less time to implement.
You can create a separate task for the database integration, which we can look into once our deliverables are done.
Thanks @DragnEmperor for the info. I agree with Kapil about creating a separate issue for this; please copy the valuable information you have shared into the issue description.
Have created an issue for this: https://github.com/openwisp/openwisp-controller/issues/1052. As discussed, keeping it separate from the main GSoC project; we can pick it up later as per requirement and demand.