
[Feature Request] Integrating IPinfo's free IP to Country ASN database

Open abdullahdevrel opened this issue 1 year ago • 11 comments

Update: https://github.com/opnsense/core/issues/7779#issuecomment-2945306686

We are offering the network/CIDR based IP dataset.


Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

  • [x] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
  • [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue

Is your feature request related to a problem? Please describe.

Using the IPinfo IP to Country ASN or IP to Country database will address several problems with the current IP geolocation implementation:

  • It gives users an alternative to (if not a replacement for) the current free data source. https://github.com/opnsense/core/issues/6264
  • IPinfo's country-level database provides full accuracy and is updated daily. The current provider intentionally offers a reduced-accuracy free database because it also sells a paid version of the country database.
  • It packages both IPv4 and IPv6 in a single database, eliminating the need to use multiple databases. https://github.com/opnsense/core/blob/5dd269f7e1bef0b637bcb9c45a0338288da7268f/src/opnsense/scripts/filter/download_geoip.py#L38-L39
  • The ASN/ISP data provided is useful and can be sourced from this single database.
  • The project can also have its own project-specific download access token, eliminating the need for users to bring their own access token and reducing setup time.

Describe the solution you like

I am requesting to add support for IPinfo's IP to Country database to the project. The database has the following features:

  • It includes country and ASN information in the same database.
  • It is updated daily, with zero compromise to accuracy. There is no range clustering, and the database provides full accuracy.
  • The data granularity reaches individual IP level.
  • The database comes in MMDB database format.
  • It is licensed under CC-BY-SA 4.0, permitting commercial usage.
  • Available file formats include: CSV, MMDB, JSON
  • The data is tabular and unnested, making it very easy to use. The dataset includes both IPv4 and IPv6 in a single file.

Database schema

| Field Name | Example | Data Type | Description |
| --- | --- | --- | --- |
| start_ip | 1.0.16.0 | TEXT | Starting IP address of an IP address range |
| end_ip | 1.0.31.255 | TEXT | Ending IP address of an IP address range |
| country | JP | TEXT | ISO 3166 country code of the location |
| country_name | Japan | TEXT | Name of the country |
| continent | AS | TEXT | Continent code of the country |
| continent_name | Asia | TEXT | Name of the continent |
| asn | AS2519 | TEXT | Autonomous System Number |
| as_name | ARTERIA Networks Corporation | TEXT | Name of the AS (Autonomous System) organization |
| as_domain | arteria-net.com | TEXT | Official domain or website of the AS organization |

Documentation: https://ipinfo.io/developers/ip-to-country-asn-database

Samples are available here: https://github.com/ipinfo/sample-database/tree/main/IP%20to%20Country%20ASN

The database can be downloaded simply by accessing the storage URI with an access token.

curl -L "https://ipinfo.io/data/free/country_asn.mmdb?token=<YOUR_TOKEN>" -o country_asn.mmdb

Describe alternatives you considered

I have not considered an alternative.

Additional context

The business version of OpnSense includes a paid version of the GeoIP country database. However, even though IPinfo's IP to Country database is free, it is the best country-level data available out there because the data source itself is based on latency and networking data-based methodology instead of self-reported locations of ASNs/ISPs.

https://ipinfo.io/accuracy

Additionally, there is no range clustering or delayed updates with IPinfo. IPinfo does not have an accuracy-compromised free country or city database. This database can be considered for the business variant of the software as well, since the license permits commercial usage.

abdullahdevrel avatar Aug 16 '24 19:08 abdullahdevrel

A Reddit post by redspidr demonstrated the idea of introducing IPinfo's data into Opnsense. Their project converts IP addresses to links to their IPinfo page links, which provide detailed metadata on those IP addresses.

Inspired by it, I have requested the Reddit Opnsense community to review this ticket and recommend bringing our data to Opnsense with the integration of our free IP database first. The current issue does not explore the project demonstrated by the Reddit community user redspidr but a native under-the-hood integration using our database.

A couple of issues were raised by one of the Reddit community users regarding this request, so I am pasting my answers here.

Inclusion of your data rather than the existing provider in the business version?

The free IP database that we have is the best possible variant and is equal to, if not better than, the paid version of the country database the business version of OpnSense is currently using. Consequently, it is certainly better than the free IP database that the community version of the project uses.

This raises a question: which action benefits the Opnsense community most?

  1. Option 1 (my preference): Replace both the free database in the community version and the paid database in the business version with the single IPinfo free IP to Country ASN or IP to Country database. This would give community and business users a unified experience with better data and would also reduce the business costs of licensing the paid database. Our database is licensed under CC-BY-SA 4.0, a permissive license that allows commercial use and distribution. We do not have complex EULA agreements, so you can easily use the database.
  2. Option 2: Replace the existing database with our free database in the community version only. If the project maintainers accept this proposal, the community version will have better data than the business version. The database is designed primarily to support open source projects.
  3. Option 3: Add support for our database alongside the existing database. Users could then bring their own database from either us or the existing provider and choose between them.

Documentation on how to add your data to the free version in parallel to the existing docs?

Considering the previous topic, I am not sure what option would be considered by the project maintainers. If they want to replace their existing database provider they can do that or they can integrate our database in parallel to the existing database.

In terms of documentation, there are slight modifications involved.

  1. Both are MMDB databases, so parsing the database should not be an issue.
  2. The existing database uses country-level data, which we also provide here. However, the database schemas differ: we use a flat, tabular data structure, while the existing database uses a nested structure.
  3. Update frequency. Our database is updated every day, so we would appreciate it being refreshed frequently. The existing database is not updated daily.
  4. Packaging decision. Because our database has more lenient licensing terms, community version users do not have to download their own database or bring their own access token. The project can use their own project-specific access token.
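
To make the schema difference in point 2 concrete, here is a minimal sketch in Python. It assumes the nested record follows the GeoLite2 Country layout (`country.iso_code`, `country.names.en`, and so on) and maps it onto the flat field names from the schema table in the original request. The helper name `flatten_country_record` and the sample record are illustrative assumptions, not code from either vendor or from OPNsense.

```python
# Illustrative record only: the nested shape is assumed to follow the
# GeoLite2 Country layout; the flat field names come from the schema table
# in the original request.
nested_record = {
    "continent": {"code": "AS", "names": {"en": "Asia"}},
    "country": {"iso_code": "JP", "names": {"en": "Japan"}},
}


def flatten_country_record(record):
    """Map a nested GeoLite2-style record onto flat IPinfo-style field names."""
    country = record.get("country", {})
    continent = record.get("continent", {})
    return {
        "country": country.get("iso_code"),
        "country_name": country.get("names", {}).get("en"),
        "continent": continent.get("code"),
        "continent_name": continent.get("names", {}).get("en"),
    }


flat_record = flatten_country_record(nested_record)
print(flat_record)
```

With a flat layout, a consumer reads `record["country"]` directly instead of walking nested keys, which is the practical difference a migration would have to account for.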

Here is a blog post: https://ipinfo.io/blog/migrating-from-maxmind-to-ipinfo


Please let me know if you have further questions. Thank you very much.

abdullahdevrel avatar Aug 17 '24 06:08 abdullahdevrel

For us as a core team this isn't a priority, given our business edition does contain simple to use geo aliases out of the box including a documented file format to use for the community version (https://docs.opnsense.org/manual/aliases.html#geoip).

I do understand that your company will prefer your product above one of its competitors, but there's also some marketing involved in claims being made.

Personally I don't have a strong preference for a geoip vendor, but when it's a commercial discussion, our community GitHub might not be the best place.

If someone does want to do the work, and the amount of required guidance is limited, we will assess in the usual way.

AdSchellevis avatar Aug 17 '24 07:08 AdSchellevis

@AdSchellevis

Thank you for reviewing the request. I sincerely appreciate you taking the time to review the issue.

This was not a commercial request, nor am I trying to sell the Opnsense community a commercial service. I advocated bringing in highly accurate data designed with open-source projects in mind. The free database is licensed under CC-BY-SA 4.0.

> I do understand that your company will prefer your product over one of its competitors, but there's also some marketing involved in the claims being made.

I understand the skepticism involved. However, in terms of accuracy, we can provide verifiable information to back up our claim, even for a free database. If you are interested in verifying our accuracy claims, please let me know. I can walk you through a self-evaluation process that lets you and the community personally verify this information.

> Personally, I don't have a strong preference for a geoip vendor, but when it's a commercial discussion, our community GitHub might not be the best place.

No, I am not starting any form of commercial discussion at all. The proposal was the integration of a free database. There was absolutely no hint of a commercial service. My apologies if I have indicated otherwise. I have tried my best to understand the issues and motivation for selecting the geoip database, and I have seen a ticket where you mentioned that the software offers the paid IP database through the business version.

However, my proposal was to replace even the paid version of the IP database with a free IP database that we can demonstrate can provide better accuracy.

> If someone does want to do the work, and the amount of required guidance is limited, we will assess it in the usual way.

Thank you for considering the issue.


My apologies if I was unclear in saying this is not a commercial service. We built this free IP to Country database primarily to support open source projects. I understand Opnsense is a massive project and the changes required to adopt it may be significant. I can assure you that we can demonstrate the value of adopting the free database to the community and the project's customers.

abdullahdevrel avatar Aug 17 '24 08:08 abdullahdevrel

Would love to see alternatives to Maxmind as well, unfortunately, definitely do not have the time to do the coding at the moment.

Now, there would be a super-easy and fast way to get this available in OPNsense @abdullahdevrel - "simply" provide the data in the CSV format documented and required for OPNsense. 😉

doktornotor avatar Aug 17 '24 09:08 doktornotor

Thanks @doktornotor. I really appreciate that you reviewed the request. This is a significant request, and I understand that it will require engineering commitment to support it. We will let our OPNsense users know that this request is being considered.

> Now, there would be a super-easy and fast way to get this available in OPNsense @abdullahdevrel - "simply" provide the data in the CSV format documented and required for OPNsense. 😉

I hope this illustrates what we meant when we said our database is simple to use. The current implementation requires 3 CSV files.

image

While we have all this information in a single file in our IP to Country database:

|   | start_ip | end_ip | country | country_name | continent | continent_name |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 2620:0:1cff:dead:bef1:100:1:1aa | 2620:0:1cff:dead:bef1:100:1:1b0 | SG | Singapore | AS | |
| 3 | 212.221.79.153 | 212.221.79.171 | DE | Germany | EU | |

https://github.com/ipinfo/sample-database/tree/main/IP%20to%20Country

If anyone wants to use our data for now, they will have to make modifications to the database on their end.

abdullahdevrel avatar Aug 17 '24 09:08 abdullahdevrel

First of all, I am NOT an OPNsense developer, merely a random code contributor.

> If anyone wants to use our data for now, they will have to make modifications to the database on their end.

Well yes, that is the problem. I have been merely hinting the fastest way to get your GeoIP data used in OPNsense - without any coding being required on the OPNsense part (paste in an URL pointing to ZIP file with the required CSV files, done.)

Using a single CSV file might even be easier and faster to process - if someone does the coding, however that's not a drop-in replacement. Not having the IP ranges in CIDR format being one of the examples why the current code won't work and non-trivial amount of coding is required to support this single-file format.

doktornotor avatar Aug 17 '24 10:08 doktornotor

Got it. Thank you.

I am not sure why the project does not use the MMDB file format, which is designed for fast and efficient lookups.

> Not having the IP ranges in CIDR format being one of the examples why the current code won't work and non-trivial amount of coding is required to support this single-file format.

We have a tool for that called range2cidr (which is also part of our CLI) that can generate the CIDR/range column.

| cidr | country | country_name | continent | continent_name |
| --- | --- | --- | --- | --- |
| 1.0.0.0/25 | AU | Australia | OC | Oceania |
| 1.0.0.128/26 | AU | Australia | OC | Oceania |

The issue is that generating the CIDRs is a bit slow:

time ipinfo range2cidr country.csv > country_cidr.csv

real    9m58.852s
user    0m22.040s
sys     0m44.654s
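
For readers who want to see what a range-to-CIDR conversion involves, here is a minimal sketch using Python's standard `ipaddress` module. It is not the `range2cidr` implementation and says nothing about that tool's performance; the helper name `range_to_cidrs` is an assumption made for illustration. The inputs are the example ranges quoted earlier in this thread.

```python
import ipaddress


def range_to_cidrs(start_ip, end_ip):
    """Return the minimal list of CIDR blocks covering start_ip..end_ip."""
    first = ipaddress.ip_address(start_ip)
    last = ipaddress.ip_address(end_ip)
    # summarize_address_range yields the smallest set of networks that
    # exactly covers the inclusive range; it works for IPv4 and IPv6.
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


aligned = range_to_cidrs("1.0.16.0", "1.0.31.255")
unaligned = range_to_cidrs("212.221.79.153", "212.221.79.171")
ipv6 = range_to_cidrs(
    "2620:0:1cff:dead:bef1:100:1:1aa", "2620:0:1cff:dead:bef1:100:1:1b0"
)

print(aligned)    # ['1.0.16.0/20']
print(unaligned)  # five blocks, from /32 up to /29
print(ipv6)
```

A range that is not aligned on a power-of-two boundary expands into several CIDR blocks, which is one reason a range-based file cannot simply be dropped into code that expects one network per row.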

abdullahdevrel avatar Aug 17 '24 11:08 abdullahdevrel

Yeah, a bit. 😉 Same issue with Python, PHP or whatever other code used for the purpose on similar projects. Now, assuming similar HW specs, multiply the wasted CPU time by the user base and one update per day.

As for why MMDB format is not used - the thing is - you are not doing realtime lookups. You simply parse the data once every while and use that parsed data in firewall rules to reject/accept connections. There's no integration for lookups in databases present in the pf firewall code, plus the performance would not exactly rock either I guess.

doktornotor avatar Aug 17 '24 14:08 doktornotor

I will try to think of a solution. The challenge is that we probably won't be able to produce the CIDR variant of the free IP database for download because we would then have to account for maintenance of another variant of the same product.

When MM switched from their legacy geoip to a more modern variant, it broke a lot of things. We aimed to remain stable from day 0 onward to avoid such situations. Introducing a CIDR variant of the database could increase the load on us as we would have to maintain it virtually indefinitely.

abdullahdevrel avatar Aug 17 '24 17:08 abdullahdevrel

I second that this would be awesome!

enoch85 avatar Feb 05 '25 17:02 enoch85

Update: IPinfo Lite

We now have a CIDR/Network based IP to Country ASN database: IPinfo Lite

🔗 https://ipinfo.io/developers/ipinfo-lite-database

| Field Name | Example | Data Type | Description |
| --- | --- | --- | --- |
| network | 154.24.39.204/30 | TEXT | CIDR/IP range or single IP address |
| country | Canada | TEXT | Country name |
| country_code | CA | TEXT | Two-letter ISO 3166 country code of the IP addresses |
| continent | North America | TEXT | Continent name of the IP location |
| continent_code | NA | TEXT | Two-letter continent code of the IP location |
| asn | AS174 | TEXT | Autonomous System Number, an organization that owns the IP range block |
| as_name | Cogent Communications | TEXT | Name of the AS (Autonomous System Number) organization |
| as_domain | cogentco.com | TEXT | Official domain or website of the ASN organization |
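
As a rough illustration of how a network-based file like this could feed per-country firewall aliases, the sketch below groups networks by country code and address family. It assumes the CSV header uses the field names from the table above. The first sample row reuses the table's example values; the second row (`2001:db8::/32`, country code `XX`) is made up from the IPv6 documentation prefix. The helper name `group_networks_by_country` is hypothetical and this is not the OPNsense implementation.

```python
import csv
import io
import ipaddress
from collections import defaultdict

# Sample data: row 1 reuses the example values from the schema table,
# row 2 is a made-up IPv6 entry using the documentation prefix.
SAMPLE_CSV = """network,country,country_code,continent,continent_code,asn,as_name,as_domain
154.24.39.204/30,Canada,CA,North America,NA,AS174,Cogent Communications,cogentco.com
2001:db8::/32,Exampleland,XX,Nowhere,ZZ,AS64496,Example Net,example.com
"""


def group_networks_by_country(csv_file):
    """Return {country_code: {4: [cidr, ...], 6: [cidr, ...]}}."""
    groups = defaultdict(lambda: {4: [], 6: []})
    for row in csv.DictReader(csv_file):
        # strict=False also accepts a bare IP address and treats it as a
        # single-host network (/32 or /128).
        net = ipaddress.ip_network(row["network"], strict=False)
        groups[row["country_code"]][net.version].append(str(net))
    return groups


groups = group_networks_by_country(io.StringIO(SAMPLE_CSV))
print(dict(groups))
```

Because IPv4 and IPv6 rows share one file, the address family has to be derived from each network value rather than from which file the row came from.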

Our database, despite being free and having a great OSS-friendly license, is the best IP to Country ASN database available.

We are not making any compromises with our data; it is fully accurate. It would be truly an honor for us if the project considers our data. It is the same data that comes from running 1,000 servers across 130 countries.

Firewall software should have native local databases. However, if downloading and maintenance are issues, we have also developed a standalone API based on this data. It supports unlimited requests and is free as well.

https://ipinfo.io/developers/responses#lite-api

There is no EULA, there are no restrictions on distribution, there is no limitation on commercial intent, there is no compromise, and there is no downside. We are trying our best to be as OSS-friendly as possible, but regardless, it is an enterprise-ready, mission-critical database currently being used by several major F500 companies in production.

abdullahdevrel avatar Jun 05 '25 17:06 abdullahdevrel

Seems cool having more options

sopex avatar Jul 21 '25 07:07 sopex

Thank you very much, @sopex. It will not be just more options. It is the most accurate data out there right now. Once you integrate our data into the software, regardless of whether the data is free, you will have a direct line to us for asking questions or providing feedback on IP geolocation data. We are super active in all the communities that use our data.

abdullahdevrel avatar Jul 21 '25 16:07 abdullahdevrel

Came across this thread after having issues with Sophos using MaxMind, giving us inaccurate results. I went to block Lithuania due to a large influx of malicious traffic from there, only to find out MaxMind thought the IP was from the USA... IPInfo had no such issue.

paz avatar Jul 30 '25 02:07 paz

@paz That is unfortunate. I just want to mention that our data is the best there is, and if you ever have any doubts about its quality, feel free to reach out to me. We'll gladly investigate the issue and provide evidence to support our reporting.

abdullahdevrel avatar Jul 30 '25 04:07 abdullahdevrel

In case someone wants to try https://github.com/opnsense/core/commit/2d91f22b48d0b72ca8978636e05df0ce7b0b1a1d, install the update on 25.7.1 using:

opnsense-patch 2d91f22

Next input the IPinfo location with an account registered there ( https://ipinfo.io/data/ipinfo_lite.csv.gz?token=XXXXXXX) into the url field and hit apply.

In case you need to manually flush the existing geoip data:

rm /usr/local/share/GeoIP/alias.stats 
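
For anyone curious what consuming the `.csv.gz` download looks like outside the firewall, here is a minimal standard-library sketch. It is not the code from the patch above. The helper names `iter_csv_gz_rows` and `download_rows` are assumptions, and the demo uses a tiny in-memory gzip payload with made-up values instead of contacting ipinfo.io.

```python
import csv
import gzip
import io
import urllib.request


def iter_csv_gz_rows(binary_stream):
    """Yield CSV rows as dicts from a gzip-compressed binary stream."""
    with gzip.open(binary_stream, mode="rt", encoding="utf-8", newline="") as text:
        yield from csv.DictReader(text)


def download_rows(url):
    """Stream rows from a URL such as the ipinfo_lite.csv.gz link above.

    Not called in this sketch: it needs network access and a valid token.
    """
    with urllib.request.urlopen(url) as response:
        yield from iter_csv_gz_rows(response)


# Demo with an in-memory payload (made-up values, documentation prefix).
sample = gzip.compress(b"network,country_code\n192.0.2.0/24,XX\n")
rows = list(iter_csv_gz_rows(io.BytesIO(sample)))
print(rows)
```

Decompressing as a stream avoids holding the whole file in memory, which may matter given the size of the database discussed later in the thread.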

AdSchellevis avatar Aug 05 '25 19:08 AdSchellevis

> In case someone wants to try 2d91f22, install the update on 25.7.1 using:
>
> opnsense-patch 2d91f22
>
> Next input the IPinfo location with an account registered there ( https://ipinfo.io/data/ipinfo_lite.csv.gz?token=XXXXXXX) into the url field and hit apply.
>
> In case you need to manually flush the existing geoip data:
>
> rm /usr/local/share/GeoIP/alias.stats

Seems to work great, thank you!! Also, the https://api.ipinfo.io/lite/8.8.8.8?token=XXXXXXXX that the IPinfo GUI suggests seems to work too

sopex avatar Aug 05 '25 19:08 sopex

@AdSchellevis Thank you very much for supporting our database. We are excited to see Opnsense as a software and community adopting our data. To the Opnsense community, you are not only receiving the data but also a commitment from us to provide excellent data, infrastructure, and active participation in the community.

  • IPinfo Community: https://community.ipinfo.io/
  • Opnsense community: https://forum.opnsense.org/index.php?action=profile;area=forumprofile;u=41997
  • Reddit: https://www.reddit.com/r/opnsense/

If you have any questions about our data, please feel free to ask on your community platforms, and I will be there to answer you. Cheers!

abdullahdevrel avatar Aug 06 '25 01:08 abdullahdevrel

@abdullahdevrel thank you for offering it. The first thing that stood out while working on this was the size of the database: it's quite large and at first glance seems to be more fine-grained than what we used before.

I'm keeping this ticket open for a while, documentation still needs an update, but the code seems to be more or less finished.

AdSchellevis avatar Aug 06 '25 07:08 AdSchellevis

We provide full accuracy and zero compromise data for IPinfo Lite, meaning that the granularity of IP address accuracy extends to individual IP addresses. So, the size is quite large.

abdullahdevrel avatar Aug 06 '25 14:08 abdullahdevrel

Can also confirm, this works great! Thank you @AdSchellevis and @abdullahdevrel!

enoch85 avatar Aug 06 '25 17:08 enoch85