internet-yellow-pages icon indicating copy to clipboard operation
internet-yellow-pages copied to clipboard

Better formatting checks for IP, prefixes, and countries

Open romain-fontugne opened this issue 1 year ago • 7 comments
trafficstars

Currently we make sure that all IPs and prefixes are in lower case. And also that country codes are in upper case. We still need the crawlers to take care of IPv6 compression and we hope they give well-formed IP, prefixes, and country codes but unexpected things can still happen.

https://github.com/InternetHealthReport/internet-yellow-pages/blob/98f970100ec30ea61a0eaff8258629c3af0b279c/iyp/init.py#L15-L23

Why not adding the checks in the iyp library too? For example:

prop_formatters = { 
     # asn is stored as an int 
     'asn': int, 
     # ipv6 is stored in lowercase 
     'ip': lambda s: ipaddress.ip_address(s).compressed, 
     'prefix': lambda s: ipaddress.ip_network(s).compressed, 
     'country_code': lambda s: iso3166.countries.get(s).alpha2
 }

Yes, it is redundant, the crawlers still have to do that too, but at least we ensure that no weird node are created. @m-appel did we have any good reason for not adding these checks in iyp/init.py?

romain-fontugne avatar Jan 17 '24 04:01 romain-fontugne

Nope, this is actually just this issue #87, which we have not implemented yet :) The fix is fast, but needs some careful checking if affected crawlers still work correctly.

m-appel avatar Jan 17 '24 04:01 m-appel

Thanks, I'll try a full run with the code mentioned above and see if things break.

romain-fontugne avatar Jan 17 '24 05:01 romain-fontugne

ok, for the records, that broke quite a few crawlers. This is the log: iyp-2024-01-17.log

romain-fontugne avatar Jan 22 '24 03:01 romain-fontugne

Good thing we tested it :) For the ones that break due to the IP address/network, I think they should be fixed (i.e., probably ignore the invalid addresses/prefixes), but I think forcing an ISO3166 country code might be a bit much? I know ZZ for example is technically a correct "Reserved" country code...

m-appel avatar Jan 22 '24 03:01 m-appel

Agreed, we should tolerate also the exotic country codes (EU, AP). But no weird IP should go through these checks.

romain-fontugne avatar Jan 22 '24 06:01 romain-fontugne

I would love to work on this issue.

Forchapeatl avatar Feb 27 '24 20:02 Forchapeatl

How would you fix that issue?

On Wed, Feb 28, 2024, 05:40 FORCHA PEARL @.***> wrote:

I would love to work on this issue.

— Reply to this email directly, view it on GitHub https://github.com/InternetHealthReport/internet-yellow-pages/issues/112#issuecomment-1967552424, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKPNXNQ7DGEEUY6RQL3EWDYVZADXAVCNFSM6AAAAABB544OPKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNRXGU2TENBSGQ . You are receiving this because you authored the thread.Message ID: @.*** com>

romain-fontugne avatar Feb 28 '24 09:02 romain-fontugne