
region sizes

Open PatMyron opened this issue 3 years ago • 10 comments

https://ip-ranges.amazonaws.com/ip-ranges.json

https://old.reddit.com/r/aws/comments/j3luvy/can_anyone_tell_me_or_send_me_documentation_on/g7dl4ip/

from collections import defaultdict
import requests

# 'prefixes' holds the IPv4 ranges ('ipv6_prefixes' holds the IPv6 ones)
prefixes = requests.get('https://ip-ranges.amazonaws.com/ip-ranges.json').json()['prefixes']
regions = defaultdict(int)
total = 0  # avoid shadowing the built-in sum()
for prefix in prefixes:
  mask = int(prefix['ip_prefix'].split('/')[1])
  size = 2**(32 - mask)  # addresses in this CIDR; duplicates/overlaps are counted again
  regions[prefix['region']] += size
  total += size
for region, count in sorted(regions.items(), key=lambda kv: kv[1], reverse=True):
  print(f"{region}: {count / total:.0%}")
print('total:', total // 1000000, 'million')
us-east-1: 28%
us-west-2: 15%
eu-west-1: 9%
us-east-2: 7%
ap-northeast-1: 6%
eu-central-1: 6%
GLOBAL: 4%

total: 124 million
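The per-CIDR arithmetic above, 2**(32 - mask), can also be done with Python's standard-library `ipaddress` module, which validates each prefix and also works for IPv6. A minimal sketch, assuming the input is a list of `(cidr, region)` pairs (the helper name `region_shares` is hypothetical). Like the script above, it does not remove duplicates:

```python
import ipaddress
from collections import Counter

def region_shares(entries):
    """Return ({region: address_count}, total) for (cidr, region) pairs.

    Duplicate or overlapping CIDRs are counted every time they appear.
    """
    counts = Counter()
    for cidr, region in entries:
        # num_addresses equals 2**(32 - prefixlen) for an IPv4 network
        counts[region] += ipaddress.ip_network(cidr).num_addresses
    return counts, sum(counts.values())

# Sample entries taken from later in this thread
sample = [
    ("52.219.170.0/23", "eu-central-1"),
    ("52.219.170.0/23", "eu-central-1"),
    ("35.180.0.0/16", "eu-west-3"),
]
counts, total = region_shares(sample)
for region, n in counts.most_common():
    print(f"{region}: {n / total:.0%}")
```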

PatMyron, Jun 30 '21

https://www.gstatic.com/ipranges/cloud.json

from collections import defaultdict
import requests

prefixes = requests.get('https://www.gstatic.com/ipranges/cloud.json').json()['prefixes']
regions = defaultdict(int)
total = 0
for prefix in prefixes:
  # IPv6 entries have 'ipv6Prefix' instead; skip them rather than reusing the previous mask
  if 'ipv4Prefix' not in prefix:
    continue
  mask = int(prefix['ipv4Prefix'].split('/')[1])
  size = 2**(32 - mask)
  regions[prefix['scope']] += size
  total += size
for region, count in sorted(regions.items(), key=lambda kv: kv[1], reverse=True):
  print(f"{region}: {count / total:.0%}")
print('total:', total // 1000000, 'million')
us-central1: 25%
us-east1: 10%
europe-west1: 9%
asia-east1: 5%
us-west1: 5%
asia-northeast1: 4%
us-east4: 4%
global: 5%

total: 9 million
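Entries in cloud.json appear to carry either an `ipv4Prefix` or an `ipv6Prefix` key, so a bare `try/except: pass` around the mask lookup would reuse the previous entry's mask for IPv6 entries (or raise `NameError` if the first entry is IPv6). A minimal sketch that skips non-IPv4 entries explicitly, assuming that entry shape; the helper name `gcp_ipv4_by_scope` and the sample entries are hypothetical:

```python
import ipaddress
from collections import defaultdict

def gcp_ipv4_by_scope(prefixes):
    """Sum IPv4 addresses per scope; entries without 'ipv4Prefix' are skipped."""
    scopes = defaultdict(int)
    for entry in prefixes:
        cidr = entry.get("ipv4Prefix")
        if cidr is None:
            continue  # e.g. an IPv6-only entry; never reuse an earlier mask
        scopes[entry["scope"]] += ipaddress.ip_network(cidr).num_addresses
    return dict(scopes)

# Hypothetical sample in the shape of cloud.json's 'prefixes' list
sample = [
    {"ipv4Prefix": "34.80.0.0/15", "scope": "asia-east1"},
    {"ipv6Prefix": "2600:1900:4030::/44", "scope": "asia-east1"},
    {"ipv4Prefix": "35.184.0.0/13", "scope": "us-central1"},
]
print(gcp_ipv4_by_scope(sample))
```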

PatMyron, Sep 20 '21

https://www.microsoft.com/en-us/download/details.aspx?id=41653

https://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_20200824.xml

from collections import defaultdict
from xml.etree import ElementTree
import requests

url = 'https://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_20200824.xml'
regions = defaultdict(int)
total = 0
# each child of the root is a region element; each of its children carries a 'Subnet' CIDR
for region in ElementTree.fromstring(requests.get(url).text):
  for cidr in region:
    mask = int(cidr.attrib['Subnet'].split('/')[1])
    size = 2**(32 - mask)
    regions[region.attrib['Name']] += size
    total += size
for region, count in sorted(regions.items(), key=lambda kv: kv[1], reverse=True):
  print(f"{region}: {count / total:.0%}")
print('total:', total // 1000000, 'million')
useast: 13%
europewest: 11%
uswest: 10%
useast2: 7%
europenorth: 7%
uscentral: 6%
ussouth: 6%
asiasoutheast: 4%
uswest2: 4%

total: 16 million
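The XML script above walks every child element it finds. A variant that only visits the elements it expects is a little more robust if the file ever gains other children. A minimal sketch; the element names `Region` and `IpRange` and the sample subnets are assumptions, while the `Name` and `Subnet` attributes match the script above:

```python
import ipaddress
from collections import defaultdict
from xml.etree import ElementTree

# Hypothetical sample; element names Region/IpRange are assumed, not confirmed
SAMPLE = """<AzurePublicIpAddresses>
  <Region Name="europewest">
    <IpRange Subnet="13.69.0.0/17" />
    <IpRange Subnet="13.73.128.0/18" />
  </Region>
  <Region Name="useast">
    <IpRange Subnet="13.68.128.0/17" />
  </Region>
</AzurePublicIpAddresses>"""

def azure_xml_by_region(xml_text):
    """Sum IPv4 addresses per Region Name, visiting only the assumed tags."""
    regions = defaultdict(int)
    root = ElementTree.fromstring(xml_text)
    for region in root.iter("Region"):
        for ip_range in region.iter("IpRange"):
            network = ipaddress.ip_network(ip_range.attrib["Subnet"])
            regions[region.attrib["Name"]] += network.num_addresses
    return dict(regions)

print(azure_xml_by_region(SAMPLE))
```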

PatMyron, Sep 21 '21

I'm thinking I should consider switching the Azure source:

https://twitter.com/0xdabbad00/status/1275821557785309184

https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20220523.json

https://www.microsoft.com/en-us/download/details.aspx?id=56519

from collections import defaultdict
import requests

url = 'https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20220523.json'
tags = requests.get(url).json()['values']
regions = defaultdict(int)
total = 0
for tag in tags:
  # only regional tags named like 'Service.region' are counted
  parts = tag['name'].split('.')
  if len(parts) < 2:
    continue
  for prefix in tag['properties']['addressPrefixes']:
    if ':' in prefix:  # skip IPv6 prefixes
      continue
    mask = int(prefix.split('/')[1])
    size = 2**(32 - mask)
    regions[parts[1]] += size
    total += size
for region, count in sorted(regions.items(), key=lambda kv: kv[1], reverse=True):
  print(f"{region}: {count / total:.0%}")
print('total:', total // 1000000, 'million')

PatMyron, May 30 '22

@0xdabbad00 I think most of the CIDRs are listed multiple times in that newer Azure IP range file: they appear to be listed once by service and then again by service + region

Confirmed there are no duplicates in GCP's data. Found some duplicates in AWS' data that need to be handled too, and there are still some duplicates in Azure's CIDR ranges even after counting only the service + region half of the data
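Given that Azure's Service Tags file lists CIDRs once per service and again per service + region, one way to count each region's addresses only once is to keep just the `Service.region` tags, pool their prefixes per region in a set, and merge overlapping ranges with the standard-library `ipaddress.collapse_addresses`. A minimal sketch, assuming the `values` / `properties` / `addressPrefixes` layout used in the script above; the helper name `azure_regional_ipv4` and the sample tags are hypothetical:

```python
import ipaddress
from collections import defaultdict

def azure_regional_ipv4(values):
    """Count unique IPv4 addresses per region from Service Tags 'values' entries."""
    networks = defaultdict(set)
    for tag in values:
        region = tag["name"].partition(".")[2]
        if not region:
            continue  # service-wide tag with no '.region' part
        for cidr in tag["properties"]["addressPrefixes"]:
            network = ipaddress.ip_network(cidr)
            if network.version == 4:
                networks[region].add(network)
    # collapse_addresses merges duplicate, nested and adjacent networks
    return {
        region: sum(n.num_addresses for n in ipaddress.collapse_addresses(nets))
        for region, nets in networks.items()
    }

# Hypothetical tags in the Service Tags layout
sample = [
    {"name": "AzureCloud", "properties": {"addressPrefixes": ["13.64.0.0/16"]}},
    {"name": "AzureCloud.westus", "properties": {"addressPrefixes": ["13.64.0.0/16"]}},
    {"name": "Storage.westus", "properties": {"addressPrefixes": ["13.64.4.0/24", "2603:1030::/48"]}},
]
print(azure_regional_ipv4(sample))
```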

PatMyron, May 30 '22

Assuming @seligman dealt with duplicates, since his repos arrive at similar GCP/Azure totals after I handled most of the Azure duplicates, but @seligman has just over half as many AWS IP addresses as my AWS total without deduplication

PatMyron, May 31 '22

Yep, there's a lot of overlap in AWS's ip-ranges, and you need to account for it. For instance, both of these entries exist in ip-ranges.json, but they cover exactly the same IP addresses:

{"ip_prefix": "52.219.170.0/23", "region": "eu-central-1", "service": "AMAZON", "network_border_group": "eu-central-1"}
{"ip_prefix": "52.219.170.0/23", "region": "eu-central-1", "service": "S3", "network_border_group": "eu-central-1"}
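Exact duplicates like these can be dropped by treating each `(ip_prefix, region)` pair as a set member before summing, so the `service` field no longer causes double counting. A minimal sketch using the two entries above; it does not handle nested ranges, and the helper name `dedupe_exact` is hypothetical:

```python
import ipaddress
from collections import defaultdict

def dedupe_exact(prefixes):
    """Sum addresses per region, counting each distinct ip_prefix once per region."""
    seen = {(p["ip_prefix"], p["region"]) for p in prefixes}
    regions = defaultdict(int)
    for cidr, region in seen:
        regions[region] += ipaddress.ip_network(cidr).num_addresses
    return dict(regions)

entries = [
    {"ip_prefix": "52.219.170.0/23", "region": "eu-central-1", "service": "AMAZON"},
    {"ip_prefix": "52.219.170.0/23", "region": "eu-central-1", "service": "S3"},
]
print(dedupe_exact(entries))  # {'eu-central-1': 512}
```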

There is also some overlap that's less obvious. One larger example:

{"ip_prefix": "35.180.0.0/16", "region": "eu-west-3", "service": "AMAZON", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.0.0/16", "region": "eu-west-3", "service": "EC2", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.16/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.24/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.32/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.40/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.48/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.56/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.1.8/29", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.112.128/27", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.112.160/27", "region": "eu-west-3", "service": "ROUTE53_RESOLVER", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.112.80/29", "region": "eu-west-3", "service": "EC2_INSTANCE_CONNECT", "network_border_group": "eu-west-3"}
{"ip_prefix": "35.180.244.0/23", "region": "eu-west-3", "service": "AMAZON", "network_border_group": "eu-west-3"}

All of the other ranges in that list are contained in the first range (35.180.0.0/16), either as an exact duplicate or as a smaller sub-range.
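Removing nested ranges needs more than a set. netaddr's IPSet handles it; as an alternative, Python's standard-library `ipaddress.collapse_addresses` merges duplicate, overlapping and adjacent networks. A minimal sketch applied to the eu-west-3 prefixes above (only the `ip_prefix` values are kept):

```python
import ipaddress

eu_west_3 = [
    "35.180.0.0/16", "35.180.0.0/16", "35.180.1.16/29", "35.180.1.24/29",
    "35.180.1.32/29", "35.180.1.40/29", "35.180.1.48/29", "35.180.1.56/29",
    "35.180.1.8/29", "35.180.112.128/27", "35.180.112.160/27",
    "35.180.112.80/29", "35.180.244.0/23",
]
networks = [ipaddress.ip_network(c) for c in eu_west_3]

# Naive count: every prefix contributes its full size
naive = sum(n.num_addresses for n in networks)
# Collapsed count: overlapping prefixes are merged first
collapsed = list(ipaddress.collapse_addresses(networks))
unique = sum(n.num_addresses for n in collapsed)

print(collapsed)      # [IPv4Network('35.180.0.0/16')]
print(naive, unique)  # 131712 65536
```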

To account for this, the code in my projects uses netaddr. Here's a comparison of both approaches:

from collections import defaultdict
import requests
from netaddr import IPSet, IPNetwork

prefixes = requests.get('https://ip-ranges.amazonaws.com/ip-ranges.json').json()['prefixes']
# Just output a few random demo regions
demo_regions = ["us-west-2", "us-east-1", "ap-southeast-1"]

def patmyron_method(prefixes):
    regions = defaultdict(lambda: 0)
    sum = 0
    for prefix in prefixes:
        mask = prefix['ip_prefix'].split('/')[1]
        regions[prefix['region']] += 2**(32-int(mask))
        sum += 2**(32-int(mask))
    for region in demo_regions:
        print(region + ": " + str(round(regions[region] / sum, 2)))
    print('total:', sum//1000000, 'million')

def seligman_method(prefixes):
    regions = defaultdict(list)
    for prefix in prefixes:
        cur_network = IPNetwork(prefix['ip_prefix'])
        regions[prefix['region']].append(cur_network)
        regions["_all_"].append(cur_network)
    all_ips_set = IPSet(regions["_all_"])
    for region in demo_regions:
        region_set = IPSet(regions[region])
        print(f"{region}: {len(region_set) / len(all_ips_set) : 0.2f}")
    print(f'total: {len(all_ips_set)//1000000} million')

for method in (patmyron_method, seligman_method):
    print(f"{'-'*10} {method.__name__} {'-'*50}")
    method(prefixes)

which outputs:

---------- patmyron_method --------------------------------------------------
us-west-2: 0.14
us-east-1: 0.26
ap-southeast-1: 0.03
total: 127 million
---------- seligman_method --------------------------------------------------
us-west-2:  0.15
us-east-1:  0.26
ap-southeast-1:  0.03
total: 66 million

Care must be taken when using netaddr, since it can quickly turn into an O(N^2) problem if you're not careful, doubly so with IPv6 addresses. But if you prepare the lists so you only go through IPSet() a handful of times, it shouldn't be too bad.
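One way to keep overlap removal near O(N log N) regardless of library is to convert each CIDR to an integer interval, sort once, and merge in a single pass. A minimal sketch of that sort-and-merge technique (swapped in here; it is not netaddr's API, and the function name `unique_address_count` is hypothetical). It works for IPv4 or IPv6, as long as a single call doesn't mix the two:

```python
import ipaddress

def unique_address_count(cidrs):
    """Count distinct addresses covered by CIDRs via sort-and-merge of intervals."""
    intervals = sorted(
        (int(n.network_address), int(n.broadcast_address))
        for n in map(ipaddress.ip_network, cidrs)
    )
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current run: close it out and start a new one
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total

print(unique_address_count(
    ["35.180.0.0/16", "35.180.1.16/29", "52.219.170.0/23", "52.219.170.0/23"]
))
```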

I've gotten into the habit of applying this logic to all of the cloud providers, though I think it only really matters for AWS. Truth be told, I'm not sure; it's safer to assume they're all a mess.

seligman, May 31 '22

Glad to see the region sizes relative to each other still look pretty much the same. That's the main data I was after, and I was hoping the duplicates were spread roughly evenly across regions until I got around to fixing it

Still wild that us-east-1 alone falls right between GCP and Azure in number of IP addresses

PatMyron, Jun 01 '22

Yep, I looked at how much it impacts the final charts:

Change:  0.570%: 25.792% -> 26.362%: us-east-1
Change:  0.568%:  6.449% ->  5.881%: eu-central-1
Change:  0.383%:  5.049% ->  5.432%: GLOBAL
Change:  0.296%:  3.000% ->  2.704%: ap-south-1
Change:  0.243%: 14.640% -> 14.397%: us-west-2
Change:  0.218%:  8.500% ->  8.718%: eu-west-1
Change:  0.152%:  1.404% ->  1.252%: eu-west-3
Change:  0.147%:  6.760% ->  6.613%: us-east-2
Change:  0.143%:  1.603% ->  1.460%: ca-central-1
Change:  0.129%:  3.233% ->  3.361%: ap-southeast-1

Most regions are off by less than 0.1 percentage points, with a handful of outliers between roughly 0.1 and 0.6 points (the largest are listed above). Certainly good enough to convey the sizes.

(Same basic code, tweaked to show the changes.)
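A minimal sketch of such a comparison, assuming each method's result is a `{region: percent_share}` dict; it sorts regions by absolute change and prints them in the format shown above. The helper name `share_changes` is hypothetical, and the sample values are the first two rows from the list above:

```python
def share_changes(before, after):
    """Return (abs_change, old, new, region) rows, largest change first."""
    rows = []
    for region in before.keys() | after.keys():
        old, new = before.get(region, 0.0), after.get(region, 0.0)
        rows.append((abs(new - old), old, new, region))
    return sorted(rows, reverse=True)

before = {"us-east-1": 25.792, "eu-central-1": 6.449}
after = {"us-east-1": 26.362, "eu-central-1": 5.881}
for delta, old, new, region in share_changes(before, after)[:10]:
    print(f"Change: {delta:6.3f}%: {old:6.3f}% -> {new:6.3f}%: {region}")
```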

seligman, Jun 01 '22

Just wanted to leave a quick thank you here (and here). Your repo gave me the idea to also calculate the number of IP addresses for Google regions. You can see the result here: https://gcloud-compute.com/regions.html

Cyclenerd, Jul 20 '22

@Cyclenerd https://github.com/GoogleCloudPlatform/region-picker/issues/10 would be another great addition, but GCP is the only one of the 3 I've never been able to automate scraping that information for: https://github.com/PatMyron/cloud#product--feature-regional-availability

PatMyron, Jul 21 '22