Make instances.json smaller

Open powdahound opened this issue 7 years ago • 15 comments

The https://ec2instances.info/instances.json file is getting so many requests from automated systems that it's becoming too expensive for me to reasonably host. The file is already gzipped and being served from CloudFront.

Ideally people would respect the cache headers and only request it daily, as that's how often it is updated, but I don't have a way to contact the offenders.
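
For anyone writing an automated client against this file, a polite fetcher could look roughly like the sketch below. It reuses a local copy for a day (the update interval mentioned above) and sends a conditional request after that. It assumes the server returns an `ETag` header, which this thread does not confirm, and the cache file names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Dict, Optional

URL = "https://ec2instances.info/instances.json"
ONE_DAY = 24 * 60 * 60  # the data is updated daily, per the issue text


def is_fresh(fetched_at: float, now: float, max_age: float = ONE_DAY) -> bool:
    """True if the cached copy is younger than max_age seconds."""
    return (now - fetched_at) < max_age


def conditional_headers(etag: Optional[str]) -> Dict[str, str]:
    """Headers for a conditional GET; the server may answer 304 with no body."""
    return {"If-None-Match": etag} if etag else {}


def fetch(cache_dir: Path, url: str = URL) -> bytes:
    """Return the file body, touching the network at most once a day."""
    body_path = cache_dir / "instances.json"       # hypothetical cache layout
    meta_path = cache_dir / "instances.meta.json"
    have_copy = body_path.exists() and meta_path.exists()
    meta = json.loads(meta_path.read_text()) if have_copy else {}
    if have_copy and is_fresh(meta.get("fetched_at", 0.0), time.time()):
        return body_path.read_bytes()  # no network traffic at all
    request = urllib.request.Request(url, headers=conditional_headers(meta.get("etag")))
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            etag = response.headers.get("ETag")  # assumption: server sends one
    except urllib.error.HTTPError as err:
        if err.code != 304:
            raise
        # 304 Not Modified: the cached body is still current.
        body, etag = body_path.read_bytes(), meta.get("etag")
    cache_dir.mkdir(parents=True, exist_ok=True)
    body_path.write_bytes(body)
    meta_path.write_text(json.dumps({"fetched_at": time.time(), "etag": etag}))
    return body
```

Calling `fetch(Path("some/cache/dir"))` from a scheduled job then costs the host at most one small request per day when nothing has changed.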

As a result I'm planning to move the pricing data to other files, possibly split up by region. Specific implementation TBD depending on what seems right when I look at the code.

This issue serves as a short notice heads-up and place for followup discussion.

powdahound avatar May 05 '18 17:05 powdahound

Can't you just reference it from GitHub? The URL you should be able to use is https://raw.githubusercontent.com/powdahound/ec2instances.info/master/www/instances.json

This is also backed by a CDN, apparently Fastly as I could see in the HTTP headers.

cristim avatar May 05 '18 19:05 cristim

Great idea! Though the file in the repo isn't updated nightly like the site is (via Travis) and I'd like to keep the need for frequent repo updates to a minimum. Automating the repo update could be a solution to that...

powdahound avatar May 06 '18 15:05 powdahound

Another way would be to get someone to host it using their CDN infrastructure and support these costs. How much is it monthly?

Once I know the cost I'll ask about this at our company, maybe we can host a CloudFront distribution for this.

cristim avatar May 07 '18 07:05 cristim

~$150/mo and climbing. Your comment reminds me I should reach out to AWS directly and see if they'll support it. :) Let me try that first.

powdahound avatar May 08 '18 00:05 powdahound

Have you thought about putting it behind Cloudflare? The free plan would be more than capable of caching that file and the other assets, and manual/API purges are free if you need them. It should pretty effectively remove the load from your backend.

darkorb avatar May 12 '18 06:05 darkorb

Plenty of CDNs provide free hosting to open source, non-profit projects. However, I'm not sure how excited a CDN would be to enhance Amazon's service offering. It makes more sense for Amazon to pay for this.

briangithex avatar May 18 '18 15:05 briangithex

Agreed. Still trying the AWS angle before considering other options. Would be nice to keep it contained on AWS tech.

powdahound avatar May 18 '18 16:05 powdahound

@powdahound We (SmugMug & Flickr) just pinged AWS on your behalf, I suspect they'll throw you some credits. But we find it incredibly useful ourselves and would be happy to pay that bill to help out. Let me know!

onethumb avatar May 19 '18 00:05 onethumb

@onethumb Appreciate it! Working with some of my contacts there too so we'll see what happens.

powdahound avatar May 22 '18 03:05 powdahound

AWS got in touch and offered to cover 50% of the CloudFront cost. Unfortunately the size of the json files grew so much after some recent pricing bug fixes that even a 50% discount isn't really worthwhile. Total transfer on the instances.json in the last 30 days was over 3TB.

Given that I don't want to break whatever workflows people are using the json files for and I have extremely limited time to put into this project right now, I've set up redirects for the json files so that GitHub can serve them. Thanks Microsoft! And thanks @cristim for that idea.

Note that daily updates to the json data will no longer be available. Will try to keep the versions on GitHub from getting too stale.

Leaving this issue open because I think it still makes sense for these files to be smaller so that they're more easily used. Thinking one main file for the instance details and then others with pricing data, probably split up by region. Not sure how people are using the data though.
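
One possible shape for that split, as a rough sketch only: it assumes each entry in instances.json is an object with an `instance_type` field and a `pricing` object keyed by region, which should be checked against the real file. The function name, output layout, and sample values are illustrative placeholders, not real data.

```python
import json
from collections import defaultdict


def split_instances(instances):
    """Split entries into pricing-free details and per-region pricing tables."""
    details = []
    pricing_by_region = defaultdict(dict)
    for entry in instances:
        # Everything except the (assumed) "pricing" key goes into the main file.
        details.append({k: v for k, v in entry.items() if k != "pricing"})
        for region, prices in entry.get("pricing", {}).items():
            pricing_by_region[region][entry["instance_type"]] = prices
    return details, dict(pricing_by_region)


# Placeholder sample; the prices are not taken from the real data set.
sample = [
    {
        "instance_type": "m5.large",
        "vCPU": 2,
        "pricing": {
            "us-east-1": {"linux": {"ondemand": "0.096"}},
            "eu-west-1": {"linux": {"ondemand": "0.107"}},
        },
    }
]
details, pricing = split_instances(sample)
print(json.dumps(details))
print(sorted(pricing))
```

Each key of `pricing` would become its own file, so a client interested in one region downloads the main file plus a single regional file.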

Here's the exact redirect code if anyone is interested

<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>instances.json</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <Protocol>https</Protocol>
      <HostName>raw.githubusercontent.com</HostName>
      <ReplaceKeyWith>powdahound/ec2instances.info/master/www/instances.json</ReplaceKeyWith>
    </Redirect>
  </RoutingRule>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>rds/instances.json</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <Protocol>https</Protocol>
      <HostName>raw.githubusercontent.com</HostName>
      <ReplaceKeyWith>powdahound/ec2instances.info/master/www/rds/instances.json</ReplaceKeyWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>

Edit: Oops, the redirect isn't working as I expected. Working on it. Edit 2: Working now. 👍

powdahound avatar Jun 04 '18 16:06 powdahound

You can make the JSON file 31% smaller by removing the whitespace in the file, which isn't needed for anything. It will, however, make any Git diffs illegible, as everything will be on one massively long line.

It could potentially be helped by having two JSON files, one which is minified and one which isn't, and then only serving the minified one but keeping the other one for readability.
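
A minimal sketch of that two-file approach using only the Python standard library. The sample record and the function name are made up for illustration; the point is that both outputs carry identical data and differ only in whitespace.

```python
import json


def render_both(data):
    """Return (readable, minified) JSON text for the same data."""
    # Indented, one key per line: diff-friendly, kept in the repo.
    readable = json.dumps(data, indent=2, sort_keys=True)
    # No whitespace between tokens: this is the copy that would be served.
    minified = json.dumps(data, separators=(",", ":"), sort_keys=True)
    return readable, minified


data = [{"instance_type": "t3.micro", "vCPU": 2, "memory": 1.0}]  # made-up record
readable, minified = render_both(data)
assert json.loads(readable) == json.loads(minified)  # same content either way
print(len(readable), len(minified))
```

Sorting keys in both outputs keeps the readable file's diffs stable from one run to the next.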

Tenzer avatar Oct 02 '18 16:10 Tenzer

These redirects to GitHub are in place again as the financial support from AWS ended and I haven't succeeded in getting it restarted.

The redirect rules are in JSON now, as the "new" console doesn't support the old-style XML config:

[
    {
        "Condition": {
            "KeyPrefixEquals": "instances.json"
        },
        "Redirect": {
            "HostName": "raw.githubusercontent.com",
            "Protocol": "https",
            "ReplaceKeyPrefixWith": "powdahound/ec2instances.info/master/www/instances.json"
        }
    },
    {
        "Condition": {
            "KeyPrefixEquals": "rds/instances.json"
        },
        "Redirect": {
            "HostName": "raw.githubusercontent.com",
            "Protocol": "https",
            "ReplaceKeyPrefixWith": "powdahound/ec2instances.info/master/www/rds/instances.json"
        }
    }
]
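
To make the effect of these rules concrete, here is a small sketch that mimics how a prefix-based redirect rule maps a request key to a target URL. It illustrates the semantics described in the S3 routing rule documentation (`ReplaceKeyPrefixWith` swaps only the matched prefix, `ReplaceKeyWith` swaps the whole key). It is not code that S3 runs, `resolve_redirect` is a hypothetical helper, and treating the first matching rule as the winner is an assumption.

```python
RULES = [
    {
        "Condition": {"KeyPrefixEquals": "instances.json"},
        "Redirect": {
            "HostName": "raw.githubusercontent.com",
            "Protocol": "https",
            "ReplaceKeyPrefixWith": "powdahound/ec2instances.info/master/www/instances.json",
        },
    },
    {
        "Condition": {"KeyPrefixEquals": "rds/instances.json"},
        "Redirect": {
            "HostName": "raw.githubusercontent.com",
            "Protocol": "https",
            "ReplaceKeyPrefixWith": "powdahound/ec2instances.info/master/www/rds/instances.json",
        },
    },
]


def resolve_redirect(key, rules=RULES):
    """Return the redirect URL for a request key, or None if no rule matches."""
    for rule in rules:  # assumption: the first matching rule wins
        prefix = rule["Condition"]["KeyPrefixEquals"]
        if not key.startswith(prefix):
            continue
        redirect = rule["Redirect"]
        if "ReplaceKeyWith" in redirect:
            new_key = redirect["ReplaceKeyWith"]  # the whole key is replaced
        else:
            # Only the matched prefix is replaced; any remainder is kept.
            new_key = redirect["ReplaceKeyPrefixWith"] + key[len(prefix):]
        return f'{redirect["Protocol"]}://{redirect["HostName"]}/{new_key}'
    return None


print(resolve_redirect("rds/instances.json"))
```

For keys that match the prefix exactly, as both file names do here, the prefix form and the whole-key form produce the same target URL.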

powdahound avatar Jan 04 '21 15:01 powdahound

@powdahound as I now work at AWS, I may be able to help with this hosting/credits issue. Let's talk on Gitter about this so we don't pollute this issue with such unrelated communication.

cristim avatar Jan 04 '21 20:01 cristim

You can make the JSON file 31% smaller by removing the whitespace in the file, which isn't needed for anything. It will, however, make any Git diffs illegible, as everything will be on one massively long line.

Alternatively, simply reducing indentation should remove most of the whitespace while still preserving one-item-per-line formatting.
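
A quick way to gauge that trade-off is to serialize the same data at different indent levels and compare sizes. The sketch below uses a hypothetical nested sample rather than the real file, so the absolute numbers mean nothing; only the ordering is of interest.

```python
import json

# Hypothetical nested sample; real entries are much larger.
data = [
    {"instance_type": f"example.{i}", "pricing": {"region-a": {"linux": {"ondemand": "0.0"}}}}
    for i in range(50)
]

sizes = {
    "indent=4": len(json.dumps(data, indent=4)),
    "indent=1": len(json.dumps(data, indent=1)),  # still one item per line
    "minified": len(json.dumps(data, separators=(",", ":"))),
}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Running the same comparison against the real instances.json would show how much of the 31% figure a smaller indent recovers.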

PatMyron avatar May 28 '22 04:05 PatMyron

Looked at this briefly when the above PR came in and noticed that the file is no longer being served gzipped. Should be a big win to turn that back on.

Tested with curl -H "Accept-Encoding: gzip" -I https://instances.vantage.sh/instances.json
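
The same check can be scripted. This is a sketch using the Python standard library: it sends a HEAD request that advertises gzip support and looks at the `Content-Encoding` response header. The helper names are made up for illustration.

```python
import urllib.request

URL = "https://instances.vantage.sh/instances.json"


def is_gzipped(headers) -> bool:
    """True if the response headers report gzip content encoding."""
    return "gzip" in (headers.get("Content-Encoding") or "").lower()


def check(url: str = URL) -> bool:
    """HEAD request advertising gzip support, like the curl command above."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"Accept-Encoding": "gzip"}
    )
    with urllib.request.urlopen(request) as response:
        return is_gzipped(response.headers)
```

Calling `check()` returns `True` when the server compresses the response for clients that ask for it.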

powdahound avatar May 28 '22 17:05 powdahound

This is turned back on as part of #670

EverettBerry avatar Nov 18 '22 19:11 EverettBerry