ec2instances.info
Make instances.json smaller
The https://ec2instances.info/instances.json file is getting so many requests from automated systems that it's becoming too expensive for me to reasonably host. The file is already gzipped and being served from CloudFront.
Ideally people would respect the cache headers and only request it daily, as that's how often it is updated, but I don't have a way to contact the offenders.
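For anyone pulling the file from a script, here is a minimal sketch of a client that hits the network at most once a day and otherwise reads a local copy. The cache path and the 24-hour window are assumptions chosen to match the daily update described above, not anything the site enforces.

```python
import os
import time
import urllib.request

URL = "https://ec2instances.info/instances.json"  # URL from this issue
CACHE_PATH = "instances.json"    # hypothetical local cache location
MAX_AGE_SECONDS = 24 * 60 * 60   # assumption: the data changes about once a day


def cache_is_fresh(path, max_age, now=None):
    """Return True if `path` exists and was modified less than `max_age` seconds ago."""
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) < max_age


def load_instances(url=URL, path=CACHE_PATH, max_age=MAX_AGE_SECONDS):
    """Return the raw JSON bytes, downloading only when the local copy is stale."""
    if not cache_is_fresh(path, max_age):
        with urllib.request.urlopen(url) as response:
            body = response.read()
        with open(path, "wb") as fh:
            fh.write(body)
    with open(path, "rb") as fh:
        return fh.read()
```

Calling `load_instances()` repeatedly within the same day reads from disk and sends no request.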
As a result I'm planning to move the pricing data to other files, possibly split up by region. Specific implementation TBD depending on what seems right when I look at the code.
This issue serves as a short-notice heads-up and a place for follow-up discussion.
Can't you just reference it from GitHub? The URL you should be able to use is https://raw.githubusercontent.com/powdahound/ec2instances.info/master/www/instances.json
This is also backed by a CDN, apparently Fastly, judging by the HTTP headers.
Great idea! Though the file in the repo isn't updated nightly like the site is (via Travis) and I'd like to keep the need for frequent repo updates to a minimum. Automating the repo update could be a solution to that...
Another way would be to get someone to host it using their CDN infrastructure and support these costs. How much is it monthly?
Once I know the cost I'll ask about this at our company, maybe we can host a CloudFront distribution for this.
~$150/mo and climbing. Your comment reminds me I should reach out to AWS directly and see if they'll support it. :) Let me try that first.
On Mon, May 7, 2018 at 12:14 AM Cristian Măgherușan-Stanciu @magheru_san wrote:
Another way would be to get someone to host it using their CDN infrastructure and support these costs. How much is it monthly?
Once I know the cost I'll ask about this at our company, maybe we can host a CloudFront distribution for this.
Have you thought about putting it behind CloudFlare? The free plan would be more than capable of caching that and the other assets, and manual/API purges are free if you need them. It should pretty effectively remove the load from your backend.
Plenty of CDNs provide free hosting to open source, non-profit projects. However, I'm not sure how excited a CDN would be to enhance Amazon's service offering. It makes more sense for Amazon to pay for this.
Agreed. Still trying the AWS angle before considering other options. Would be nice to keep it contained on AWS tech.
@powdahound We (SmugMug & Flickr) just pinged AWS on your behalf, I suspect they'll throw you some credits. But we find it incredibly useful ourselves and would be happy to pay that bill to help out. Let me know!
@onethumb Appreciate it! Working with some of my contacts there too so we'll see what happens.
AWS got in touch and offered to cover 50% of the CloudFront cost. Unfortunately the size of the json files grew so much after some recent pricing bug fixes that even a 50% discount isn't really worthwhile. Total transfer on the instances.json in the last 30 days was over 3TB.
Given that I don't want to break whatever workflows people are using the json files for and I have extremely limited time to put into this project right now, I've set up redirects for the json files so that GitHub can serve them. Thanks Microsoft! And thanks @cristim for that idea.
Note that daily updates to the json data will no longer be available. Will try to keep the versions on GitHub from getting too stale.
Leaving this issue open because I think it still makes sense for these files to be smaller so that they're easier to use. I'm thinking of one main file for the instance details and others with pricing data, probably split up by region. I'm not sure how people are using the data, though.
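To make the "one main file plus per-region pricing files" idea concrete, here is a rough sketch of what the split could look like. It assumes each record is a dict with an `instance_type` string and a `pricing` dict keyed by region; that schema, the `split_instances` and `write_split` names, and the `pricing/<region>.json` layout are illustrative assumptions, not a settled design.

```python
import json
import os


def split_instances(instances):
    """Split records into (details, pricing_by_region).

    Assumes each record has "instance_type" and an optional "pricing"
    dict keyed by region name (hypothetical schema).
    """
    details = []
    pricing_by_region = {}
    for record in instances:
        # Everything except pricing goes into the main details file.
        details.append({k: v for k, v in record.items() if k != "pricing"})
        for region, prices in record.get("pricing", {}).items():
            pricing_by_region.setdefault(region, {})[record["instance_type"]] = prices
    return details, pricing_by_region


def write_split(instances, out_dir):
    """Write instances.json plus one pricing/<region>.json file per region."""
    details, pricing_by_region = split_instances(instances)
    os.makedirs(os.path.join(out_dir, "pricing"), exist_ok=True)
    with open(os.path.join(out_dir, "instances.json"), "w") as fh:
        json.dump(details, fh, separators=(",", ":"))
    for region, prices in pricing_by_region.items():
        with open(os.path.join(out_dir, "pricing", f"{region}.json"), "w") as fh:
            json.dump(prices, fh, separators=(",", ":"))
```

A consumer that only cares about one region would then fetch the details file and a single pricing file instead of the whole dataset.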
Here's the exact redirect code if anyone is interested:
```
<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>instances.json</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <Protocol>https</Protocol>
      <HostName>raw.githubusercontent.com</HostName>
      <ReplaceKeyWith>powdahound/ec2instances.info/master/www/instances.json</ReplaceKeyWith>
    </Redirect>
  </RoutingRule>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>rds/instances.json</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <Protocol>https</Protocol>
      <HostName>raw.githubusercontent.com</HostName>
      <ReplaceKeyWith>powdahound/ec2instances.info/master/www/rds/instances.json</ReplaceKeyWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>
```
Edit: Oops, the redirect isn't working as I expected. Working on it. Edit 2: Working now. 👍
You can make the JSON file 31% smaller by removing the whitespace, which isn't needed for anything. However, it will make Git diffs illegible, as everything will be on one massively long line.
This could be mitigated by keeping two JSON files, one minified and one not, serving only the minified one and keeping the other for readability.
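A minimal sketch of the two-file approach, assuming the data is already loaded as Python objects. The `write_both` helper and both file paths are hypothetical; the actual saving depends on the data and how it is currently indented.

```python
import json


def write_both(data, pretty_path, minified_path):
    """Write a readable copy (for Git diffs) and a compact copy (for serving)."""
    with open(pretty_path, "w") as fh:
        # One key per line and a stable key order keep diffs small.
        json.dump(data, fh, indent=2, sort_keys=True)
    with open(minified_path, "w") as fh:
        # separators=(",", ":") drops the spaces json.dump inserts by default.
        json.dump(data, fh, separators=(",", ":"))
```

Both files parse to the same data; only the readable one needs to be reviewed in the repo.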
These redirects to GitHub are in place again as the financial support from AWS ended and I haven't succeeded in getting it restarted.
The redirect rules are in JSON now, as the "new" console doesn't support the old-style XML config:
```
[
    {
        "Condition": {
            "KeyPrefixEquals": "instances.json"
        },
        "Redirect": {
            "HostName": "raw.githubusercontent.com",
            "Protocol": "https",
            "ReplaceKeyPrefixWith": "powdahound/ec2instances.info/master/www/instances.json"
        }
    },
    {
        "Condition": {
            "KeyPrefixEquals": "rds/instances.json"
        },
        "Redirect": {
            "HostName": "raw.githubusercontent.com",
            "Protocol": "https",
            "ReplaceKeyPrefixWith": "powdahound/ec2instances.info/master/www/rds/instances.json"
        }
    }
]
```
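Note that the XML version earlier in the thread uses `ReplaceKeyWith` while the JSON version uses `ReplaceKeyPrefixWith`. The sketch below is a simplified model of how I understand S3 website routing rules to pick a redirect target: the first rule whose prefix matches wins, `ReplaceKeyWith` swaps the whole key, and `ReplaceKeyPrefixWith` swaps only the matched prefix. It is not AWS code, `resolve_redirect` is a hypothetical helper, and it assumes every rule sets both `Protocol` and `HostName`.

```python
def resolve_redirect(key, rules):
    """Return the redirect URL for `key`, or None if no rule matches."""
    for rule in rules:
        prefix = rule["Condition"]["KeyPrefixEquals"]
        if not key.startswith(prefix):
            continue
        redirect = rule["Redirect"]
        if "ReplaceKeyWith" in redirect:
            # The entire object key is replaced.
            new_key = redirect["ReplaceKeyWith"]
        elif "ReplaceKeyPrefixWith" in redirect:
            # Only the matched prefix is replaced; any remainder is kept.
            new_key = redirect["ReplaceKeyPrefixWith"] + key[len(prefix):]
        else:
            new_key = key
        return f'{redirect["Protocol"]}://{redirect["HostName"]}/{new_key}'
    return None
```

For the exact keys `instances.json` and `rds/instances.json` both styles give the same target, since nothing remains after the matched prefix.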
@powdahound as I now work at AWS, I may be able to help with this hosting/credits issue. Let's talk on Gitter about this so we don't pollute this issue with such unrelated communication.
You can make the JSON file 31% smaller by removing the whitespace, which isn't needed for anything. However, it will make Git diffs illegible, as everything will be on one massively long line.
Alternatively, simply reducing the indentation should remove most of the whitespace while still preserving the indented formatting.
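One way to check how much reduced indentation buys is to serialize the same data at a few indent widths and compare byte counts. This is only a measuring sketch; `size_by_indent` is a hypothetical helper and the widths tried are arbitrary.

```python
import json


def size_by_indent(data, indents=(4, 2, 1, None)):
    """Return {indent: size in bytes} for `data`; None means fully minified."""
    sizes = {}
    for indent in indents:
        # With no indent, also drop the spaces after "," and ":".
        separators = (",", ":") if indent is None else None
        text = json.dumps(data, indent=indent, separators=separators)
        sizes[indent] = len(text.encode("utf-8"))
    return sizes
```

Running it on the loaded contents of instances.json shows how close a one-space indent gets to the minified size while keeping one value per line.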
Looked at this briefly when the above PR came in and noticed that the file is no longer being served gzipped. Should be a big win to turn that back on.
Tested with `curl -H "Accept-Encoding: gzip" -I https://instances.vantage.sh/instances.json`
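The same check can be scripted. This sketch sends a HEAD request advertising gzip support and looks at the `Content-Encoding` response header; `is_gzip_encoded` and `check_url` are hypothetical helper names.

```python
import urllib.request


def is_gzip_encoded(headers):
    """Return True if a header mapping reports a gzip Content-Encoding."""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            return "gzip" in value.lower()
    return False


def check_url(url):
    """HEAD `url` with Accept-Encoding: gzip and report whether gzip is served."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"Accept-Encoding": "gzip"}
    )
    with urllib.request.urlopen(request) as response:
        return is_gzip_encoded(response.headers)
```

For example, `check_url("https://instances.vantage.sh/instances.json")` should return `True` whenever compression is enabled for that URL.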
This is turned back on as part of #670