dask-ec2

Consider cheaper EC2 instance type default value

Open deeplook opened this issue 9 years ago • 5 comments

Given that it is quite easy to forget to destroy an existing cluster (see #35), I think it is important for the default EC2 instance type to be much cheaper than m3.2xlarge. Otherwise, experimenting with dask-ec2 can lead to an unexpected, significant increase in one's AWS invoice, as happened to me. In my case this was ca. 96 CPU hours and $60 per day! :-(

This (and other values) should be listed in the generated cluster.yml file, too, so that there is some evidence without needing to go to the AWS online console.

deeplook avatar Nov 03 '16 09:11 deeplook

Oof, sorry to hear about the unexpected bill.

There is a trade-off here with having default nodes that are still of a relevant size for computing. An m3.2xlarge is about the size of a modern laptop, so it's nice to have a few of them to get a sense of scale. I can't think of any full solutions, but there are probably some things that could help:

  1. Visibly publish an expected hourly or daily cost. This requires keeping a copy of Amazon's costs-per-node within the project, which will inevitably go stale, but is still probably handy.
  2. Is it possible to launch an EC2 node with a time-to-live?
  3. Is it possible to add an ability to query if the cluster in the cluster.yaml file is still active?
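
For point 3, a minimal sketch of such a query, assuming the instance IDs can be read from the cluster file (how dask-ec2 actually stores them is not shown here). The function names are hypothetical; `ec2_client` is expected to behave like a boto3 EC2 client, whose `describe_instances` call returns the `Reservations` / `Instances` / `State` structure used below.

```python
def instance_states(ec2_client, instance_ids):
    """Map each instance ID to its EC2 state name (e.g. 'running').

    ec2_client is assumed to be a boto3 EC2 client, e.g.
    boto3.client("ec2", region_name="us-east-1"), or any object with a
    compatible describe_instances method.
    """
    response = ec2_client.describe_instances(InstanceIds=list(instance_ids))
    states = {}
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            states[instance["InstanceId"]] = instance["State"]["Name"]
    return states


def cluster_is_active(ec2_client, instance_ids):
    """Return True if any of the given instances is pending or running."""
    live_states = {"pending", "running"}
    states = instance_states(ec2_client, instance_ids)
    return any(state in live_states for state in states.values())
```

A command such as a hypothetical `dask-ec2 status` could call `cluster_is_active` and print a warning when it returns `True`. Instances that were terminated long ago may no longer be known to the API, so a real implementation would probably also need to handle the resulting client error.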

Other thoughts?

mrocklin avatar Nov 03 '16 13:11 mrocklin

@mrocklin Ad 1: Sadly, it seems the AWS APIs provide pricing info only for spot price history. I've found this online page for on-demand pricing (which probably applies here): https://aws.amazon.com/ec2/pricing/on-demand/ Others can be found here: https://aws.amazon.com/ec2/pricing/

I've tried scraping it and eventually got it mostly working, but only with Selenium (not Pandas, Lxml or BeautifulSoup; maybe there's too much JS in it). Selenium is likely not a great dependency for dask-ec2 to have (especially in combination with PhantomJS as a headless browser). But then, one would not expect this pricing info to change a lot, so hardcoding a current pricing table (ignoring data transfer pricing) with every build of dask-ec2 might be an idea. Since prices differ between AWS regions, one would need one table per region... I get something like this for Linux instances in us-east-1 right now:

| type | vCPU | ECU | Memory (GiB) | Instance Storage (GB) | Linux/UNIX Usage |
|---|---|---|---|---|---|
| t2.nano | 1 | Variable | 0.5 | EBS Only | $0.0065 per Hour |
| t2.micro | 1 | Variable | 1 | EBS Only | $0.013 per Hour |
| t2.small | 1 | Variable | 2 | EBS Only | $0.026 per Hour |
| t2.medium | 2 | Variable | 4 | EBS Only | $0.052 per Hour |
| t2.large | 2 | Variable | 8 | EBS Only | $0.104 per Hour |
| m4.large | 2 | 6.5 | 8 | EBS Only | $0.12 per Hour |
| m4.xlarge | 4 | 13 | 16 | EBS Only | $0.239 per Hour |
| m4.2xlarge | 8 | 26 | 32 | EBS Only | $0.479 per Hour |
| m4.4xlarge | 16 | 53.5 | 64 | EBS Only | $0.958 per Hour |
| m4.10xlarge | 40 | 124.5 | 160 | EBS Only | $2.394 per Hour |
| m4.16xlarge | 64 | 188 | 256 | EBS Only | $3.83 per Hour |
| m3.medium | 1 | 3 | 3.75 | 1 x 4 SSD | $0.067 per Hour |
| m3.large | 2 | 6.5 | 7.5 | 1 x 32 SSD | $0.133 per Hour |
| m3.xlarge | 4 | 13 | 15 | 2 x 40 SSD | $0.266 per Hour |
| m3.2xlarge | 8 | 26 | 30 | 2 x 80 SSD | $0.532 per Hour |
| c4.large | 2 | 8 | 3.75 | EBS Only | $0.105 per Hour |
| c4.xlarge | 4 | 16 | 7.5 | EBS Only | $0.209 per Hour |
| c4.2xlarge | 8 | 31 | 15 | EBS Only | $0.419 per Hour |
| c4.4xlarge | 16 | 62 | 30 | EBS Only | $0.838 per Hour |
| c4.8xlarge | 36 | 132 | 60 | EBS Only | $1.675 per Hour |
| c3.large | 2 | 7 | 3.75 | 2 x 16 SSD | $0.105 per Hour |
| c3.xlarge | 4 | 14 | 7.5 | 2 x 40 SSD | $0.21 per Hour |
| c3.2xlarge | 8 | 28 | 15 | 2 x 80 SSD | $0.42 per Hour |
| c3.4xlarge | 16 | 55 | 30 | 2 x 160 SSD | $0.84 per Hour |
| c3.8xlarge | 32 | 108 | 60 | 2 x 320 SSD | $1.68 per Hour |
| p2.xlarge | 4 | 12 | 61 | EBS Only | $0.9 per Hour |
| p2.8xlarge | 32 | 94 | 488 | EBS Only | $7.2 per Hour |
| p2.16xlarge | 64 | 188 | 732 | EBS Only | $14.4 per Hour |
| g2.2xlarge | 8 | 26 | 15 | 60 SSD | $0.65 per Hour |
| g2.8xlarge | 32 | 104 | 60 | 2 x 120 SSD | $2.6 per Hour |
| x1.16xlarge | 64 | 174.5 | 976 | 1 x 1920 SSD | $6.669 per Hour |
| x1.32xlarge | 128 | 349 | 1952 | 2 x 1920 SSD | $13.338 per Hour |
| r3.large | 2 | 6.5 | 15 | 1 x 32 SSD | $0.166 per Hour |
| r3.xlarge | 4 | 13 | 30.5 | 1 x 80 SSD | $0.333 per Hour |
| r3.2xlarge | 8 | 26 | 61 | 1 x 160 SSD | $0.665 per Hour |
| r3.4xlarge | 16 | 52 | 122 | 1 x 320 SSD | $1.33 per Hour |
| r3.8xlarge | 32 | 104 | 244 | 2 x 320 SSD | $2.66 per Hour |
| i2.xlarge | 4 | 14 | 30.5 | 1 x 800 SSD | $0.853 per Hour |
| i2.2xlarge | 8 | 27 | 61 | 2 x 800 SSD | $1.705 per Hour |
| i2.4xlarge | 16 | 53 | 122 | 4 x 800 SSD | $3.41 per Hour |
| i2.8xlarge | 32 | 104 | 244 | 8 x 800 SSD | $6.82 per Hour |
| d2.xlarge | 4 | 14 | 30.5 | 3 x 2000 HDD | $0.69 per Hour |
| d2.2xlarge | 8 | 28 | 61 | 6 x 2000 HDD | $1.38 per Hour |
| d2.4xlarge | 16 | 56 | 122 | 12 x 2000 HDD | $2.76 per Hour |
| d2.8xlarge | 36 | 116 | 244 | 24 x 2000 HDD | $5.52 per Hour |

I could contribute a code snippet (after finalising it) which you could include in your build process if this is what should finally happen...

deeplook avatar Nov 03 '16 15:11 deeplook

I suspect we could also do a decent job with just a static (stale) copy of this information.
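
A rough sketch of what such a static copy could look like, assuming a plain dictionary shipped with the package. The names `HOURLY_USD` and `estimate_cost` are hypothetical; the prices are a handful of the us-east-1 Linux on-demand figures quoted earlier in this thread and will go stale.

```python
# On-demand Linux prices in USD per hour for us-east-1, copied from the
# figures quoted in this thread (November 2016). Deliberately static, so
# these numbers are only an approximation of current AWS pricing.
HOURLY_USD = {
    "t2.medium": 0.052,
    "m4.xlarge": 0.239,
    "m3.xlarge": 0.266,
    "m3.2xlarge": 0.532,
    "c4.2xlarge": 0.419,
}


def estimate_cost(instance_type, count, hours=24.0):
    """Estimate the cost in USD of `count` instances running for `hours`.

    Ignores data transfer, storage, and any other charges.
    """
    try:
        hourly = HOURLY_USD[instance_type]
    except KeyError:
        raise ValueError(
            "no price on record for {!r}".format(instance_type)
        ) from None
    return hourly * count * hours
```

With these figures, `estimate_cost("m3.2xlarge", 4)` comes to roughly $51 per day, which could be printed when a cluster is launched.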

mrocklin avatar Nov 03 '16 15:11 mrocklin

I'd suggest looking/scraping/storing from: http://www.ec2instances.info/

quasiben avatar Nov 03 '16 15:11 quasiben

@quasiben Ah, much easier! Especially since it works OK with pandas.read_html() ;-)

BTW, Ad 2 (Is it possible to launch an EC2 node with a time-to-live?): a friend of mine suggested, as an option, setting up a cron script right after the instance is built that shuts it down after a given TTL.
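
One possible shape for this, as a sketch only. Instead of a cron entry it schedules a one-off `shutdown` through the instance's user-data script, which standard Linux AMIs normally run as root on first boot. The helper name `ttl_user_data` is hypothetical, and nothing here reflects how dask-ec2 actually launches nodes.

```python
def ttl_user_data(ttl_minutes):
    """Build a user-data shell script that halts the machine after a TTL.

    `shutdown -h +N` schedules a halt N minutes after the script runs.
    """
    if not isinstance(ttl_minutes, int) or ttl_minutes < 1:
        raise ValueError("ttl_minutes must be a positive integer")
    return "#!/bin/bash\nshutdown -h +{}\n".format(ttl_minutes)


# Assumed usage with boto3 (not executed here; the AMI ID is a placeholder):
#
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   ec2.run_instances(
#       ImageId="ami-...",
#       InstanceType="m3.2xlarge",
#       MinCount=1,
#       MaxCount=1,
#       UserData=ttl_user_data(120),
#       # Terminate instead of merely stopping when the OS shuts down.
#       InstanceInitiatedShutdownBehavior="terminate",
#   )
```

Setting the shutdown behavior to `terminate` matters here: with the default of `stop`, the instance would halt but remain in the account, and any attached EBS volumes would keep accruing charges.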

deeplook avatar Nov 03 '16 15:11 deeplook