
[Feature]: Allow simple selection of which instance to use from `dstack run`

spott opened this issue • 8 comments

Problem

Among the returned instances that could run a job, a user might want to choose from a small subset they prefer for reasons such as stability, cost, or compute speed, even when the instances are otherwise equivalent.

Solution

Allow the user to select one of a small number of displayed instances (maybe the top 5?) by entering its number when confirming the run.

Add a sorting mechanism (e.g. --prefer-cheaper, --prefer-gpu-mem, --prefer-gpu-speed, possibly with a global config option as well). Then, when presented with the list of instances, the user can enter a number instead of just "y/n". If the selected instance is no longer available, dstack either shows the list again or picks the best one left on the list (configurable).
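A minimal Python sketch of the proposed sort-then-pick-by-number flow. Everything here is hypothetical: the `Offer` fields (including the `gpu_speed` score), the preference names that mirror the proposed `--prefer-*` flags, and the `rank_offers`, `format_offers`, and `parse_choice` helpers are illustrations, not dstack's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Offer:
    # Hypothetical, simplified offer record (not dstack's real model).
    backend: str
    gpu_name: str
    gpu_mem_gb: int
    gpu_speed: float  # hypothetical relative speed score; higher is faster
    price: float      # USD per hour


# One sort key per proposed --prefer-* flag. Each key puts the "best"
# offer first and falls back to price to break ties.
SORT_KEYS: dict[str, Callable[[Offer], tuple]] = {
    "cheaper": lambda o: (o.price,),
    "gpu-mem": lambda o: (-o.gpu_mem_gb, o.price),
    "gpu-speed": lambda o: (-o.gpu_speed, o.price),
}


def rank_offers(offers: list[Offer], prefer: str = "cheaper") -> list[Offer]:
    """Return the offers sorted according to the chosen preference."""
    return sorted(offers, key=SORT_KEYS[prefer])


def format_offers(offers: list[Offer], limit: int = 5) -> list[str]:
    """Render the top `limit` offers as numbered lines for the prompt."""
    return [
        f"{i}. {o.backend} {o.gpu_name} {o.gpu_mem_gb}GB ${o.price:.2f}/h"
        for i, o in enumerate(offers[:limit], start=1)
    ]


def parse_choice(answer: str, shown: int) -> tuple[str, Optional[int]]:
    """Interpret the prompt answer.

    "y" keeps today's behavior (dstack picks), "n" aborts, and a number
    from 1 to `shown` selects that offer (returned as a 0-based index).
    """
    answer = answer.strip().lower()
    if answer in ("y", "yes"):
        return ("auto", None)
    if answer in ("n", "no"):
        return ("abort", None)
    if answer.isdecimal() and 1 <= int(answer) <= shown:
        return ("pick", int(answer) - 1)
    return ("invalid", None)
```

For example, `rank_offers(offers, "gpu-mem")` followed by `format_offers(...)` would produce the numbered list, and an answer of `2` maps to `("pick", 1)`. Keeping "y"/"n" valid means the numbered choice is purely additive to the current prompt.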

Benefit

Sometimes you get multiple instances that are equivalent along some chosen axis (price, GPU memory, CPU memory, etc.) but differ in desirability along a second axis. For example, you want a GPU with >=40GB and would take an H100 if one is available, but if not, an A100 or A6000 is fine. Or you want the cheapest node you can get with >20GB of VRAM.
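The two examples above can be expressed as "hard constraint first, then rank along a second axis". A minimal Python sketch, assuming a hypothetical simplified `Offer` record and a hypothetical `GPU_PREFERENCE` ordering; `pick_preferred` and `cheapest_with_vram` are illustrative names, not dstack features.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Offer:
    # Hypothetical, simplified offer record.
    gpu_name: str
    gpu_mem_gb: int
    price: float  # USD per hour


# Hypothetical user preference: earlier entries are more desirable.
GPU_PREFERENCE = ("H100", "A100", "A6000")


def pick_preferred(
    offers: Sequence[Offer],
    min_gpu_mem_gb: int = 40,
    preference: Sequence[str] = GPU_PREFERENCE,
) -> Optional[Offer]:
    """Apply the memory floor, then prefer by GPU model, then by price."""
    eligible = [
        o for o in offers
        if o.gpu_mem_gb >= min_gpu_mem_gb and o.gpu_name in preference
    ]
    # Primary axis: position in the preference list; secondary axis: price.
    return min(
        eligible,
        key=lambda o: (preference.index(o.gpu_name), o.price),
        default=None,
    )


def cheapest_with_vram(offers: Sequence[Offer], more_than_gb: int = 20) -> Optional[Offer]:
    """Return the cheapest offer with strictly more than `more_than_gb` of GPU memory."""
    eligible = [o for o in offers if o.gpu_mem_gb > more_than_gb]
    return min(eligible, key=lambda o: o.price, default=None)
```

Both functions return `None` when nothing satisfies the hard constraint, which is where the current "max price / GPU characteristics" filters would already have left the user with an empty list.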

Alternatives

At the moment, we can filter along a few simple dimensions (max price, GPU characteristics, etc.), which narrows the pool somewhat, but more choice would be better.

Would you like to help contribute this feature?

Yes

spott, Feb 28 '24

To add more context:

Below is the current behaviour of dstack:

  • Currently, the dstack run command reads all requirements from the YAML configuration file, the CLI arguments, and the profile, if any (.dstack/profiles.yml). For example, resource requirements may be either a range (e.g. gpu: 24GB..) or specific values (e.g. gpu:A100:40GB). The user may also optionally specify a region or a spot policy.
  • Based on all these requirements, dstack run collects sorted offers from the configured backends, merges them by price, and presents them to the user (note: Vast.ai's offers are sorted by their "performance score"; other backends sort them by price). The number of offers can range from zero to thousands, depending on how many different offers each backend returns.
  • The user cannot currently select one or more specific offers, nor does dstack promise to use the same offers when it finally runs. If the user confirms, dstack simply generates the list of offers again (a second time, since the offers may have changed) and iterates over it, trying to provision the job.
  • Note that the offers may include different kinds of variants: a) idle or busy instances from the pool; b) available offers from backends; c) backend offers that may or may not be available, which we only find out when we try to provision them.
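A rough Python sketch of the flow described above, under simplifying assumptions: offers are plain records with a `price` field, and `get_offers` and `try_provision` are hypothetical stand-ins for the backend calls. This is not dstack's actual code. Because `heapq.merge` only yields a correctly ordered result when every input is already sorted by the same key, the sketch sorts each backend's list by price before merging; how dstack itself treats a list ordered by another score (such as Vast.ai's) is not stated in this thread.

```python
import heapq
from typing import Callable, NamedTuple, Optional, Sequence


class Offer(NamedTuple):
    # Hypothetical, simplified offer record.
    backend: str
    name: str
    price: float  # USD per hour


def merge_offers_by_price(per_backend: Sequence[Sequence[Offer]]) -> list[Offer]:
    """Merge per-backend offer lists into one list ordered by price."""
    # heapq.merge assumes each input is sorted by the key, so sort first
    # (a backend may return its offers in some other order).
    by_price = [sorted(offers, key=lambda o: o.price) for offers in per_backend]
    return list(heapq.merge(*by_price, key=lambda o: o.price))


def provision_first_available(
    get_offers: Callable[[], list[Offer]],
    try_provision: Callable[[Offer], bool],
) -> Optional[Offer]:
    """Regenerate the offers, then try them in order until one succeeds."""
    # The list is fetched again at submit time because offers may have
    # changed since the user confirmed; for pool-based clouds, availability
    # is only known once provisioning is actually attempted.
    for offer in get_offers():
        if try_provision(offer):
            return offer
    return None
```

Since the offers the user saw are regenerated rather than reused, a numbered choice would need to be carried into the second pass explicitly, which is what the downsides below are about.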

peterschmidt85, Mar 01 '24

Solution 1. Allow the user to select a single (or multiple) offer(s) from the list manually

Downsides:

  1. The user will only be able to pick from the offers that are shown (by default, only the top 3; this can be changed via --max-offers), while the full list can include up to a thousand offers.
  2. Given that offers include both busy instances from the pool and many offers that may be unavailable (for aws, gcp, azure, etc., there is no way to tell in advance whether an offer is available until you try to provision it), the chances that dstack will successfully provision the selected offers are very low (close to 0). As a result, instead of quickly submitting the run, the user is likely to spend time trying different options manually.

@spott Any chance we can come up with a few very specific use cases where this feature would be helpful and wouldn't suffer from the downsides above?

peterschmidt85, Mar 01 '24

the chances that dstack will successfully provision the selected offers are very low (close to 0)

I think this is the crux of the issue.

This is true for aws, gcp, azure, etc., but I'm under the impression it isn't true for vast, tensordock, runpod (eagerly awaiting this one...), etc. I can go to vast.ai, click on (almost) any instance they show on their website, and get it. I expect some instances go quickly, but most stay around long enough to be requested specifically.

The difference is that the big cloud providers (aws, gcp, azure, to a lesser extent lambda) are offering instance types from a pool of instances and you don't know if that pool is full or empty.

The "gig economy" clouds (vast, tensordock, runpod, etc) are offering specific instances, not instance pools: so each instance is slightly different.

In my mind, this feature is valuable for the "gig economy" clouds, but not for the "big clouds". I have seen offers from vast at approximately the same price where both have 4090s, but one has 90GB of RAM and the other has 32GB:

[Screenshot from 2024-03-01 of the Vast.ai offers described above; image not preserved]

Re. downside 1: I think this feature only makes sense once there is a way to sort offers along different axes. Sorting makes the top offers relevant to the user, so it isn't a problem that only a few are shown.

The tricky thing here is to enable this feature without making it the primary way users interact with dstack: they should define their requirements precisely enough (through --max-price, --gpu, --disk, .dstack/profiles.yml, etc.) that any of the offers that pop up will work for the job they are trying to do. I think the reason people want this for aws/gcp/etc. is that there aren't enough ways to constrain the offers at the moment.

So, in conclusion, I think there are three things that will help with the underlying reason that people want this feature, and I would attack them in this order:

  • more ways to constrain offers along more dimensions (memory/cpu cores/cpu "speed"?/bandwidth/etc.) will let people use dstack with the big clouds without caring which of the offers that pop up they get.
  • ways to sort the offer list, so that when users try to get an instance, they know they will get the best instance for their use case, even if they can't explicitly choose which instance to get.
  • finally, this feature: a way to choose from a sorted list that has been constrained to useful instances, so that I can grab a good deal when I see one. Maybe restricted to the "gig economy" clouds, and with user-configurable behavior that lets the user say "try this instance first, and if that fails, just grab whatever" or "try this instance, and if it fails, show me another list of offers".
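The two user-configurable fallback behaviors in the last point can be sketched in Python as a small policy switch. The names `OnSelectionFailure`, `ReshowOffers`, `provision_selected`, and `try_provision` are hypothetical and only illustrate the idea; they are not part of dstack.

```python
from enum import Enum
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class OnSelectionFailure(Enum):
    # Hypothetical setting values for the two behaviors described above.
    TAKE_BEST_REMAINING = "take-best-remaining"
    RESHOW_LIST = "reshow-list"


class ReshowOffers(Exception):
    """Signals the CLI to fetch and display a fresh list of offers."""


def provision_selected(
    selected: T,
    ranked: Sequence[T],
    try_provision: Callable[[T], bool],
    policy: OnSelectionFailure,
) -> T:
    """Try the user's pick first; on failure, follow the configured policy."""
    if try_provision(selected):
        return selected
    if policy is OnSelectionFailure.RESHOW_LIST:
        raise ReshowOffers()
    # "Just grab whatever": walk the ranked list, skipping the failed pick.
    for offer in ranked:
        if offer != selected and try_provision(offer):
            return offer
    raise RuntimeError("no offer could be provisioned")
```

Raising an exception for the "reshow" case keeps the provisioning function free of any prompting logic; the caller decides how to fetch and redisplay the offers.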

spott, Mar 01 '24

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85, Apr 01 '24

This issue was closed because it has been inactive for 14 days since being marked as stale.

peterschmidt85, Apr 25 '24

@r4victor BTW, do you think we could allow this via --instance?

peterschmidt85, May 15 '24

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85, Jun 15 '24

bump to keep the bots at bay.

spott, Jun 15 '24

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85, Jul 17 '24

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

peterschmidt85, Aug 01 '24