[UX] Allow the configuration YAML file to configure any option supported by `.dstack/profiles.yml`

Open peterschmidt85 opened this issue 1 year ago • 5 comments

Currently, it's not possible to configure spot policy, retry policy, etc., via the configuration YAML file. These settings can only be configured in .dstack/profiles.yml. It would be a lot more convenient if the configuration YAML file allowed the configuration of any option supported by .dstack/profiles.yml.

peterschmidt85 avatar Feb 28 '24 12:02 peterschmidt85

When implementing this feature, we should also take the Python API into account. Currently, we always pass Profile everywhere we pass Requirements. For example: https://github.com/dstackai/dstack/blob/38115c8e3d7274dd619f83951a4066b7800173f0/src/dstack/_internal/cli/commands/pool.py#L263

This is a weird interface. Resources are present only in Requirements. Both Profile and Requirements share spot and max_price. Most provisioning parameters are in Profile only.

I think we could simplify both the Public API and the internal code significantly if we put all parameters into one model. We could reuse Profile for that.

r4victor avatar Mar 01 '24 04:03 r4victor

The first step is to add the optional profile parameter to RunConfiguration that allows specifying any Profile parameters. They would override parameters read from profiles.yml. The configuration file would look like this:

type: task
profile:
  instance_types: [p3.8xlarge]
  regions: [eu-west-1]
  spot_policy: auto
commands:
  - ...

There will be both RunSpec.profile and RunSpec.configuration.profile. The second will override parameters of the first on the server side – the HTTP API will have the same interface as the CLI/Python API.
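The override rule described above can be sketched as follows. This is a minimal illustration, not dstack's actual code: the `Profile` dataclass here is a simplified stand-in with only a few fields, `merge_profiles` is a hypothetical helper name, and the values are made up.

```python
from dataclasses import dataclass, fields, replace
from typing import List, Optional


@dataclass
class Profile:
    # Hypothetical, simplified stand-in for dstack's Profile model.
    instance_types: Optional[List[str]] = None
    regions: Optional[List[str]] = None
    spot_policy: Optional[str] = None
    max_price: Optional[float] = None


def merge_profiles(base: Profile, override: Profile) -> Profile:
    """Return a copy of `base` in which every non-None field of `override` wins."""
    updates = {
        f.name: getattr(override, f.name)
        for f in fields(override)
        if getattr(override, f.name) is not None
    }
    return replace(base, **updates)


# Plays the role of RunSpec.profile (read from profiles.yml).
from_profiles_yml = Profile(regions=["us-east-1"], spot_policy="on-demand", max_price=2.0)
# Plays the role of RunSpec.configuration.profile (from the configuration YAML).
from_configuration = Profile(
    instance_types=["p3.8xlarge"], regions=["eu-west-1"], spot_policy="auto"
)

effective = merge_profiles(from_profiles_yml, from_configuration)
```

In this sketch, `effective` takes `regions`, `spot_policy`, and `instance_types` from the configuration, while `max_price` falls through from profiles.yml because the configuration leaves it unset.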

As a next step, we should also do a refactoring to fix Profile/Requirements issues described above. The suggestion is:

  1. Add all provisioning-related fields from Profile, such as backends, instance_types, and regions, to Requirements. The JobConfigurators will create Requirements from Profile, so Profile will no longer need to be passed around (e.g. to get_offers_by_requirements()). We'll also be able to pre-filter many offers in get_offers (e.g. by backends in dstack Sky), which will improve get_plan execution time.
  2. [Breaking change] Explicitly accept Profile in submit() and get_plan() in the Python API instead of accepting all profile arguments separately. This will simplify API UX and maintenance.
  3. [Breaking change] Remove Requirements from the Public API. Make it so that get_offers() and create_instance() accept Profile and ResourceSpec. Consider introducing InstanceSpec to encapsulate Profile, ResourceSpec, etc. This will fix the weird interface that duplicates Profile and Requirements.
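Step 1 of the list above could look roughly like this. It is only a sketch under assumptions: the `Profile`, `Requirements`, and `Offer` classes are reduced stand-ins, `requirements_from_profile` and `filter_offers` are hypothetical names, and the offers and prices are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Profile:
    # Simplified stand-in: only a few provisioning fields.
    backends: Optional[List[str]] = None
    regions: Optional[List[str]] = None
    instance_types: Optional[List[str]] = None
    max_price: Optional[float] = None


@dataclass
class Requirements:
    # Resources plus the provisioning fields copied from Profile.
    cpus: int
    backends: Optional[List[str]] = None
    regions: Optional[List[str]] = None
    instance_types: Optional[List[str]] = None
    max_price: Optional[float] = None


@dataclass
class Offer:
    backend: str
    region: str
    instance_type: str
    cpus: int
    price: float


def requirements_from_profile(profile: Profile, cpus: int) -> Requirements:
    """Hypothetical helper: fold Profile fields into Requirements once."""
    return Requirements(
        cpus=cpus,
        backends=profile.backends,
        regions=profile.regions,
        instance_types=profile.instance_types,
        max_price=profile.max_price,
    )


def matches(offer: Offer, req: Requirements) -> bool:
    """An unset (None) requirement field places no constraint on the offer."""
    if offer.cpus < req.cpus:
        return False
    if req.backends is not None and offer.backend not in req.backends:
        return False
    if req.regions is not None and offer.region not in req.regions:
        return False
    if req.instance_types is not None and offer.instance_type not in req.instance_types:
        return False
    if req.max_price is not None and offer.price > req.max_price:
        return False
    return True


def filter_offers(offers: List[Offer], req: Requirements) -> List[Offer]:
    return [offer for offer in offers if matches(offer, req)]


offers = [
    Offer("aws", "eu-west-1", "p3.8xlarge", 32, 12.0),
    Offer("gcp", "europe-west4", "a2-highgpu-1g", 12, 3.7),
    Offer("aws", "us-east-1", "p3.8xlarge", 32, 12.0),
]
req = requirements_from_profile(Profile(backends=["aws"], regions=["eu-west-1"]), cpus=8)
selected = filter_offers(offers, req)
```

With everything in Requirements, the filtering code needs a single argument, and filters such as backends can be applied before any offers are fetched.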

r4victor avatar Mar 21 '24 10:03 r4victor

We should remove the default values from the Profile model that were introduced for pools (creation_policy, termination_policy, termination_idle_time). The default values should only be set by the server. Otherwise, the profile parameters in the run configuration will always be non-null and will always override the profiles.yml parameters, which will then never take effect.
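A small sketch of why model-level defaults break the override logic. The two dataclasses, the `merge` and `apply_server_defaults` helpers, and the default values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass
class ProfileWithDefaults:
    # Defaults baked into the model: the problematic variant.
    creation_policy: Optional[str] = "reuse-or-create"
    termination_idle_time: Optional[int] = 300


@dataclass
class ProfileWithoutDefaults:
    # Everything unset by default; the server fills the gaps later.
    creation_policy: Optional[str] = None
    termination_idle_time: Optional[int] = None


# Illustrative values only, applied on the server after merging.
SERVER_DEFAULTS = {"creation_policy": "reuse-or-create", "termination_idle_time": 300}


def merge(base, override):
    """Non-None fields of `override` replace the matching fields of `base`."""
    updates = {
        f.name: getattr(override, f.name)
        for f in fields(override)
        if getattr(override, f.name) is not None
    }
    return replace(base, **updates)


def apply_server_defaults(profile):
    """Fill only the fields that are still unset after merging."""
    missing = {
        name: value
        for name, value in SERVER_DEFAULTS.items()
        if getattr(profile, name) is None
    }
    return replace(profile, **missing)


# profiles.yml sets termination_idle_time; the run configuration sets nothing.
broken = merge(ProfileWithDefaults(termination_idle_time=3600), ProfileWithDefaults())
fixed = apply_server_defaults(
    merge(ProfileWithoutDefaults(termination_idle_time=3600), ProfileWithoutDefaults())
)
```

In the first variant the run configuration's implicit default silently replaces the value from profiles.yml. In the second, the value from profiles.yml survives and the server default is used only for `creation_policy`, which nobody set.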

r4victor avatar Mar 21 '24 11:03 r4victor

@r4victor I don't really like using the word profile in YAML. I would personally prefer to define all its properties at the top level:

type: task

instance_types: [p3.8xlarge]

regions: [eu-west-1]

spot_policy: auto

commands:
  - ...

peterschmidt85 avatar Mar 21 '24 22:03 peterschmidt85

Agree. I like the top-level approach because users don't need to think about profiles when writing run configurations. What I didn't like about it is that our run configuration reference would become hard to read and think about due to lots of top-level parameters. Probably, we can mitigate this by putting profile parameters at the end, thus grouping them.

As for the implementation details, they mostly remain the same. We need to group the parameters common to RunConfiguration and Profile in a shared model (e.g. ProfileParameters) and have the RunConfiguration classes inherit from it. JobConfigurator will merge the RunConfiguration and Profile parameters and store them in JobSpec/Requirements.
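The shared-model idea might be sketched like this. Plain dataclasses are used here to keep the example self-contained; `ProfileParams`, `TaskConfiguration`, and `effective_params` are hypothetical names with a reduced set of fields, not the actual dstack models.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ProfileParams:
    # Parameters shared by profiles and run configurations (reduced set).
    instance_types: Optional[List[str]] = None
    regions: Optional[List[str]] = None
    spot_policy: Optional[str] = None


@dataclass
class Profile(ProfileParams):
    name: str = "default"


@dataclass
class TaskConfiguration(ProfileParams):
    # Profile parameters sit at the top level next to type and commands.
    type: str = "task"
    commands: List[str] = field(default_factory=list)


def effective_params(conf: ProfileParams, profile: ProfileParams) -> ProfileParams:
    """Configuration values win; unset ones fall back to the profile."""
    merged = {}
    for f in fields(ProfileParams):
        conf_value = getattr(conf, f.name)
        merged[f.name] = conf_value if conf_value is not None else getattr(profile, f.name)
    return ProfileParams(**merged)


conf = TaskConfiguration(regions=["eu-west-1"], commands=["python train.py"])
profile = Profile(name="gpu", regions=["us-east-1"], spot_policy="auto")
params = effective_params(conf, profile)
```

Because both models inherit the same fields, the merge iterates over `ProfileParams` only and never needs to know about `commands`, `name`, or other model-specific attributes.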

r4victor avatar Mar 22 '24 06:03 r4victor