Luca Manolache
Allows specifying GPU memory in `job-config.yaml`. The `accelerators` option now accepts the following syntax:

```
accelerators: (manufacturer):(memory)(mb|gb|tb)(+)?:(count)
accelerators: (memory)(mb|gb|tb)(+)?:(count)
accelerators: (memory)(mb|gb|tb)(+)?
```

An example would be `NVIDIA:40GB+:8`...
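The three forms above could be parsed with a single regular expression. The following is only a sketch and not the PR's implementation: the names `parse_accelerator_spec` and `AcceleratorSpec`, the default count of 1, and the 1024-based unit conversion are all assumptions made for illustration. The pattern is also slightly more permissive than the listed forms, since it accepts a manufacturer without a count.

```python
import re
from typing import NamedTuple, Optional

# Assumed conversion factors (binary units); the real code may differ.
_UNIT_TO_MB = {'mb': 1, 'gb': 1024, 'tb': 1024 * 1024}

# Hypothetical pattern: optional manufacturer, memory with unit,
# optional '+' suffix, optional count.
_SPEC_RE = re.compile(
    r'^(?:(?P<manufacturer>[A-Za-z]+):)?'
    r'(?P<memory>\d+)(?P<unit>mb|gb|tb)(?P<plus>\+)?'
    r'(?::(?P<count>\d+))?$',
    re.IGNORECASE,
)


class AcceleratorSpec(NamedTuple):
    manufacturer: Optional[str]  # None when no manufacturer is given
    memory_mb: int               # requested memory, normalized to MB
    has_plus: bool               # True when the '+' suffix is present
    count: int                   # assumed to default to 1 when omitted


def parse_accelerator_spec(spec: str) -> AcceleratorSpec:
    """Parse a string such as 'NVIDIA:40GB+:8' into its components."""
    match = _SPEC_RE.match(spec.strip())
    if match is None:
        raise ValueError(f'Invalid accelerator spec: {spec!r}')
    memory_mb = int(match['memory']) * _UNIT_TO_MB[match['unit'].lower()]
    count = int(match['count']) if match['count'] else 1
    return AcceleratorSpec(
        manufacturer=match['manufacturer'],
        memory_mb=memory_mb,
        has_plus=match['plus'] is not None,
        count=count,
    )
```

For instance, `parse_accelerator_spec('NVIDIA:40GB+:8')` yields a manufacturer of `'NVIDIA'`, 40960 MB of memory, the `+` flag set, and a count of 8.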
Tested (run the relevant ones): - [ ] Code formatting: install pre-commit (auto-check on commit) or `bash format.sh` - [ ] Any manual or new tests for this PR (please...
Use the same file filters for `rsync` as `storage_utils.py` uses when uploading to a bucket. Fixes #5006. Tested (run the relevant ones): - [x] Code formatting: install pre-commit (auto-check on commit) or...
Running

```bash
for i in {1..1000}; do sky api status & done
```

will consume ~30 GB of memory. This can cause computers with 32 GB of memory to slow down...
If the API server exits uncleanly while holding a file lock, the lock might still be held when the API server restarts, causing deadlocks. We should ensure all...
Add the ability to cancel the jobs controller without typing `delete`. It should be possible to run `sky down jobs-controller -y --force` or `sky down jobs-controller -y` to cancel it without needing to type...
Allows choosing the port of the API server with `sky api start --port <port>`. This creates a file at `~/.sky/api_server/port` containing the port, which will be used for subsequent calls...
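Persisting the port to a file and reading it back could look roughly like this. It is a sketch under stated assumptions: `save_port`, `load_port`, and the `DEFAULT_PORT` value are hypothetical names and values chosen for illustration, and the file path is passed in rather than hard-coded so the example runs against a temporary directory.

```python
import tempfile
from pathlib import Path

# Assumed fallback port for illustration only.
DEFAULT_PORT = 46580


def save_port(port_file: Path, port: int) -> None:
    """Write the chosen port so that later invocations can find it."""
    port_file.parent.mkdir(parents=True, exist_ok=True)
    port_file.write_text(str(port))


def load_port(port_file: Path, default: int = DEFAULT_PORT) -> int:
    """Read the saved port, falling back to the default if the file is missing or invalid."""
    try:
        return int(port_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return default


# Usage: a temporary directory stands in for ~/.sky/api_server/.
with tempfile.TemporaryDirectory() as tmp:
    port_file = Path(tmp) / 'api_server' / 'port'
    port_before_save = load_port(port_file)  # no file yet, so the default
    save_port(port_file, 8080)
    port_after_save = load_port(port_file)   # value read back from the file
```

Falling back to a default when the file is absent or unreadable keeps a fresh install working without any saved state.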
Had a previous k8s setup, removed it, and added a new one. Running `sky queue` now shows errors. It would be nice to have cleaner errors (don't show `sky.exceptions.ClusterOwnerIdentityMismatchError`). Additionally,...
If someone is on a nightly version and switches to an older version/older stable version, the SQL databases might have extra columns causing issues when accessing them. Reproduce: 1. Open...
Adds a per-request `lru_cache` to `get_cached_enabled_clouds_or_refresh` to make all subsequent calls during `jobs launch` faster. Time for `jobs launch` drops from ~37s to ~30s. Closes #5913. Tested (run the...
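One way to scope an `lru_cache` to a single request is to clear it at the request boundaries, so repeated calls inside one request are served from the cache while the next request sees fresh data. The sketch below illustrates that idea only; `get_enabled_clouds`, `handle_request`, and the returned cloud names are hypothetical stand-ins, not the PR's code.

```python
import functools

# Counts how often the expensive lookup actually runs (for demonstration).
lookup_calls = 0


@functools.lru_cache(maxsize=None)
def get_enabled_clouds() -> tuple:
    """Hypothetical stand-in for an expensive enabled-clouds lookup."""
    global lookup_calls
    lookup_calls += 1
    return ('aws', 'gcp')


def handle_request(work):
    """Run `work` with a cache that lives only for this request."""
    get_enabled_clouds.cache_clear()  # start the request with an empty cache
    try:
        return work()
    finally:
        get_enabled_clouds.cache_clear()  # do not leak entries to the next request


# Three lookups inside one request hit the underlying function once.
handle_request(lambda: [get_enabled_clouds() for _ in range(3)])
calls_after_first_request = lookup_calls

# A second request recomputes once, since the cache was cleared.
handle_request(lambda: get_enabled_clouds())
calls_after_second_request = lookup_calls
```

Clearing a process-wide cache like this is only safe when requests do not run concurrently in the same process; a cache keyed by a request identifier would be an alternative in that case.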