dstack
dstack is an open-source alternative to Kubernetes, designed to simplify development, training, and deployment of AI across any cloud or on-prem. It supports NVIDIA, AMD, and TPU.
This will help with debugging, understanding resource usage, and general diagnostics: - CPU available vs. utilized - GPU available vs. utilized - memory available vs. utilized - total runtime...
Sometimes a dstack job runs and then dies without showing any sign of why. I suspect this happens when the process is terminated externally, e.g. when it runs out of memory. Show the exit code...
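To illustrate what surfacing the exit code could look like, here is a minimal Python sketch. It is not dstack's implementation; `describe_exit` is a hypothetical helper. It relies only on the documented POSIX behavior of Python's `subprocess` module, where a negative return code `-N` means the child was killed by signal `N`. A SIGKILL is often, though not always, the sign of the kernel's out-of-memory killer.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Hypothetical helper: turn a subprocess return code into a readable message."""
    if returncode == 0:
        return "exited normally (code 0)"
    if returncode < 0:
        # On POSIX, subprocess reports -N when the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        # SIGKILL is a common symptom of the OOM killer, but this is only a hint.
        hint = " (possibly the out-of-memory killer)" if name == "SIGKILL" else ""
        return f"terminated by {name}{hint}"
    return f"failed with exit code {returncode}"
```

A caller would pass the `returncode` attribute of a finished `subprocess.run(...)` or `Popen` object and print or log the resulting message next to the job status.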
The runner should show resources (CPU, GPU, memory) in the web UI and potentially in the CLI (as additional info), so that users can confirm or compare requested vs. granted resources.
For instance, any directory change in `before_run` is persisted in the main run using `file`. Example: ``` - name: dataset provider: python requirements: requirements.txt before_run: - apt-get install -y unzip...
The following behavior of a workflow run needs to be illustrated in the docs. For a workflow file (note the lack of the `:latest` tag in the dependency): ```yaml workflows: - name:...
So far the docs show an example with a single script that can be run directly, like `python train.py`. However, a real project has a folder structure with many `.py` files organized as...
`dstack artifacts download` doesn't tell you anything if there is a typo in a run name. It would be nice to show a warning.
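One possible shape for such a warning, sketched in Python under stated assumptions: `check_run_name` and the list of known run names are hypothetical and not part of dstack. The sketch uses the standard library's `difflib.get_close_matches` to suggest similar names when the requested run is not found.

```python
import difflib
from typing import List, Optional


def check_run_name(name: str, known_runs: List[str]) -> Optional[str]:
    """Hypothetical helper: return a warning if `name` is not a known run, else None."""
    if name in known_runs:
        return None
    # Suggest up to three similar names; a cutoff of 0.6 is difflib's default.
    suggestions = difflib.get_close_matches(name, known_runs, n=3, cutoff=0.6)
    if suggestions:
        return f"Run '{name}' not found. Did you mean: {', '.join(suggestions)}?"
    return f"Run '{name}' not found."
```

The caller would print the returned message and stop before attempting the download whenever the result is not `None`.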
It would be nice to be able to set a tag for the run right from the console, e.g. `dstack run train-model --tag latest`.
If the current Git repository has no remote branch or it is not set as a tracked branch, the CLI shows the following message: ``` No tracked branch configured for...