jvstme

55 comments by jvstme

Sometimes the dstack-gateway application won't start at all after an instance reboot. Recently a planned reboot led to an empty `~/dstack/state.json` file, so dstack-gateway failed to restart and the gateway state was lost....

Even more relevant now with the introduction of the Community Cloud. It seems that many spot GPUs are not actually available at the price returned by the RunPod API and...

RunPod has acknowledged this issue; we are waiting for a fix on their end.

@ASmedberg-woolpert, this ticket is specific to the RunPod backend, and we expect it to be fixed on the RunPod side. Regarding the issue you're experiencing with Azure, I've suggested some solutions...

Relevant server logs:
```
{"message": "job(302f9f)aana-tests-0-0: now is RUNNING", "logger": "dstack._internal.server.background.tasks.process_running_jobs", "timestamp": "2024-09-29 19:23:00,931", "level": "INFO"}
{"message": "job(302f9f)aana-tests-0-0: failed because runner is not available or return an error, age=0:13:46.156854", "logger":...
```

Final RunPod container logs just before the server-to-runner connection fails:
```
2024-09-30T20:53:54.569234473Z time=2024-09-30T20:53:54.568856Z level=debug status=200 method=GET endpoint=/api/pull
2024-09-30T20:54:00.506831266Z time=2024-09-30T20:54:00.506447Z level=debug method=GET endpoint=/api/pull status=200
2024-09-30T20:54:06.187753351Z time=2024-09-30T20:54:06.187541Z level=debug method=GET endpoint=/api/pull...
```

So far I have only managed to reproduce this with the full configuration from the aana_sdk repo, which takes about 30-40 minutes. The shorter configuration from step 4 works fine for...

@peterschmidt85, optional volume mounts are not yet supported for network volumes. While it's not a priority, I think it makes sense to support them (see the use case in the...

The draft implementation is saved in `issue_2621_nebius_volumes`, but blocked because Nebius requires stopping the VM before shared filesystems can be attached or detached. This is slow, incompatible with blocks, and...