Victor Skvortsov
@peterschmidt85, is this relevant?
The main reason we didn't support wildcard certificates initially is that they cannot be issued via the HTTP-01 challenge, which we can easily automate. They require the DNS-01 challenge. Automating DNS-01 challenge is...
#1827 added a metrics API. Exporting metrics to an external system can be done manually, and we should add examples of how to do that.
Closed by #2432 #2359
AzureCompute.create_instance() hangs while waiting for the VM to be created here: https://github.com/dstackai/dstack/blob/4a7a69127ff17727a15f7c6eff99b5940f9245e2/src/dstack/_internal/core/backends/azure/compute.py#L455 On the Azure side, the VM is stuck in the Creating state, which is why create_instance() never returns. Should be...
Also, consider updating job processing tasks so that the server can process more than one job/run in parallel to prevent one stuck job from blocking the processing of other jobs.
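To illustrate the idea of not letting one stuck job block the rest, here is a minimal sketch, assuming jobs can be processed as independent asyncio tasks with a per-job timeout. The names `process_job`, `process_with_timeout`, and `process_jobs` are hypothetical and are not dstack's actual processing code. The `asyncio.sleep` call stands in for any long wait, such as waiting for a VM to be created.

```python
import asyncio


async def process_job(job_id: str, delay: float) -> str:
    # Hypothetical stand-in for per-job work (e.g. waiting for a VM).
    await asyncio.sleep(delay)
    return f"{job_id}: done"


async def process_with_timeout(job_id: str, delay: float, timeout: float) -> str:
    # Bound the wait so a job that never finishes is reported, not awaited forever.
    try:
        return await asyncio.wait_for(process_job(job_id, delay), timeout=timeout)
    except asyncio.TimeoutError:
        return f"{job_id}: timed out"


async def process_jobs(jobs: dict[str, float], timeout: float) -> list[str]:
    # Run all jobs concurrently; results come back in the order jobs were given.
    return await asyncio.gather(
        *(process_with_timeout(job_id, delay, timeout) for job_id, delay in jobs.items())
    )


if __name__ == "__main__":
    jobs = {"job-a": 0.01, "job-stuck": 10.0, "job-b": 0.02}
    print(asyncio.run(process_jobs(jobs, timeout=0.2)))
```

With this shape, the stuck job is cancelled once its timeout expires, while the other jobs finish independently of it.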
@peterschmidt85, the last container should be kept for debugging purposes. There is no good reason to keep all the containers – the previous container should be deleted when a new...
In the context of RAM/VRAM, GB (base 10) doesn't make sense because memory is always sized in base 2. Most vendors use GB to mean GiB. This is a convention which predates...
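To make the difference concrete, here is a small sketch of the arithmetic, assuming the standard definitions 1 GiB = 2^30 bytes and 1 GB = 10^9 bytes. The helper names are illustrative only.

```python
GIB = 2**30  # bytes in one gibibyte (base 2)
GB = 10**9   # bytes in one gigabyte (base 10)


def gib_to_bytes(gib: int) -> int:
    # Size in bytes of a memory capacity given in GiB.
    return gib * GIB


def gb_to_bytes(gb: int) -> int:
    # Size in bytes of a capacity given in base-10 GB.
    return gb * GB


if __name__ == "__main__":
    # A "16GB" memory module actually holds 16 GiB.
    print(gib_to_bytes(16))                    # 17179869184
    print(gb_to_bytes(16))                     # 16000000000
    print(gib_to_bytes(16) - gb_to_bytes(16))  # 1179869184
```

Interpreting a 16 GiB device as 16 base-10 GB would understate its capacity by more than 1 GB.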
The current behavior, when an unexpected error occurs while terminating an instance, is to notify the admin with logger.error and mark the instance as terminated. An unexpected error may...
@jvstme Shouldn't it be closed by #2190?