2 comments by Ujj
I am facing the same issue. Unfortunately, the JSON logs don't contain the container_name, and the filename isn't useful either since it is the container_id. Are there any known workarounds for that?
> How many workers in the cluster? From the error log it's similar to the large scale issue.

About 400 GPUs.