Ujj

Results 2 comments of Ujj

I am facing the same issue. Unfortunately, the JSON logs don't contain the container_name, and the filename isn't useful either since it is just the container_id. Are there any known workarounds for that?
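One possible workaround, sketched here under assumptions rather than confirmed for this setup: if the logs come from Docker's `json-file` driver, each log usually sits at `<root>/<container_id>/<container_id>-json.log`, next to a `config.v2.json` file whose `Name` field holds the container name. The helper names below (`container_id_from_log_path`, `container_name_for_log`) are hypothetical, and the directory layout should be checked against the actual Docker version in use.

```python
import json
import re
from pathlib import Path

# Assumed file name produced by Docker's json-file log driver:
#   <container_id>-json.log, where the id is 64 hex characters.
LOG_NAME = re.compile(r"^(?P<cid>[0-9a-f]{64})-json\.log$")


def container_id_from_log_path(log_path):
    """Return the container id encoded in the log file name, or None."""
    match = LOG_NAME.match(Path(log_path).name)
    return match.group("cid") if match else None


def container_name_for_log(log_path):
    """Look up the container name from the sibling config.v2.json (assumed layout)."""
    if container_id_from_log_path(log_path) is None:
        return None
    config_path = Path(log_path).parent / "config.v2.json"
    try:
        data = json.loads(config_path.read_text())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed config: no name available.
        return None
    if not isinstance(data, dict):
        return None
    name = data.get("Name")
    # Docker stores the name with a leading slash, e.g. "/web".
    return name.lstrip("/") if isinstance(name, str) else None
```

A log shipper could call `container_name_for_log` once per file and attach the result as an extra field, which avoids needing access to the Docker API from the collector.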

> How many workers in the cluster? From the error log it's similar to the large scale issue.

About 400 GPUs.