
Why does a "No slot" error (GPU not found) appear after a training run has been loaded once?

Open ZhiyuYUE opened this issue 1 year ago • 0 comments

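In addition, during the first training run, GPU 0 fails to finish loading the training set, so the run hangs at 94%. After terminating it and starting training again, the following error appears: raise ValueError(f"No slot '{slot}' specified on host '{hostname}'") ValueError: No slot '4' specified on host 'localhost'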

ZhiyuYUE, Jun 14 '24 03:06
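
The error text has the shape of a launcher-side check that compares the GPU indices requested for a host against the indices the launcher actually detected on that host. The post does not name the launcher, so the sketch below is only a simplified illustration of that kind of check, not the launcher's real code. The function name `check_slots`, the `host:0,1,2` include format, and the `resource_pool` dictionary are all assumptions made for the example.

```python
from typing import Dict, List


def check_slots(resource_pool: Dict[str, List[int]], include: str) -> Dict[str, List[int]]:
    """Validate a hypothetical include string such as 'localhost:0,1,4'.

    resource_pool maps each hostname to the GPU indices detected on it.
    Returns the selected slots per host, or raises ValueError when a
    requested slot was not detected.
    """
    selected: Dict[str, List[int]] = {}
    # Assumed format: hosts separated by '@', slots after ':' separated by ','.
    for node_config in include.split("@"):
        hostname, _, slot_str = node_config.partition(":")
        if hostname not in resource_pool:
            raise ValueError(f"Hostname '{hostname}' not found in resource pool")
        available = resource_pool[hostname]
        # No slot list means "use every detected slot on this host".
        slots = [int(s) for s in slot_str.split(",")] if slot_str else list(available)
        for slot in slots:
            if slot not in available:
                # Same wording as the error quoted in the post.
                raise ValueError(f"No slot '{slot}' specified on host '{hostname}'")
        selected[hostname] = slots
    return selected
```

Under these assumptions, calling `check_slots({"localhost": [0, 1, 2, 3]}, "localhost:4")` raises the quoted error, because only four GPUs (indices 0 to 3) were detected while index 4 was requested. If the second run sees fewer GPUs than the first one did, for example because a device is no longer visible after the hung run was killed, a previously valid index could fail in the same way. Comparing the requested indices with the output of `nvidia-smi` or `torch.cuda.device_count()` is one way to check this.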