ms-swift

Full-parameter fine-tuning with ZeRO-3 on multiple nodes and multiple GPUs saves incomplete model weights

Open ultrazhl98 opened this issue 1 year ago • 1 comments

According to model.safetensors.index.json, the model weights should be saved as four shards: model-00001-of-00004.safetensors, model-00002-of-00004.safetensors, model-00003-of-00004.safetensors, and model-00004-of-00004.safetensors. In practice only three weight files were written; model-00001-of-00004.safetensors is missing.

ultrazhl98 May 21 '24 03:05
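
To confirm which shards are actually missing, the index file can be compared against the files on disk. The following is a minimal sketch, not part of ms-swift: the helper name `find_missing_shards` is hypothetical, and it assumes the index follows the usual sharded safetensors layout, in which a top-level `weight_map` maps each parameter name to the shard file that stores it.

```python
import json
from pathlib import Path


def find_missing_shards(checkpoint_dir):
    """Return shard file names listed in the index but absent on disk.

    Hypothetical helper for illustration; assumes the index file has a
    top-level "weight_map" of parameter name -> shard file name.
    """
    checkpoint_dir = Path(checkpoint_dir)
    index_path = checkpoint_dir / "model.safetensors.index.json"
    with index_path.open(encoding="utf-8") as f:
        index = json.load(f)
    # Collect the distinct shard files that the index expects to exist.
    expected = set(index["weight_map"].values())
    # Keep only the names that have no matching file in the directory.
    return sorted(
        name for name in expected if not (checkpoint_dir / name).is_file()
    )
```

Calling `find_missing_shards` on the checkpoint directory described above would be expected to return a list containing only model-00001-of-00004.safetensors. On a multi-node run it may be worth running the check on each node's output directory, in case the shards were written to node-local storage.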

Hello, can ZeRO-3 run on multiple nodes with multiple GPUs?

1SingleFeng Jul 18 '24 09:07

Yes, it is supported.

tastelikefeet Aug 28 '24 05:08

Please check whether the disk is full. If the problem still reproduces, please reopen this issue.

tastelikefeet Aug 28 '24 05:08
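
The disk-space check suggested above can be scripted with the Python standard library. This is a rough sketch under stated assumptions: the helper names `free_gib` and `has_enough_space` are hypothetical, and the number of bytes the checkpoint needs must be supplied by the caller.

```python
import shutil


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_enough_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least
    `required_bytes` bytes free. Hypothetical helper for illustration."""
    return shutil.disk_usage(path).free >= required_bytes
```

For example, `free_gib("output")` reports the remaining space on the filesystem that holds a directory named `output` (an assumed path). If the result is close to zero on the node that writes the checkpoint, a full disk is a plausible cause of the missing shard.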