joydchh
> We train this model on 8x A100 80GB GPUs. I'll update the README.
>
> > I... submit a request for a mini model to do sanity checks on...
> Does anyone know how to fix this exception?
>
> I have tried `use_new_zipfile_serialization=False`, but it doesn't work:

Did you find a way to fix this?
I found that the problem was caused by a corrupt temporary weights file. You can check whether you have something similar. Delete the 3 related files and run `prepare.py` again to...
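If you want to check for a corrupt weights file before deleting anything, here is a minimal sketch. It assumes the checkpoints were written by `torch.save` in its default zip-based format (PyTorch 1.6 and later), so a truncated or damaged file fails Python's own `zipfile` checks. The function name `find_corrupt_checkpoints` and the suffixes `.pt`/`.bin` are hypothetical and are not part of OpenChatKit. Files saved with `use_new_zipfile_serialization=False` use the legacy format and would also be flagged.

```python
import os
import sys
import zipfile
from typing import Iterable, List


def find_corrupt_checkpoints(
    directory: str, suffixes: Iterable[str] = (".pt", ".bin")
) -> List[str]:
    """Return checkpoint files in `directory` that are not intact zip archives.

    Assumption: checkpoints use torch.save's default zip-based format.
    `suffixes` is a hypothetical filter; adjust it to your file names.
    """
    suffixes = tuple(suffixes)
    corrupt = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not (os.path.isfile(path) and name.endswith(suffixes)):
            continue
        try:
            with zipfile.ZipFile(path) as zf:
                # testzip() returns the first member with a bad CRC, or None.
                if zf.testzip() is not None:
                    corrupt.append(path)
        except (zipfile.BadZipFile, EOFError, OSError):
            # Truncated files (e.g. from an interrupted write) end up here.
            corrupt.append(path)
    return corrupt


if __name__ == "__main__":
    # Pass the weights directory as the first argument; "." is just a fallback.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for p in find_corrupt_checkpoints(target):
        print("possibly corrupt:", p)
```

Any file this reports is a candidate for deletion before you rerun `prepare.py`.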
> There should be log messages during training. I suspect rank 0 went down, so the other three ranks were waiting for it. Can you post the full log?...
> @joydchh Rank 0 crashed when it tried to read the dataset. Can you check whether all data files are prepared in `/data/OpenChatKit/training/../data/OIG/files/`?
>
> And I noticed you...
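A quick way to check the data directory is sketched below. It resolves the `..` in the path lexically with `os.path.normpath`, confirms that the directory exists, and lists any data files that are empty. The `.jsonl` suffix and the function name `check_data_dir` are assumptions made for illustration; the script does not validate file contents.

```python
import os
import sys
from typing import List, Tuple


def check_data_dir(directory: str, suffix: str = ".jsonl") -> Tuple[List[str], List[str]]:
    """Return (found, empty): data files in `directory` and the zero-byte ones.

    `suffix` is an assumed naming convention; change it to match your files.
    Raises FileNotFoundError if the directory does not exist.
    """
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"data directory not found: {directory}")
    found, empty = [], []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.endswith(suffix):
            found.append(path)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return found, empty


if __name__ == "__main__":
    # normpath collapses "training/.." without touching the filesystem.
    default = os.path.normpath("/data/OpenChatKit/training/../data/OIG/files/")
    target = sys.argv[1] if len(sys.argv) > 1 else default
    try:
        found, empty = check_data_dir(target)
    except FileNotFoundError as err:
        print(err)
    else:
        print(f"{len(found)} data file(s) in {target}")
        for p in empty:
            print("empty:", p)
```

If no files are found, or some are empty, rank 0 would fail when reading the dataset while the other ranks wait. That matches the hang described above.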