DeepSpeedExamples
DeepSpeed-Chat Step-1 training error
Hi, I cannot run step-1 SFT training after the refactoring. I installed DeepSpeed and set up DeepSpeed-Chat as follows (the last two pip commands run inside applications/DeepSpeed-Chat):
pip install "deepspeed>=0.9.0"
git clone https://github.com/microsoft/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt
pip install -e .
Then I went into the applications/DeepSpeed-Chat/training/step1_supervised_finetuning folder and ran bash training_scripts/opt/single_gpu/run_1.3b.sh. However, this gives me the following error:
Traceback (most recent call last):
  File "/scratch/ybao/workspace/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 394, in <module>
    main()
  File "/scratch/ybao/workspace/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 264, in main
    train_dataset, eval_dataset = create_prompt_dataset(
  File "/scratch/ybao/workspace/DeepSpeedExamples/applications/DeepSpeed-Chat/dschat/utils/data/data_utils.py", line 350, in create_prompt_dataset
    return torch.load(train_fname), torch.load(eval_fname)
  File "/users/ybao/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/serialization.py", line 1014, in load
    return _load(opened_zipfile,
  File "/users/ybao/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/serialization.py", line 1422, in _load
    result = unpickler.load()
  File "/users/ybao/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/serialization.py", line 1415, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'utils'
From the error message I cannot tell which utils module is missing or where I should look; there are many utils modules in this project. Another problem after the refactoring: the train.py file in the applications/DeepSpeed-Chat/ folder is missing, but the README still uses it, which is very misleading.
I hope someone can help me identify the problem. Thanks.
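One likely reading of the traceback (an assumption, not confirmed in this thread): create_prompt_dataset calls torch.load on cached dataset files, torch.load unpickles them, and pickle records each class by its module path. If those files were cached before the refactoring, when the code lived under a top-level utils package, unpickling them now fails because that module no longer exists. The self-contained sketch below reproduces the mechanism with plain pickle; the module names old_layout_utils and new_layout_utils and the PromptDataset stand-in are hypothetical.

```python
import pickle
import sys
import types


def register_module(name):
    """Create an empty in-memory module and register it so pickle can import it."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    return module


class PromptDataset:
    """Hypothetical stand-in for a cached dataset class."""

    def __init__(self, items):
        self.items = items


def place_class_in(module_name):
    """Make PromptDataset appear to live in module_name; pickle records this path."""
    module = register_module(module_name)
    PromptDataset.__module__ = module_name
    module.PromptDataset = PromptDataset


def try_load(blob):
    """Unpickle blob, returning the exception instead of raising on a missing module."""
    try:
        return pickle.loads(blob)
    except ModuleNotFoundError as exc:
        return exc


# 1) "Before the refactoring": the class lives under the old module path.
place_class_in("old_layout_utils")
stale_blob = pickle.dumps(PromptDataset([1, 2, 3]))

# 2) "After the refactoring": the old module path is gone, the class has moved.
del sys.modules["old_layout_utils"]
place_class_in("new_layout_utils")

stale_result = try_load(stale_blob)  # the stale blob still names the old module
print(stale_result)                  # No module named 'old_layout_utils'

# 3) Re-serializing records the new path, so loading works again.
fresh_blob = pickle.dumps(PromptDataset([1, 2, 3]))
fresh_result = try_load(fresh_blob)
print(fresh_result.items)            # [1, 2, 3]
```

In the real project, the analogous fix would be to regenerate the cached files rather than to recreate the old module path.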
I cloned the code today and did not hit the missing utils problem.
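A possible explanation (an assumption, not confirmed here): create_prompt_dataset loads cached files with torch.load, and files cached by a pre-refactoring checkout may still reference the old top-level utils module, while a fresh setup has no such cache yet. If so, deleting the cached *.pt files should let step 1 rebuild them. The helper below is a hypothetical sketch: it assumes the cache is the directory passed to step 1 as --data_output_path (which, as far as I know, defaults to /tmp/data_files/) and that only the *.pt files are affected. Check the path your run actually uses. It only lists files unless dry_run=False.

```python
import pathlib


def remove_stale_dataset_cache(cache_dir, dry_run=True):
    """List (and optionally delete) cached *.pt dataset files in cache_dir.

    cache_dir is assumed to be the directory given as --data_output_path.
    Returns the sorted list of matching paths; an empty list if the directory is missing.
    """
    cache = pathlib.Path(cache_dir)
    if not cache.is_dir():
        return []
    stale = sorted(cache.glob("*.pt"))
    if not dry_run:
        for path in stale:
            path.unlink()  # the next training run should regenerate these files
    return stale


if __name__ == "__main__":
    # Assumed default location; replace with the path your run actually uses.
    for path in remove_stale_dataset_cache("/tmp/data_files/", dry_run=True):
        print("would delete:", path)
```

Run it with dry_run=True first to review the list, then rerun with dry_run=False and start training_scripts/opt/single_gpu/run_1.3b.sh again.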
Another thing after the refactoring is that the train.py file in the applications/DeepSpeed-Chat/ folder is missing, but the README still uses it, which is very misleading.
This also confused me for a while. Maybe you could open a separate issue for it.