DeepSpeed
[BUG] return getattr(args, f"{model_type[step_num]}_model")
Describe the bug
When I run the train command from the examples, the following error shows up.
[root@iZuf69ogcccxrcfuonlyzfZ DeepSpeed-Chat]# python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_gpu
File "train.py", line 110
return getattr(args, f"{model_type[step_num]}_model")
^
SyntaxError: invalid syntax
ds_report output
Please run ds_report to give us details about your setup.
[root@iZuf69ogcccxrcfuonlyzfZ DeepSpeed-Chat]# python3 -m deepspeed.env_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
/usr/local/python3/lib/python3.10/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/python3/lib/python3.10/site-packages/torch']
torch version .................... 2.0.0+cu117
deepspeed install path ........... ['/usr/local/python3/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.9.0, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.7
System info (please complete the following information):
- OS: [CentOS 7.9]
- GPU count and types [single Nvidia A100 80G]
- Python version: 3.10
Additional context
I ran into many errors while following the official instructions to install DeepSpeed Chat. This is the most serious one, and I could not solve it myself. Please help.
@koalawangyang could you please add a printout before this line to output the contents of args: print(args)? I just tried the same command and I get no error. Also, please share the other errors you are seeing. We would like to continue to improve the scripts we have :)
I also encountered the same problem
I just ran the example:
python train.py --actor-model facebook/opt-13b --reward-model facebook/opt-350m --deployment-type single_node
but I got the same issue:
File "train.py", line 110
return getattr(args, f"{model_type[step_num]}_model")
^
SyntaxError: invalid syntax
I've solved this problem.
It is caused by the 'python' command using the system default Python 2.x to run the train.py script. Python 2 cannot parse f-strings, so line 110 is reported as a SyntaxError.
Changing the command to 'python3 train.py xxxxx' makes it work.
Closing this issue.
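For anyone hitting the same error, here is a minimal sketch of an interpreter check, not part of DeepSpeed-Chat. The names supports_fstrings and interpreter_hint are hypothetical. It relies on the fact that f-strings were added in Python 3.6, and it avoids f-strings itself so that it still parses under Python 2. Because a SyntaxError is raised when a file is compiled, before any of its code runs, a check like this only helps if it lives in a separate file from the code that uses f-strings.

```python
import sys

# f-strings (formatted string literals) were introduced in Python 3.6.
MIN_VERSION = (3, 6)


def supports_fstrings(version_info=None):
    """Return True if the given interpreter version can parse f-strings."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_VERSION


def interpreter_hint(version_info=None):
    """Return "ok", or a short message suggesting python3 (hypothetical helper)."""
    if version_info is None:
        version_info = sys.version_info
    if supports_fstrings(version_info):
        return "ok"
    # %-formatting is used on purpose: it parses under both Python 2 and 3.
    return "Python %d.%d cannot parse f-strings; rerun with python3" % (
        version_info[0],
        version_info[1],
    )


if __name__ == "__main__":
    print(interpreter_hint())
```

Saving this under any file name and running it with both 'python' and 'python3' shows which of the two commands points at an interpreter that can run train.py.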