TypeError: getattr(): attribute name must be string
Describe the bug
When I was trying to initialize the model with DeepSpeed, the following error appeared, and it appears in every situation where I try to use DeepSpeed on my current machine. It reports that the attribute name of the optimizer must be a string, but I did not specify an optimizer at all, so I expected the default optimizer to be used. It is totally unacceptable and surprising to me that such a standard piece of code, taken straight from a tutorial, fails to run.
To Reproduce
Steps to reproduce the behavior:
- Use any model and any training data, specify the deepspeed_config, and run the following line (a minimal self-contained sketch of such a script follows the traceback below):
  model_engine, optimizer, trainloader, __ = deepspeed.initialize(args=args, model=net, model_parameters=parameters, training_data=trainset)
- See the following traceback:
Traceback (most recent call last):
File "//Transformers/test/111.py", line 86, in
Expected behavior
The DeepSpeed engine loads successfully.
ds_report output
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/home/hhchen/miniconda3/envs/Torch3.8/lib/python3.8/site-packages/torch']
torch version .................... 1.12.1
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.1
deepspeed install path ........... ['/home/hhchen/miniconda3/envs/Torch3.8/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.6.5, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3
System info (please complete the following information):
- OS: Linux version 5.11.0-49-generic
- GPU count and types: 8x RTX 3090
- Python version: 3.8
Launcher context: deepspeed
Hey @henrydylan - have you tried just specifying an optimizer via e.g.
..."optimizer": {
"type": "Adam",
"params": {
"lr": 0.00015
}
},...
Maybe that could help.
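For completeness, the same optimizer settings can also be supplied programmatically as a dict, which avoids editing the JSON file. This is only a sketch with illustrative values and a placeholder model/dataset; recent DeepSpeed releases accept the dict via the config keyword of deepspeed.initialize, while older ones use config_params:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset
import deepspeed

net = nn.Linear(10, 2)  # stand-in model
trainset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))  # stand-in data

# Illustrative config: the optimizer block mirrors the JSON snippet above.
ds_config = {
    "train_batch_size": 16,
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 0.00015},
    },
}

# `config=` on recent DeepSpeed releases; older ones take `config_params=`.
model_engine, optimizer, trainloader, _ = deepspeed.initialize(
    model=net,
    model_parameters=net.parameters(),
    training_data=trainset,
    config=ds_config,
)
```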
Also, you mention that "default optimizer must be used", but I am not sure if that is the case. The documentation does not seem to mention a default optimizer.
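If defining the optimizer in the config is not an option, another route is to construct a torch optimizer yourself and pass it to deepspeed.initialize via its optimizer argument, so DeepSpeed does not have to resolve one from the config at all. A minimal sketch with a placeholder model and illustrative values:

```python
import torch
import torch.nn as nn
import deepspeed

net = nn.Linear(10, 2)  # stand-in model

# Build the optimizer client-side so DeepSpeed never has to look one up
# from the config (lr and batch size are illustrative).
client_optimizer = torch.optim.Adam(net.parameters(), lr=1.5e-4)

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=net,
    optimizer=client_optimizer,
    config={"train_batch_size": 16},
)
```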
Hi @henrydylan - this issue is fairly stale and I suspect there are enough changes that the issue can't be debugged in its original form. If you are still having issues, or if others find this, please open a new issue and link this one.