DeepSpeed
AttributeError: module 'cpu_adam' has no attribute 'create_adam'
Running DeepSpeed on a single node with multiple GPUs using run_clm.py fails. With deepspeed == 0.8.1, the error is:
AttributeError: module 'cpu_adam' has no attribute 'create_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f1295047040>
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/adam/cpu_adam.py", line 108, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fc6719f5040>
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/adam/cpu_adam.py", line 108, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f5d9b03d040>
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/adam/cpu_adam.py", line 108, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
I got the same error. Is it because of the DeepSpeed version?
I got the same error. Have you solved this problem?
Hi @lw3259111 @jiacheng-ye @james-yw. Thank you for reporting this issue. I tried the master branches of DeepSpeed and transformers but could not reproduce it. Could you provide the DeepSpeed config JSON file and the command line you use?
I got the same error. My DeepSpeed config: `{
"bfloat16": {
"enabled": false
},
"fp16": {
"enabled": "auto",
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1,
"fp16_full_eval": "auto",
"fp16_backend": "auto"
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": "auto",
"betas": "auto",
"eps": "auto",
"weight_decay": "auto"
}
},
"scheduler": {
"type": "auto",
"params": {
"warmup_min_lr": "auto",
"warmup_max_lr": "auto",
"warmup_num_steps": "auto"
}
},
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "cpu",
"pin_memory": false
},
"offload_param": {
"device": "cpu",
"pin_memory": false
},
"allgather_partitions": true,
"allgather_bucket_size": 1e3,
"overlap_comm": false,
"reduce_scatter": true,
"reduce_bucket_size": 1e3,
"contiguous_gradients": true,
"stage3_max_live_parameters": 1e5,
"stage3_max_reuse_distance": 1e5,
"stage3_prefetch_bucket_size": "auto",
"stage3_param_persistence_threshold": 1e3,
"sub_group_size": 1e3
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"train_batch_size": "auto",
"steps_per_print": 1e3
}`
Command line: deepspeed train.py
self.ds_opt_adam = CPUAdamBuilder().load()
self.ds_opt_adam.create_adam(self.opt_id,
                             lr,
                             betas[0],
                             betas[1],
                             eps,
                             weight_decay,
                             adamw_mode,
                             should_log_le("info"))
Why does 'cpu_adam' have no attribute 'create_adam'?
@lixinliu1995 @yaozhewei @lw3259111 @jiacheng-ye @HeyangQin I solved this problem by downgrading deepspeed to 0.7.7. Before that, the error occurred with deepspeed == 0.8.1, so I suspect the latest version causes it.
Hi @lixinliu1995 @james-yw. I think this problem is likely caused by a third-party op_builder, as fixed in https://github.com/microsoft/DeepSpeed/pull/2963. Could you try running python -c "import op_builder; print(op_builder.__file__)"?
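A slightly more detailed version of that check can be sketched as follows. It asks Python where a bare `import op_builder` would resolve without actually importing it. The helper name `locate_top_level_module` and the `"deepspeed" in path` heuristic are assumptions made for illustration; they are not part of DeepSpeed.

```python
import importlib.util


def locate_top_level_module(name="op_builder"):
    """Return the path a bare `import <name>` would load, or None if nothing matches."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no single origin file; report their search paths instead.
    return str(list(spec.submodule_search_locations or []))


path = locate_top_level_module("op_builder")
if path is None:
    print("No top-level op_builder on sys.path")
elif "deepspeed" in path:
    # Heuristic only: the path happens to sit inside a deepspeed directory.
    print(f"op_builder resolves inside DeepSpeed: {path}")
else:
    print(f"op_builder comes from another package: {path}")
```

If the last branch fires, the printed path should point at whichever package installed the conflicting module.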
Hi @lw3259111 @jiacheng-ye @james-yw @lixinliu1995. This error is caused by a name collision with the colossalai installation. In short: op_builder is a key component in deepspeed that enables JIT compilation, yet colossalai installs another top-level package called op_builder that breaks the functionality of deepspeed. We have fixed this issue in https://github.com/microsoft/DeepSpeed/pull/2963 and just published deepspeed v0.8.3 to include this fix in our PyPI release.
You can update your deepspeed installation with pip install deepspeed --upgrade
Please let us know if you still have this issue with the latest version of deepspeed (v0.8.3 or newer).
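To confirm that an environment actually picked up the release with the fix, something like the sketch below may help. It reads the installed version through `importlib.metadata` and compares it with 0.8.3. The helpers `release_tuple` and `has_op_builder_fix` are hypothetical names, and the parser only handles plain `X.Y.Z` versions with an optional local suffix such as `+30d97705`.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Turn '0.8.3' or '0.8.3+30d97705' into (0, 8, 3), ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_op_builder_fix(version, fixed_in=(0, 8, 3)):
    """True if `version` is at or after the release that shipped the fix."""
    return release_tuple(version) >= fixed_in


try:
    installed = metadata.version("deepspeed")
except metadata.PackageNotFoundError:
    print("deepspeed is not installed in this environment")
else:
    print(installed, "includes the op_builder fix:", has_op_builder_fix(installed))
```

A passing check only rules out the name collision; it says nothing about other build failures that produce a similar traceback.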
@lw3259111, @jiacheng-ye, @james-yw, @lixinliu1995: thank you all for using DeepSpeed. Please re-open the issue if v0.8.3 does not fix it.
I still have this problem with deepspeed 0.7.7 and 0.8.3.
AttributeError: module 'cpu_adam' has no attribute 'create_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f7c7ad5bd30>
Traceback (most recent call last):
File "/home/yuhui.zuo/miniconda3/envs/newgc/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fd71f55ad30>
Traceback (most recent call last):
File "/home/yuhui.zuo/miniconda3/envs/newgc/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fdce3ee7d30>
Traceback (most recent call last):
File "/home/yuhui.zuo/miniconda3/envs/newgc/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f1d6c7a6d30>
Traceback (most recent call last):
File "/home/yuhui.zuo/miniconda3/envs/newgc/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fccb3f00d30>
Traceback (most recent call last):
File "/home/yuhui.zuo/miniconda3/envs/newgc/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
Hi @Coding-Zuo. Could you try running python -c "import op_builder; print(op_builder.__file__)"?
I am running into the same error with both deepspeed 0.8.3 and 0.8.3+30d97705 (current main branch).
raise Exception(f">- DeepSpeed Op Builder: Installed CUDA version {sys_cuda_version} does not match the "
Exception: >- DeepSpeed Op Builder: Installed CUDA version 12.0 does not match the version torch was compiled with 11.7,
unable to compile cuda/cpp extensions without a matching cuda version.
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fbb8c3c3f70>
Traceback (most recent call last):
File "/home/shahswai/miniconda3/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f7d9e1f8f70>
Traceback (most recent call last):
File "/home/shahswai/miniconda3/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
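The exception in this log is a CUDA version mismatch rather than the op_builder collision. A rough way to see both versions side by side is sketched below. It assumes the relevant toolkit is the `nvcc` found on `PATH`, which may differ from the one DeepSpeed selects when building, and the function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract 'major.minor' from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def system_cuda_version():
    """CUDA toolkit version reported by the nvcc on PATH, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version():
    """CUDA version PyTorch was built with; None for CPU-only builds or if torch is absent."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


print("system CUDA (nvcc):", system_cuda_version())
print("torch built with CUDA:", torch_cuda_version())
```

In the log above the two values would be 12.0 and 11.7, which is the combination the op builder refuses to compile against.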
It also says there is no module named op_builder:
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'op_builder'
So I ran it inside the ops directory:
(base) ip-172-31-4-144:~/miniconda3/lib/python3.9/site-packages/deepspeed/ops > python -c "import op_builder; print(op_builder.__file__)"
/home/user/miniconda3/lib/python3.9/site-packages/deepspeed/ops/op_builder/__init__.py
Hi @Coding-Zuo. Could you try running python -c "import op_builder; print(op_builder.__file__)"?
ModuleNotFoundError: No module named 'op_builder'
CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fb51ff05990>
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'