
AttributeError: module 'cpu_adam' has no attribute 'create_adam'

lw3259111 opened this issue

Running DeepSpeed on one node with multiple GPUs using run_clm.py fails. With deepspeed == 0.8.1 the error is:

AttributeError: module 'cpu_adam' has no attribute 'create_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f1295047040>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/adam/cpu_adam.py", line 108, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fc6719f5040>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/adam/cpu_adam.py", line 108, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f5d9b03d040>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/adam/cpu_adam.py", line 108, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'

lw3259111, Feb 25 '23 12:02

I got the same error. Is it caused by the DeepSpeed version?

jiacheng-ye, Feb 27 '23 09:02

I got the same error. Have you solved this problem?

james-yw, Mar 13 '23 09:03

Hi @lw3259111 @jiacheng-ye @james-yw. Thank you for reporting this issue. I tried the master branches of DeepSpeed and transformers but cannot reproduce this issue. Could you provide the DeepSpeed config JSON file and the command line you use?

HeyangQin, Mar 14 '23 21:03

I got the same error. My DeepSpeed config:

{
    "bfloat16": {
        "enabled": false
    },
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "fp16_full_eval": "auto",
        "fp16_backend": "auto"
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
    "scheduler": {
        "type": "auto",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": false
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": false
        },
        "allgather_partitions": true,
        "allgather_bucket_size": 1e3,
        "overlap_comm": false,
        "reduce_scatter": true,
        "reduce_bucket_size": 1e3,
        "contiguous_gradients": true,
        "stage3_max_live_parameters": 1e5,
        "stage3_max_reuse_distance": 1e5,
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": 1e3,
        "sub_group_size": 1e3
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "steps_per_print": 1e3
}

Command line: deepspeed train.py

lixinliu1995, Mar 19 '23 16:03
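Why does the `cpu_adam` extension get involved at all? With a config like the one above, an AdamW optimizer combined with `offload_optimizer.device: "cpu"` is, as far as I understand DeepSpeed's engine, what makes it swap in `DeepSpeedCPUAdam` and JIT-build the `cpu_adam` op. The sketch below checks a config dict for that combination. `uses_cpu_adam` is a hypothetical helper, not a DeepSpeed API, and it ignores other settings that may change which optimizer DeepSpeed actually picks.

```python
import json

# Optimizer types for which, as far as we understand DeepSpeed's engine,
# ZeRO CPU offload swaps in DeepSpeedCPUAdam. This set is an assumption.
CPU_ADAM_TYPES = {"adam", "adamw"}


def uses_cpu_adam(config: dict) -> bool:
    """Return True if the config combines an Adam/AdamW optimizer with
    ZeRO optimizer offload to the CPU (hypothetical helper)."""
    opt_type = str(config.get("optimizer", {}).get("type", "")).lower()
    offload = config.get("zero_optimization", {}).get("offload_optimizer", {})
    return opt_type in CPU_ADAM_TYPES and offload.get("device") == "cpu"


if __name__ == "__main__":
    # Trimmed-down version of the config posted above.
    cfg = json.loads("""
    {
      "optimizer": {"type": "AdamW", "params": {"lr": "auto"}},
      "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": false}
      }
    }
    """)
    print(uses_cpu_adam(cfg))  # True
```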

[Screenshot: 2023-03-20 00-07-50]

lixinliu1995, Mar 19 '23 16:03

self.ds_opt_adam = CPUAdamBuilder().load()
self.ds_opt_adam.create_adam(self.opt_id,
                               lr,
                               betas[0],
                               betas[1],
                               eps,
                               weight_decay,
                               adamw_mode,
                               should_log_le("info"))

Why does 'cpu_adam' have no attribute 'create_adam'?

lixinliu1995, Mar 19 '23 16:03
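One way to narrow down the question above is to inspect the module object that `CPUAdamBuilder().load()` returned: which file it was loaded from, and which of the expected entry points (`create_adam`, `destroy_adam`) it lacks. A minimal sketch with hypothetical helper names, demonstrated on a stdlib module so it runs without DeepSpeed:

```python
import types
from typing import Iterable, List


def missing_attrs(module: types.ModuleType, names: Iterable[str]) -> List[str]:
    """Return the expected names that `module` does not define."""
    return [name for name in names if not hasattr(module, name)]


def describe(module: types.ModuleType, names: Iterable[str]) -> str:
    """One-line summary: which file the module came from and what it lacks."""
    origin = getattr(module, "__file__", "<no __file__>")
    missing = missing_attrs(module, names)
    return f"{module.__name__} from {origin}; missing: {missing or 'nothing'}"


if __name__ == "__main__":
    import json

    # Stand-in demo on a stdlib module: 'loads' exists, 'create_adam' does not.
    print(describe(json, ["loads", "create_adam"]))
```

With DeepSpeed installed, the same call could be pointed at the loaded op, e.g. `describe(CPUAdamBuilder().load(), ["create_adam", "destroy_adam"])` after importing `CPUAdamBuilder` the same way `cpu_adam.py` does. An unexpected `__file__` would suggest that a different `cpu_adam` build than the one this DeepSpeed version expects is being picked up.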

@lixinliu1995 @yaozhewei @lw3259111 @jiacheng-ye @HeyangQin I solved this problem by downgrading DeepSpeed to 0.7.7. The error happens with deepspeed == 0.8.1, so I suspect the latest version causes it.

james-yw, Mar 20 '23 01:03

Hi @lixinliu1995 @james-yw. I think this problem is likely caused by a third-party op_builder, as fixed in https://github.com/microsoft/DeepSpeed/pull/2963. Could you try running python -c "import op_builder; print(op_builder.__file__)"?

HeyangQin, Mar 20 '23 16:03
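The one-liner above can also be wrapped in a small check that reports where a top-level `op_builder` would be imported from, without executing it. This is a sketch: `top_level_op_builder` and `diagnose` are hypothetical helpers, and treating any path with a `deepspeed` directory component as DeepSpeed's own copy is a heuristic.

```python
import importlib.util
from typing import Optional


def top_level_op_builder() -> Optional[str]:
    """Return where a top-level `op_builder` would be imported from, or None.

    Roughly equivalent to
    python -c "import op_builder; print(op_builder.__file__)"
    but uses importlib.util.find_spec, so the module is not executed.
    """
    spec = importlib.util.find_spec("op_builder")
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no single origin file; report a search location.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else "<unknown location>"


def diagnose() -> str:
    origin = top_level_op_builder()
    if origin is None:
        return "No top-level op_builder on sys.path: no name collision."
    # Heuristic: a path with a 'deepspeed' component is DeepSpeed's own copy.
    if "deepspeed" in origin.replace("\\", "/").split("/"):
        return f"op_builder resolves inside DeepSpeed: {origin}"
    return f"Another op_builder may shadow DeepSpeed's: {origin}"


if __name__ == "__main__":
    print(diagnose())
```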

Hi @lw3259111 @jiacheng-ye @james-yw @lixinliu1995. This error is caused by a name collision with the colossalai installation. In short, op_builder is a key component of DeepSpeed that enables JIT compilation, but colossalai installs another top-level package also called op_builder, which breaks DeepSpeed. We have fixed this in https://github.com/microsoft/DeepSpeed/pull/2963 and just published DeepSpeed v0.8.3 to include the fix in our PyPI release. You can update your DeepSpeed installation with:

pip install deepspeed --upgrade

Please let us know if you still have this issue with the latest version of DeepSpeed (v0.8.3 or newer).

HeyangQin, Mar 20 '23 17:03
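Since the fix shipped in v0.8.3, a quick way to see which side of it an environment is on is to read the installed version. A sketch using only the standard library; `has_fix` and `installed_deepspeed` are hypothetical names, and the parser only looks at the leading `X.Y.Z` so that local builds such as `0.8.3+30d97705` still compare correctly. As later comments in this thread show, being on v0.8.3 or newer does not rule out other causes.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIXED_IN = (0, 8, 3)  # release that shipped the op_builder fix, per the comment above


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse the leading 'X.Y.Z' from strings like '0.8.3' or '0.8.3+30d97705'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version: str) -> bool:
    """True if the version is at or after the release containing the fix."""
    return parse_release(version) >= FIXED_IN


def installed_deepspeed() -> Optional[str]:
    """Return the installed deepspeed distribution version, or None."""
    try:
        return metadata.version("deepspeed")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    ver = installed_deepspeed()
    if ver is None:
        print("deepspeed is not installed")
    else:
        print(ver, "includes the fix" if has_fix(ver) else "predates the fix")
```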

@lw3259111, @jiacheng-ye, @james-yw, @lixinliu1995: thank you all for using DeepSpeed. Please re-open the issue if v0.8.3 does not fix it.

jeffra, Mar 20 '23 18:03


I still have this problem with deepspeed 0.7.7 and 0.8.3.

AttributeError: module 'cpu_adam' has no attribute 'create_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f7c7ad5bd30>
Traceback (most recent call last):
  File "/home/yuhui.zuo/miniconda3/envs/newgc/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fd71f55ad30>
Traceback (most recent call last):
  File "/home/yuhui.zuo/miniconda3/envs/newgc/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fdce3ee7d30>
Traceback (most recent call last):
  File "/home/yuhui.zuo/miniconda3/envs/newgc/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f1d6c7a6d30>
Traceback (most recent call last):
  File "/home/yuhui.zuo/miniconda3/envs/newgc/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fccb3f00d30>
Traceback (most recent call last):
  File "/home/yuhui.zuo/miniconda3/envs/newgc/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: module 'cpu_adam' has no attribute 'destroy_adam'

Coding-Zuo, Apr 04 '23 13:04

Hi @Coding-Zuo. Could you try running python -c "import op_builder; print(op_builder.__file__)"?

HeyangQin, Apr 05 '23 23:04

I'm running into the same error with both DeepSpeed 0.8.3 and 0.8.3+30d97705 (the current main branch):

raise Exception(f">- DeepSpeed Op Builder: Installed CUDA version {sys_cuda_version} does not match the "
Exception: >- DeepSpeed Op Builder: Installed CUDA version 12.0 does not match the version torch was compiled with 11.7,
unable to compile cuda/cpp extensions without a matching cuda version.
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fbb8c3c3f70>
Traceback (most recent call last):
  File "/home/shahswai/miniconda3/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f7d9e1f8f70>
Traceback (most recent call last):
  File "/home/shahswai/miniconda3/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

It also says there is no module named op_builder:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'op_builder'

So I ran it inside the ops directory:

(base) ip-172-31-4-144:~/miniconda3/lib/python3.9/site-packages/deepspeed/ops > python -c "import op_builder; print(op_builder.__file__)"
/home/user/miniconda3/lib/python3.9/site-packages/deepspeed/ops/op_builder/__init__.py

swairshah, Apr 06 '23 17:04
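The log above points to a different failure: the JIT build refuses to start because the system CUDA toolkit (12.0) differs from the CUDA version PyTorch was compiled with (11.7). The later `'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'` errors follow from that: `self.ds_opt_adam = CPUAdamBuilder().load()` raised before the attribute was assigned, so `__del__` has nothing to clean up. A sketch for comparing the two versions; it only compares major versions, which is a simplification of whatever rule DeepSpeed actually applies, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, e.g. 'release 12.0'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def majors_match(system: Tuple[int, int], torch_cuda: str) -> bool:
    """Simplified check (assumption): compare CUDA major versions only."""
    return system[0] == int(torch_cuda.split(".")[0])


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        system = parse_nvcc_release(out)
        try:
            import torch
            torch_cuda = torch.version.cuda  # None for CPU-only builds
        except ImportError:
            torch_cuda = None
        print("system CUDA:", system, "torch CUDA:", torch_cuda)
        if system and torch_cuda:
            print("major versions match:", majors_match(system, torch_cuda))
```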


ModuleNotFoundError: No module named 'op_builder'

Coding-Zuo, Apr 07 '23 08:04
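Why did `import op_builder` succeed only inside `deepspeed/ops`? `python -c` puts the current working directory first on `sys.path`, so running the probe there finds DeepSpeed's own `op_builder` package, while `ModuleNotFoundError` from other directories (as in the reply above) suggests that no stray top-level `op_builder` is installed. The same lookup order is what lets one top-level package shadow another. A self-contained demonstration using a throwaway directory; the package it creates is a stand-in, not colossalai's, and the helper names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

PROBE = "import op_builder; print(op_builder.__file__)"


def probe_from(directory: str) -> str:
    """Run the probe one-liner with `directory` as the working directory.

    Returns the printed path on success, else the last line of stderr
    (e.g. "ModuleNotFoundError: No module named 'op_builder'").
    """
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        cwd=directory,
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return result.stdout.strip()
    lines = result.stderr.strip().splitlines()
    return lines[-1] if lines else ""


def demo() -> str:
    """Create a throwaway top-level `op_builder` package and probe from its parent."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = os.path.join(tmp, "op_builder")
        os.mkdir(pkg)
        with open(os.path.join(pkg, "__init__.py"), "w"):
            pass  # an empty package is enough to win the lookup
        # `-c` puts the working directory first on sys.path (unless
        # PYTHONSAFEPATH / -P is in effect on Python 3.11+).
        return probe_from(tmp)


if __name__ == "__main__":
    print(demo())  # a path inside the temporary directory
```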

CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fb51ff05990>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

Coding-Zuo, Apr 07 '23 08:04
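`CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1` only says that the JIT build failed; the compiler's own message is normally printed earlier, in the `ninja -v` output above the traceback. DeepSpeed's `ds_report` command can also help by summarizing the environment and op compatibility. A sketch that pulls the first compiler-style error line out of a saved log; the pattern is a heuristic assumption about gcc/nvcc message formats, `first_build_error` is a hypothetical helper, and the sample log is made up.

```python
import re
from typing import Optional

# Heuristic (assumption): gcc/nvcc diagnostics look like "file.cpp:12:3: error: ...",
# "...: fatal error: ...", or start with "error:".
ERROR_LINE = re.compile(r"(^|:\s*)(fatal )?error\b", re.IGNORECASE)


def first_build_error(log_text: str) -> Optional[str]:
    """Return the first compiler-style error line before the Python traceback."""
    for line in log_text.splitlines():
        # Compiler output precedes the traceback; stop before Python's own
        # "RuntimeError: Error building extension ..." lines can match.
        if line.startswith("Traceback") or "CalledProcessError" in line:
            break
        if ERROR_LINE.search(line):
            return line.strip()
    return None


if __name__ == "__main__":
    sample = "\n".join([  # made-up log for illustration
        "[1/3] c++ -c cpu_adam.cpp -o cpu_adam.o",
        "cpu_adam.cpp:12:3: error: 'foo' was not declared in this scope",
        "ninja: build stopped: subcommand failed.",
        "Traceback (most recent call last):",
        "subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.",
    ])
    print(first_build_error(sample))
```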