MoQ problem: 'str' object has no attribute 'size'

Open ImNoBadBoy opened this issue 2 years ago • 1 comments

I got an error using MoQ in a ROCm environment on an AMD GPU:

```
Traceback (most recent call last):
  File "train_deepspeed.py", line 694, in <module>
    main(args)
  File "train_deepspeed.py", line 554, in main
    warmup=False, scaler=scaler, fp16=fp16)
  File "/public/home/gh/cnn/train_utils/train_deepspeed_utils.py", line 69, in train_one_epoch_deepspeed
    model_engine.step()
  File "/public/home/gh/anaconda3/envs/task2/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 1864, in step
    self._take_model_step(lr_kwargs)
  File "/public/home/gh/anaconda3/envs/task2/lib/python3.6/site-packages/deepspeed/runtime/engine.py", line 1786, in _take_model_step
    block_eigenvalue,
  File "/public/home/gh/anaconda3/envs/task2/lib/python3.6/site-packages/deepspeed/runtime/quantize.py", line 73, in quantize
    if len(p.size()) > 1:
AttributeError: 'str' object has no attribute 'size'
```

My JSON file is as follows:

```
{
  "train_batch_size": 128,
  "train_micro_batch_size_per_gpu": 4,
  "steps_per_print": 2000,
  "zero_optimization": {
    "stage": 0
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": 0.0032,
      "warmup_num_steps": 91
    }
  },
  "optimizer": {
    "type": "Lamb",
    "params": {
      "lr": 0.0032,
      "betas": [0.8, 0.999],
      "eps": 1e-8,
      "weight_decay": 3e-7
    }
  },
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": false
  },
  "quantize_training": {
    "enabled": true,
    "quantize_verbose": true,
    "quantizer_kernel": true,
    "quantize-algo": {
      "q_type": "symmetric"
    },
    "quantize_bits": {
      "start_bits": 16,
      "target_bits": 8
    },
    "quantize_schedule": {
      "quantize_period": 22,
      "schedule_offset": 182
    },
    "quantize_groups": 8
  }
}
```
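For readers hitting the same message: the traceback fails at `len(p.size()) > 1`, which means the quantizer was handed a `str` where it expected a tensor. The sketch below only illustrates one generic way that can happen (iterating a dict yields its string keys); it is not a confirmed diagnosis of this issue. `FakeTensor`, `count_matrices`, and `non_tensor_entries` are hypothetical names invented for the illustration and are not part of DeepSpeed.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor; only size() is modelled."""

    def __init__(self, *shape):
        self._shape = shape

    def size(self):
        return self._shape


def count_matrices(parameter_group):
    # Same shape test as the failing line in the traceback:
    # count entries that have more than one dimension.
    return sum(1 for p in parameter_group if len(p.size()) > 1)


def non_tensor_entries(parameter_group):
    # Report (index, type name) for every entry without a callable size().
    return [
        (i, type(p).__name__)
        for i, p in enumerate(parameter_group)
        if not callable(getattr(p, "size", None))
    ]


good_group = [FakeTensor(4, 4), FakeTensor(4)]
# Iterating a dict yields its keys, so every `p` here is a str.
bad_group = {"weight": FakeTensor(4, 4), "bias": FakeTensor(4)}

error_message = None
try:
    count_matrices(bad_group)
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(non_tensor_entries(bad_group))
```

Running a check like `non_tensor_entries` on whatever is passed to the quantizer would show whether strings are being iterated instead of tensors.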

ImNoBadBoy • Jun 25 '22 11:06

Hi,

We recently refactored the MoQ part (DeepSpeed version >= 0.7.0). Please try the newest version and let us know if that works. Here is the new tutorial link: https://www.deepspeed.ai/tutorials/model-compression/
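To confirm whether an environment already has the refactored release, a quick check along these lines may help. It is a minimal sketch that assumes Python 3.8+ (for `importlib.metadata`; the traceback above comes from a Python 3.6 environment, where this module is unavailable) and a plain `major.minor.patch` version string. The helper names `parse_version`, `is_at_least`, and `installed_version` are made up for this example.

```python
from importlib import metadata  # standard library, Python 3.8+


def parse_version(text):
    """Turn '0.7.0', '0.7' or '0.6.5+abc' into a 3-tuple of integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '0.7' compares equal to '0.7.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def is_at_least(installed, minimum="0.7.0"):
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package="deepspeed"):
    # Returns None instead of raising when the package is absent.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("deepspeed is not installed in this environment")
elif is_at_least(found):
    print(f"deepspeed {found} is 0.7.0 or newer")
else:
    print(f"deepspeed {found} is older than 0.7.0; consider upgrading")
```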

yaozhewei • Jul 29 '22 22:07

AMD support has been updated as well. Please try a recent DeepSpeed release and re-open if the issue persists.

loadams • Jun 13 '23 17:06