Arthur Wu

Results: 73 comments of Arthur Wu

It's a txt file; the data is from wiki:

```
root@I131672d9da00f017a4:/hy-tmp/baichuan-7B/data_dir# ll
total 10004
drwxr-xr-x  2 root root     180 Jun 25 17:07 ./
drwxr-xr-x 12 root root    4096 Jun 27 18:37 ../
-rw-r--r--  1 root root 1024000 Jun 25 17:07 ...
```

> > [2023-06-26 17:04:13,047] [INFO] [logging.py:96:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer
> > [2023-06-26 17:04:13,057] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
> > [2023-06-26 17:04:13,057] ...

Thanks, but how do I know which one to use? How can I check it?
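
If the question is which basic optimizer DeepSpeed actually ends up using, a minimal sketch of one way to check it at runtime (assuming `engine` is what `deepspeed.initialize` returns in your training script; `model`, `params`, and `ds_config` are placeholders for your own objects — with `offload_optimizer` set to `cpu` in the config, DeepSpeed typically picks `DeepSpeedCPUAdam`, as in the log above):

```python
# Sketch: inspect the optimizer DeepSpeed constructed after initialization.
import deepspeed

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=params, config=ds_config
)

print(type(optimizer))               # DeepSpeed's wrapper optimizer
print(type(engine.basic_optimizer))  # the underlying one, e.g. DeepSpeedCPUAdam
```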

Is `memory_efficient_attention` from [xformers](https://github.com/facebookresearch/xformers) a faster implementation than torch's? Any idea?
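
For comparison, a minimal sketch of the two attention paths side by side (shapes and dtypes are illustrative assumptions; note that PyTorch 2.0's `scaled_dot_product_attention` can itself dispatch to a memory-efficient kernel, so which one is faster depends on hardware and shapes — worth timing both on your own workload):

```python
import torch
import torch.nn.functional as F
import xformers.ops as xops

# Illustrative shapes: xformers expects (batch, seq_len, heads, head_dim).
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

out_xf = xops.memory_efficient_attention(q, k, v)

# torch expects (batch, heads, seq_len, head_dim), so transpose in and out.
out_pt = F.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
).transpose(1, 2)

print(torch.allclose(out_xf, out_pt, atol=1e-3))
```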

Still an error:

```
size mismatch for transformer.ln_f.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for transformer.ln_f.bias: copying a param with shape ...
```

```
python inference.py --model_path ./actor_checkpoint_prompts.pt --pretrain facebook/opt-350m --model opt
```

File "inference.py", line 56, in eval(args) File "inference.py", line 21, in eval actor.model.load_state_dict(state_dict) File "/home/rst/ColossalAI/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(

```
python inference.py --pretrain ./actor_checkpoint_prompts.pt --model bloom
```

If I use GPT-2 for training, I get correct results; with BLOOM the results are never right.

> Hello everyone, is there any better pre-trained model available now?

I published a RetNet model for study; you can try it: https://huggingface.co/wac81/toy_retnet_1.3b_pretrain
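
A sketch of how one might try a Hub checkpoint like this with `transformers` (the loading details are assumptions, not confirmed for this repo — check the model card for the actual usage; `trust_remote_code=True` is only needed if the repo ships custom model code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "wac81/toy_retnet_1.3b_pretrain"  # the model linked above
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

inputs = tok("Hello", return_tensors="pt")
print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```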