Arthur Wu

Results: 73 comments of Arthur Wu

It's a txt file; the data is from wiki:

```
root@I131672d9da00f017a4:/hy-tmp/baichuan-7B/data_dir# ll
total 10004
drwxr-xr-x  2 root root     180 Jun 25 17:07 ./
drwxr-xr-x 12 root root    4096 Jun 27 18:37 ../
-rw-r--r--  1 root root 1024000 Jun 25 17:07 ...
```

> > [2023-06-26 17:04:13,047] [INFO] [logging.py:96:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer
> > [2023-06-26 17:04:13,057] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
> > [2023-06-26 17:04:13,057] ...

Thanks, but how do I know which one to use? How can I check it?
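
If the question is which basic optimizer DeepSpeed actually ends up using, a minimal sketch of one way to check it at runtime (assuming `engine` is what `deepspeed.initialize` returns in your training script; `model`, `params`, and `ds_config` are placeholders for your own objects — with `offload_optimizer` set to `cpu` in the config, DeepSpeed typically picks `DeepSpeedCPUAdam`, as in the log above):

```python
# Sketch: inspect the optimizer DeepSpeed constructed after initialization.
import deepspeed

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=params, config=ds_config
)

print(type(optimizer))               # DeepSpeed's wrapper optimizer
print(type(engine.basic_optimizer))  # the underlying one, e.g. DeepSpeedCPUAdam
```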

Is `memory_efficient_attention` from [xformers](https://github.com/facebookresearch/xformers) a faster implementation than torch's? Any idea?
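
For comparison, a minimal sketch of the two attention paths side by side (shapes and dtypes are illustrative assumptions; note that PyTorch 2.0's `scaled_dot_product_attention` can itself dispatch to a memory-efficient kernel, so which one is faster depends on hardware and shapes — worth timing both on your own workload):

```python
import torch
import torch.nn.functional as F
import xformers.ops as xops

# Illustrative shapes: xformers expects (batch, seq_len, heads, head_dim).
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

out_xf = xops.memory_efficient_attention(q, k, v)

# torch expects (batch, heads, seq_len, head_dim), so transpose in and out.
out_pt = F.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
).transpose(1, 2)

print(torch.allclose(out_xf, out_pt, atol=1e-3))
```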

Still an error:

```
size mismatch for transformer.ln_f.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for transformer.ln_f.bias: copying a param with shape ...
```

```
python inference.py --model_path ./actor_checkpoint_prompts.pt --pretrain facebook/opt-350m --model opt
```

File "inference.py", line 56, in eval(args) File "inference.py", line 21, in eval actor.model.load_state_dict(state_dict) File "/home/rst/ColossalAI/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(

```
python inference.py --pretrain ./actor_checkpoint_prompts.pt --model bloom
```

If I use GPT-2 for training, I get correct results; with BLOOM the results are never right.

> Hello everyone, is there any better pre-trained model available now?

I published a RetNet model for study; you can try it: https://huggingface.co/wac81/toy_retnet_1.3b_pretrain
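
A sketch of how one might try a Hub checkpoint like this with `transformers` (the loading details are assumptions, not confirmed for this repo — check the model card for the actual usage; `trust_remote_code=True` is only needed if the repo ships custom model code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "wac81/toy_retnet_1.3b_pretrain"  # the model linked above
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

inputs = tok("Hello", return_tensors="pt")
print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```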