Xiao
> I tried the command `torchrun --nproc_per_node=4 train.py --synthetic 2>&1 | tee run.log` to run [train.py](https://github.com/hpcaitech/ColossalAI/blob/main/examples/tutorial/sequence_parallel/train.py), but it doesn't work.
> ### 🐛 Describe the bug
> I try to run a config using [train_gpt.py](https://github.com/hpcaitech/ColossalAI-Examples/blob/main/language/gpt/train_gpt.py). I add a model in [gpt.py](https://github.com/hpcaitech/Titans/blob/main/titans/model/gpt/gpt.py).
>
> ...
my config is below.

```
from colossalai.amp import AMP_TYPE
from titans.loss.lm_loss import GPTLMLoss
from titans.model.gpt import gpt2_1_3B, gpt2_test4gpu350M
from torch.optim import Adam

BATCH_SIZE = 4
SEQ_LEN = 2048
# here the...
```
> If I want to add parallelism and sequence parallelism (SP) to the GPT model, how should I run the code? I am confused by the different code and different documentation.
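For what it's worth, a minimal sketch of a ColossalAI config that enables sequence parallelism (with pipeline parallelism left at a single stage) might look like the following. This is an assumption based on the sequence_parallel tutorial's conventions, not a config taken from this thread; the `parallel` keys, the `size=4` value, and `mode='sequence'` are all illustrative.

```
# Hypothetical ColossalAI config sketch (assumed keys, not from the thread).
from colossalai.amp import AMP_TYPE

BATCH_SIZE = 4
SEQ_LEN = 2048
NUM_EPOCHS = 1

# Assumption: 4 GPUs in total, all assigned to sequence parallelism,
# with pipeline parallelism effectively disabled (1 stage).
parallel = dict(
    pipeline=1,
    tensor=dict(size=4, mode='sequence'),
)

# Mixed precision, matching the AMP_TYPE import in the quoted config.
fp16 = dict(mode=AMP_TYPE.NAIVE)
```

Increasing `pipeline` above 1 splits the model into that many stages, and the product of `pipeline` and `tensor['size']` has to match the number of GPUs passed to `torchrun`.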
> Any help would be appreciated @tjruwase @stas00

It seems this is OOM. Your memory used is 513497.
> > > Any help would be appreciated @tjruwase @stas00
> >
> > It seems this is OOM. Your memory used is 513497.
>
> Yes, I know....
I do not use DeepSpeed to run the 15B model. I use Alpa to run the 15B model on 32 GPUs.
> My DeepSpeed version is 0.8.1, my torch version is 1.13.1, and my transformers version is transformers==4.21.2. My CPU memory is 500GB.
>
> I follow the [document](https://github.com/microsoft/DeepSpeedExamples/tree/master/inference/huggingface/text-generation) to...
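For context, the core of that text-generation example is wrapping a Hugging Face model with DeepSpeed's inference engine. A minimal sketch, assuming a small GPT-style checkpoint (the model name, `mp_size`, and prompt below are placeholders, not values from this thread):

```
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Kernel-injection inference: the model is sharded across mp_size GPUs,
# but each GPU must still hold its full shard in device memory.
engine = deepspeed.init_inference(
    model,
    mp_size=1,  # tensor-parallel degree (placeholder)
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
model = engine.module

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(torch.cuda.current_device())
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0]))
```

This path keeps all weights on the GPUs, so it is distinct from ZeRO-based inference, which the next replies discuss.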
> Thanks. Btw, do you use ZeRO to run DeepSpeed inference?
> @lambda7xx, please see the example [bloom-ds-zero-inference.py](https://github.com/huggingface/transformers-bloom-inference/blob/main/bloom-inference-scripts/bloom-ds-zero-inference.py). I use this code to run inference with a BLOOM model, which is a 176B model, on 8 V100-32GB GPUs. The e2e time is 2000s; I think it's...
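Roughly, the ZeRO-inference approach in that script initializes a ZeRO stage 3 engine with parameters offloaded to CPU, so each GPU only materializes the layers it needs on demand. A minimal sketch under those assumptions (the model name and generation arguments are placeholders, not taken from the thread):

```
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.deepspeed import HfDeepSpeedConfig

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,
}

# Must exist before from_pretrained() so weights load straight into ZeRO-3 partitions.
dschf = HfDeepSpeedConfig(ds_config)

model_name = "bigscience/bloom"  # placeholder; any causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
engine.module.eval()

inputs = tokenizer("Hello", return_tensors="pt").to(torch.cuda.current_device())
with torch.no_grad():
    out = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0]))
```

Offloading parameters to CPU this way trades latency for memory, which is one reason the end-to-end time for a 176B model can run into thousands of seconds.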