zscwind

7 issues opened by zscwind

Hello, is there a particular order in which the course materials should be viewed?

`vocab_size not found in data/openwebtext/meta.pkl, using GPT-2 default of 50257`
`Initializing a new model from scratch`
`number of parameters: 124.34M`
`compiling the model... (takes a ~minute)`
`To use data.metrics please...`

I used 4 GPUs on 1 node: `torchrun --standalone --nproc_per_node=4 train.py --compile=False`. But the training speed is about the same as with 1 GPU. Why?
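Two things are worth checking here: whether torchrun actually started four worker processes, and whether speed is being compared per iteration or per token. torchrun exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` to every worker; the helper names below are hypothetical, and the token arithmetic assumes the training script keeps the per-process gradient-accumulation steps fixed as the world size grows (check this against your own `train.py`).

```python
import os


def ddp_info():
    """Rank and world size as exported by torchrun to each worker process."""
    return {
        "rank": int(os.environ.get("RANK", "0")),
        "local_rank": int(os.environ.get("LOCAL_RANK", "0")),
        "world_size": int(os.environ.get("WORLD_SIZE", "1")),
    }


def tokens_per_iter(grad_accum_steps, world_size, batch_size, block_size):
    """Tokens processed by one optimizer step, summed over all processes."""
    return grad_accum_steps * world_size * batch_size * block_size


def tokens_per_second(tokens, seconds_per_iter):
    """Throughput; compare this, not seconds per iteration, across GPU counts."""
    return tokens / seconds_per_iter


if __name__ == "__main__":
    # Under torchrun with 4 workers, this should print 4 lines with
    # distinct ranks and world_size=4. A single line, or world_size=1,
    # means distributed training was not actually launched.
    print(ddp_info())
```

For example, with hypothetical settings of 40 accumulation steps, batch size 12, and block size 1024, one iteration covers 491,520 tokens on 1 GPU and 1,966,080 tokens on 4. An unchanged seconds-per-iteration figure would then mean roughly 4× the throughput. If the script instead divides the accumulation steps by the world size, the per-iteration time itself should drop.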

Can GPT-2 XL (`gpt2-xl`) be trained with nanoGPT? If so, where can its dataset be found?
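Whether GPT-2 XL is trainable on a given machine is largely a memory question. This sketch counts parameters for a GPT-2-style model with biases and tied input/output embeddings, using the published GPT-2 XL shape (48 layers, 25 heads, `n_embd` = 1600; the head count does not affect the total). The 16 bytes per parameter (fp32 weights, gradients, and Adam's two moment buffers) is a rough rule of thumb that ignores activations and framework overhead. Function names are illustrative.

```python
def gpt2_param_count(n_layer, n_embd, vocab_size=50257, block_size=1024):
    """Parameter count of a GPT-2-style model with tied wte/lm_head weights."""
    # Per transformer block (weights + biases):
    #   attn.c_attn: n_embd*3*n_embd + 3*n_embd
    #   attn.c_proj: n_embd*n_embd   + n_embd
    #   mlp.c_fc:    n_embd*4*n_embd + 4*n_embd
    #   mlp.c_proj:  4*n_embd*n_embd + n_embd
    #   ln_1, ln_2:  2 * (2*n_embd)
    per_block = 12 * n_embd**2 + 13 * n_embd
    embeddings = vocab_size * n_embd + block_size * n_embd  # wte + wpe
    final_ln = 2 * n_embd  # ln_f weight + bias
    # lm_head shares its weight with wte, so it adds nothing here.
    return n_layer * per_block + embeddings + final_ln


def adam_fp32_state_bytes(n_params):
    """4 B weights + 4 B gradients + 8 B Adam moments per parameter."""
    return 16 * n_params


if __name__ == "__main__":
    for name, n_layer, n_embd in [("gpt2", 12, 768), ("gpt2-xl", 48, 1600)]:
        n = gpt2_param_count(n_layer, n_embd)
        print(f"{name}: {n:,} params, ~{adam_fp32_state_bytes(n) / 1e9:.1f} GB")
```

This gives 124,439,808 parameters for the 124M GPT-2 shape and 1,557,611,200 for GPT-2 XL, or about 24.9 GB of weight, gradient, and optimizer state before any activations are stored.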

### 🐛 Describe the bug
`from colossalai.kernel.op_builder.layernorm import LayerNormBuilder`
`ModuleNotFoundError: No module named 'colossalai.kernel.op_builder'`
### Environment
Python 3.8.0, CUDA 11.6, torch 1.13.0+cu116
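A `ModuleNotFoundError` on a dotted path means the installed package does not provide that submodule. That could be because the installed ColossalAI release lays out its kernel code differently, or because a different copy of the package is being imported; both are assumptions this report does not confirm. A standard-library-only sketch for checking what is actually installed (`module_available` and `installed_version` are hypothetical helper names):

```python
import importlib.metadata
import importlib.util


def module_available(dotted_name: str) -> bool:
    """True if `dotted_name` can be located by the import system.

    find_spec imports the parent packages first, so a missing parent raises
    ModuleNotFoundError; that case is reported as "not available".
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ModuleNotFoundError, ValueError):
        return False


def installed_version(distribution: str):
    """Installed version string of a distribution, or None if not installed."""
    try:
        return importlib.metadata.version(distribution)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("colossalai version:", installed_version("colossalai"))
    for name in ("colossalai", "colossalai.kernel", "colossalai.kernel.op_builder"):
        print(f"{name}: {module_available(name)}")
```

If `colossalai.kernel` is found but `colossalai.kernel.op_builder` is not, the import path in the calling code likely does not match the installed release.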


`RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel: size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([2304, 768]) from checkpoint, the shape in current model is torch.Size([768, 2304]). size mismatch...`
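The two shapes are transposes of each other. Hugging Face's `GPT2LMHeadModel` stores these projections as `Conv1D` weights laid out as (in_features, out_features), while a checkpoint saved from an `nn.Linear`-based model, such as nanoGPT's, stores (out_features, in_features). Below is a minimal sketch of converting such a state dict before `load_state_dict`. It assumes the checkpoint uses the same key names as the Hugging Face model (nanoGPT's do) and may carry the `_orig_mod.` prefix that `torch.compile` adds. `linear_to_conv1d_state_dict` is a hypothetical helper.

```python
# Weights stored as Conv1D (in, out) in HF GPT-2 but as nn.Linear (out, in)
# in Linear-based implementations. lm_head.weight is nn.Linear in both.
TRANSPOSED_SUFFIXES = (
    "attn.c_attn.weight",
    "attn.c_proj.weight",
    "mlp.c_fc.weight",
    "mlp.c_proj.weight",
)


def linear_to_conv1d_state_dict(state_dict, transpose=lambda w: w.T,
                                prefix_to_strip="_orig_mod."):
    """Return a new dict with Linear-layout projection weights transposed.

    `transpose` defaults to `.T`, which works for 2-D torch.Tensor and
    numpy.ndarray values; other entries are passed through unchanged.
    """
    converted = {}
    for key, value in state_dict.items():
        # torch.compile-wrapped modules save keys with this prefix.
        if prefix_to_strip and key.startswith(prefix_to_strip):
            key = key[len(prefix_to_strip):]
        if key.endswith(TRANSPOSED_SUFFIXES):
            value = transpose(value)
        converted[key] = value
    return converted
```

With PyTorch tensors the default transpose applies directly to these 2-D weights, and the result can then be passed to `model.load_state_dict(...)`. Any remaining missing or unexpected keys, such as attention-mask buffers, would need separate handling.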