henan991201

3 comments by henan991201

In my experiments with swiglu, I also find that I need to set TP=4 and PP=2 in the evaluation script in order to get results; otherwise, I will...

GPUS_PER_NODE=8
NNODES=1
TP_SIZE=4
PP_SIZE=2
MICRO_BATCH_SIZE=16
GLOBAL_BATCH_SIZE=256
NLAYERS=24
NHIDDEN=1024
NHEADS=16
SEQ_LEN=2048
SAVE_INTERVAL=500
TRAIN_SAMPLES=220_000_000
LR_DECAY_SAMPLES=200_000_000
LR_WARMUP_SAMPLES=183_105
OPTIMIZER_ARGS=" \
    --optimizer adam \
    --adam-beta1 0.9 \
    --adam-beta2 0.95 \
    --adam-eps 1e-8 \
    --lr 3.0e-4...
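For anyone checking a config like the one above, here is a rough sketch of the arithmetic these settings imply. It assumes the usual Megatron-style conventions (world size = TP x PP x DP, layers split evenly across pipeline stages, attention heads split evenly across tensor-parallel ranks). The function name `derive_parallel_layout` is made up for illustration and is not part of any training script.

```python
def derive_parallel_layout(gpus_per_node, nnodes, tp_size, pp_size,
                           micro_batch_size, global_batch_size,
                           nlayers, nheads):
    """Derive data-parallel size and gradient-accumulation steps.

    Hypothetical helper: it only mirrors the common Megatron-style
    divisibility rules, it does not call any real library.
    """
    world_size = gpus_per_node * nnodes
    model_parallel_size = tp_size * pp_size

    # Each model replica occupies TP * PP GPUs.
    if world_size % model_parallel_size != 0:
        raise ValueError("world size must be divisible by TP * PP")
    dp_size = world_size // model_parallel_size

    # Layers are split across pipeline stages, heads across TP ranks.
    if nlayers % pp_size != 0:
        raise ValueError("NLAYERS must be divisible by PP_SIZE")
    if nheads % tp_size != 0:
        raise ValueError("NHEADS must be divisible by TP_SIZE")

    # One optimizer step consumes GLOBAL_BATCH_SIZE samples in total.
    samples_per_micro_step = micro_batch_size * dp_size
    if global_batch_size % samples_per_micro_step != 0:
        raise ValueError("global batch must be divisible by micro batch * DP")
    grad_accum_steps = global_batch_size // samples_per_micro_step

    return {
        "world_size": world_size,
        "dp_size": dp_size,
        "grad_accum_steps": grad_accum_steps,
        "layers_per_stage": nlayers // pp_size,
        "heads_per_tp_rank": nheads // tp_size,
    }


# Values taken from the settings quoted above.
layout = derive_parallel_layout(
    gpus_per_node=8, nnodes=1, tp_size=4, pp_size=2,
    micro_batch_size=16, global_batch_size=256,
    nlayers=24, nheads=16,
)
print(layout)
```

With TP=4 and PP=2 on a single 8-GPU node, one model replica uses all 8 GPUs, so the data-parallel size is 1 and the global batch of 256 is reached through gradient accumulation alone.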

I tried this approach but got an error: ValueError: Can't find a valid checkpoint at xxx