PaLM-colossalai
Gemini + 2.5D bad case
Using MR #41
The launch command is as follows.
env OMP_NUM_THREADS=12 torchrun --standalone --nproc_per_node=4 train.py --from_torch --config=configs/palm_8b_zero_2p5d_badcase.py
Training failed after a few iterations. I suspect the bug is in Gemini. The error log looks like this:

