[BUG]: confusing ColossalAI INFO of examples.language.gpt.gemini.demo
🐛 Describe the bug
I want to use DP=2, TP=2 on 2 GPUs, and the INFO is as below:
INFO: Distributed environment is initialized, data parallel size: 2, pipeline parallel size: 1, tensor parallel size: 1
print("** tp_pg", tp_pg) # added in [the train_gpt_demo.py#234](https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/gemini/train_gpt_demo.py#234)
** tp_pg ProcessGroup(ranks=[0, 1], rank=1, dp=1, tp=2)
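As far as I can tell, the launch INFO and the printed tp_pg describe two different parallel settings. A minimal sketch of my understanding (a hypothetical condensation, not verbatim code from train_gpt_demo.py):

```python
import colossalai
from colossalai.tensor import ProcessGroup

# 1) launch_from_torch() is given an empty config, so the global
#    ParallelContext has no tensor-parallel setting and the launcher logs
#    "data parallel size: 2, pipeline parallel size: 1, tensor parallel size: 1".
colossalai.launch_from_torch(config={})

# 2) The Gemini path builds its own ProcessGroup from --tp_degree, which is
#    why the printed tp_pg shows dp=1, tp=2 on 2 GPUs.
tp_pg = ProcessGroup(tp_degree=2)
print("** tp_pg", tp_pg)
```

The full run log: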
+ export DISTPLAN=CAI_Gemini
+ DISTPLAN=CAI_Gemini
+ export GPUNUM=2
+ GPUNUM=2
+ export TPDEGREE=2
+ TPDEGREE=2
+ export PLACEMENT=cpu
+ PLACEMENT=cpu
+ export USE_SHARD_INIT=False
+ USE_SHARD_INIT=False
+ export BATCH_SIZE=8
+ BATCH_SIZE=8
+ export MODEL_TYPE=gpt2_medium
+ MODEL_TYPE=gpt2_medium
+ export TRAIN_STEP=10
+ TRAIN_STEP=10
+ '[' False = True ']'
+ USE_SHARD_INIT=
+ mkdir -p gemini_logs
+ torchrun --standalone --nproc_per_node=2 ./train_gpt_my.py --tp_degree=2 --model_type=gpt2_medium --batch_size=8 --placement=cpu --distplan=CAI_Gemini --train_step=10
+ tee ./gemini_logs/gpt2_medium_CAI_Gemini_gpu_2_bs_8_tp_2_cpu.log
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
/root/py37/lib/python3.7/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
registered at aten/src/ATen/RegisterSchema.cpp:6
dispatch key: Meta
previous kernel: registered at ../aten/src/ATen/functorch/BatchRulesScatterOps.cpp:1053
new kernel: registered at /dev/null:219 (Triggered internally at ../aten/src/ATen/core/dispatch/OperatorEntry.cpp:150.)
self.m.impl(name, dispatch_key, fn)
environmental variable OMP_NUM_THREADS is set to 160.
/root/py37/lib/python3.7/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
registered at aten/src/ATen/RegisterSchema.cpp:6
dispatch key: Meta
previous kernel: registered at ../aten/src/ATen/functorch/BatchRulesScatterOps.cpp:1053
new kernel: registered at /dev/null:219 (Triggered internally at ../aten/src/ATen/core/dispatch/OperatorEntry.cpp:150.)
self.m.impl(name, dispatch_key, fn)
environmental variable OMP_NUM_THREADS is set to 160.
[02/22/23 15:14:59] INFO colossalai - colossalai - INFO: /root/py37/lib/python3.7/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 0 is bound to device 0
[02/22/23 15:14:59] INFO colossalai - colossalai - INFO: /root/py37/lib/python3.7/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 1 is bound to device 1
[02/22/23 15:15:03] INFO colossalai - colossalai - INFO: /root/py37/lib/python3.7/site-packages/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 1, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024, the default parallel seed is ParallelMode.DATA.
[02/22/23 15:15:03] INFO colossalai - colossalai - INFO: /root/py37/lib/python3.7/site-packages/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024, the default parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: /root/py37/lib/python3.7/site-packages/colossalai/initialize.py:120 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 2, pipeline parallel size: 1, tensor parallel size: 1
INFO colossalai - colossalai - INFO: ./train_gpt_my.py:222 main
INFO colossalai - colossalai - INFO: gpt2_medium, CAI_Gemini, batch size 8
** tp_pg ProcessGroup(ranks=[0, 1], rank=1, dp=1, tp=2)
** tp_pg ProcessGroup(ranks=[0, 1], rank=0, dp=1, tp=2)
OP colossalai._C.cpu_adam already exists, skip building.
Time to load cpu_adam op: 0.001771688461303711 seconds
OP colossalai._C.fused_optim already exists, skip building.
Time to load fused_optim op: 6.0558319091796875e-05 seconds
searching chunk configuration is completed in 2.49 s.
used number: 170.75 MB, wasted number: 0.00 MB
total wasted percentage is 0.00%
[02/22/23 15:15:19] INFO colossalai - colossalai - INFO: ./train_gpt_my.py:297 main
INFO colossalai - colossalai - INFO: the size of testing model size is 204.8M.
root /root/data/chatgpt_token
Environment
set -x
# distplan in ["CAI_ZeRO1", "CAI_ZeRO2", "CAI_Gemini", "Pytorch_DDP", "Pytorch_ZeRO"]
# export DISTPLAN=CAI_ZeRO2
export DISTPLAN=${DISTPLAN:-"CAI_Gemini"}
# The following options are only valid when DISTPLAN="colossalai"
export GPUNUM=${GPUNUM:-2}
export TPDEGREE=${TPDEGREE:-2}
export PLACEMENT=${PLACEMENT:-"cpu"}
export USE_SHARD_INIT=${USE_SHARD_INIT:-False}
export BATCH_SIZE=${BATCH_SIZE:-8}
export MODEL_TYPE=${MODEL_TYPE:-"gpt2_medium"}
export TRAIN_STEP=${TRAIN_STEP:-10}
# export PYTHONPATH=$PWD:$PYTHONPATH
if [ ${USE_SHARD_INIT} = "True" ]; then
USE_SHARD_INIT="--shardinit"
else
USE_SHARD_INIT=""
fi
mkdir -p gemini_logs
torchrun --standalone --nproc_per_node=${GPUNUM} ./train_gpt_my.py \
--tp_degree=${TPDEGREE} \
--model_type=${MODEL_TYPE} \
--batch_size=${BATCH_SIZE} \
--placement=${PLACEMENT} \
${USE_SHARD_INIT} \
--distplan=${DISTPLAN} \
--train_step=${TRAIN_STEP} \
2>&1 | tee ./gemini_logs/${MODEL_TYPE}_${DISTPLAN}_gpu_${GPUNUM}_bs_${BATCH_SIZE}_tp_${TPDEGREE}_${PLACEMENT}.log
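My understanding is that the "tensor parallel size" in the launch INFO only reflects the config passed to colossalai.launch_from_torch, not the demo's --tp_degree flag. A minimal sketch, assuming the legacy parallel-config format, of what would make the launcher itself report TP=2 (I am not suggesting the Gemini demo should do this; it only illustrates where the INFO numbers come from):

```python
import colossalai

# Hypothetical legacy-style config: the launcher derives the sizes it logs
# from this dict, so on 2 GPUs it should report
# "data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 2".
CONFIG = dict(parallel=dict(pipeline=1, tensor=dict(size=2, mode='1d')))
colossalai.launch_from_torch(config=CONFIG)
```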