batch_size
Hello, when I set samples_per_gpu to 2 in /projects/configs/surroundocc/surroundocc.py, I get the following error: RuntimeError: stack expects each tensor to be equal size, but got [62812, 4] at entry 0 and [43226, 4] at entry 1
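For reference, here is a minimal sketch of what I think is happening: the default collate function tries to torch.stack the per-sample occupancy ground truth, which fails because the tensors have different numbers of rows (the sizes below are taken from the error message).

```python
import torch

# Occupancy ground-truth tensors with the two sizes reported in the error.
gt_entry_0 = torch.zeros(62812, 4)
gt_entry_1 = torch.zeros(43226, 4)

# The default collate stacks per-sample tensors along a new batch dimension,
# which requires identical shapes, so batch size 2 triggers the RuntimeError.
torch.stack([gt_entry_0, gt_entry_1])
# RuntimeError: stack expects each tensor to be equal size, ...
```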
This is my training shell script:
CONFIG=./projects/configs/surroundocc/surroundocc.py
GPUS=2
SAVE_PATH=./work_dirs/surroundocc
PORT=${PORT:-28108}
NCCL_DEBUG=INFO

PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
    $(dirname "$0")/train.py $CONFIG --work-dir ${SAVE_PATH} --launcher pytorch ${@:4} --deterministic
I used distributed training with 2 GPUs.
Looking forward to your reply, thanks!
Hi, we have not tested the case where samples_per_gpu > 1; in our experiments we fix samples_per_gpu to 1. I think the bug is probably caused by the dataloader, since the shape of the occupancy ground truth differs from sample to sample.
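If you want to experiment with samples_per_gpu > 1 anyway, one possible workaround is a custom collate_fn that stacks only the fixed-size inputs and keeps the variable-size occupancy ground truth as a list of [N_i, 4] tensors (the model would then need to iterate over that list). This is only a sketch with made-up names (DummyOccDataset, occ_collate), not code from this repo:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class DummyOccDataset(Dataset):
    """Stand-in dataset: each sample has a fixed-size image tensor and a
    variable-length [N, 4] occupancy ground truth."""

    def __init__(self, gt_sizes):
        self.gt_sizes = gt_sizes

    def __len__(self):
        return len(self.gt_sizes)

    def __getitem__(self, idx):
        img = torch.zeros(3, 32, 32)                 # fixed shape, safe to stack
        gt_occ = torch.zeros(self.gt_sizes[idx], 4)  # variable N, cannot be stacked
        return dict(img=img, gt_occ=gt_occ)


def occ_collate(batch):
    """Stack fixed-size tensors, keep variable-size occupancy GT as a list."""
    return dict(
        img=torch.stack([sample['img'] for sample in batch]),
        gt_occ=[sample['gt_occ'] for sample in batch],  # list of [N_i, 4] tensors
    )


loader = DataLoader(DummyOccDataset([62812, 43226]), batch_size=2,
                    collate_fn=occ_collate)
batch = next(iter(loader))
print(batch['img'].shape)                  # torch.Size([2, 3, 32, 32])
print([g.shape for g in batch['gt_occ']])  # [torch.Size([62812, 4]), torch.Size([43226, 4])]
```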