
Question about batch size

Open xobeiotozi opened this issue 3 years ago • 7 comments

I have read your paper carefully. It mentions that the batch size is set to 32, but I only see solver.IMG_PER_GPU = 2 in train.py. Is this a code change for training on a single GPU? Thanks a lot for your time.

xobeiotozi avatar Jul 26 '21 05:07 xobeiotozi

The batch size of 32 is the effective batch size under multi-GPU DistributedDataParallel training: each GPU processes IMG_PER_GPU images per step, so the total batch size is IMG_PER_GPU times the number of GPUs.
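
To make the arithmetic concrete, here is a minimal sketch (not the repo's actual code; the 16-GPU count is an assumption, chosen because it reproduces the paper's batch size of 32):

```python
# Hypothetical illustration: under DistributedDataParallel, each GPU runs
# one process that draws IMG_PER_GPU samples per step, so the effective
# batch size is IMG_PER_GPU times the number of processes (the world size).
IMG_PER_GPU = 2    # per-GPU batch size set in train.py
num_gpus = 16      # assumed GPU count for this example, not stated in the thread
effective_batch_size = IMG_PER_GPU * num_gpus
assert effective_batch_size == 32  # the batch size reported in the paper
```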

raypine avatar Jul 26 '21 07:07 raypine

I only have one GPU; should I just set IMG_PER_GPU = 1?

xobeiotozi avatar Jul 27 '21 10:07 xobeiotozi

How do I solve this problem? Is it because I only have one GPU?

raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-u', 'train.py', '--local_rank=0']' died with <Signals.SIGKILL: 9>.

xobeiotozi avatar Aug 03 '21 07:08 xobeiotozi

What is your setting of "nproc_per_node"?

raypine avatar Aug 03 '21 07:08 raypine

I’m setting nproc_per_node=1

xobeiotozi avatar Aug 03 '21 07:08 xobeiotozi

That may be the problem. An easy workaround is to run "train.py" directly rather than launching it through "torch.distributed.launch".
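
For example, the two launch modes look roughly like this (a sketch; your train.sh may pass additional arguments):

```bash
# What the failing command amounts to: one process per GPU via the launcher.
python -m torch.distributed.launch --nproc_per_node=1 train.py

# The suggested workaround: run the script directly, with no launcher.
python train.py
```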

raypine avatar Aug 03 '21 08:08 raypine

That also fails. Is there something wrong with train.py?

2021-08-03 16:40:36 node02 root[2842] INFO using devices 0
train.sh: line 5: 2842 Killed python train.py

xobeiotozi avatar Aug 03 '21 08:08 xobeiotozi