caffe-jacinto-models

Error when training a model with 3 or 4 GPUs

Open yumeihong opened this issue 4 years ago • 2 comments

I installed NCCL so that I can train the model on more GPUs. Installation steps:

  1. git clone https://github.com/NVIDIA/nccl.git
  2. cd nccl
  3. sudo make install -j8
  4. Uncomment the USE_NCCL line in Makefile.config
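Step 4 can also be scripted. The sketch below assumes the line in Makefile.config looks like `# USE_NCCL := 1`, as it does in Caffe's Makefile.config.example. `enable_nccl` is a hypothetical helper name, not part of Caffe or NCCL.

```shell
#!/usr/bin/env bash
# Sketch: uncomment the USE_NCCL line in a Caffe-style Makefile.config.
# Assumes the line has the form "# USE_NCCL := 1"; adjust the pattern
# if your file differs. enable_nccl is a hypothetical helper.

enable_nccl() {
    local config="$1"
    # Strip a leading "#" (and optional spaces) in front of USE_NCCL.
    # A backup of the original file is kept as "$config.bak".
    sed -i.bak -E 's/^[[:space:]]*#[[:space:]]*(USE_NCCL[[:space:]]*:=)/\1/' "$config"
    # Succeed only if an uncommented USE_NCCL line is now present.
    grep -Eq '^USE_NCCL[[:space:]]*:=' "$config"
}

# Usage (the path is an assumption about where the config lives):
#   enable_nccl "$CAFFE_ROOT/Makefile.config"
```

The setting is read at build time, so Caffe has to be rebuilt after changing Makefile.config for it to take effect.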

I train the model with the following command:

    $CAFFE_ROOT/build/tools/caffe train --solver="models/ssd/${PROJECT}/initial/solver.prototxt" --weights="models/ssd/${PROJECT}/initial/${PRETRAINED}" -gpu 0,1,2

It fails with this error: image

image

Training with 2 GPUs works, though. Did I miss a step?
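One basic check before a multi-GPU run is that every ID passed to `-gpu` actually exists on the machine. The sketch below is a minimal sanity check under that assumption. It only rules out one simple cause and does not diagnose the error in the screenshots. `check_gpu_ids` is a hypothetical helper, not part of Caffe.

```shell
#!/usr/bin/env bash
# Sketch: check that every ID in a "-gpu" list is below the number of
# GPUs on the machine. check_gpu_ids is a hypothetical helper.

check_gpu_ids() {
    local ids="$1" count="$2" id
    # Split the comma-separated list, e.g. "0,1,2" -> 0 1 2.
    for id in ${ids//,/ }; do
        # Reject non-numeric IDs and IDs that are out of range.
        if ! [[ "$id" =~ ^[0-9]+$ ]] || [ "$id" -ge "$count" ]; then
            echo "GPU id '$id' is not available (found $count GPUs)" >&2
            return 1
        fi
    done
    echo "all requested GPUs are present"
}

# Usage, assuming nvidia-smi is on PATH ("nvidia-smi -L" prints one
# line per GPU):
#   check_gpu_ids "0,1,2" "$(nvidia-smi -L | wc -l)"
```

If the IDs are all present, setting the NCCL environment variable `NCCL_DEBUG=INFO` before the training command makes NCCL print more detail about its initialization, which may help narrow down where the 3-GPU run differs from the 2-GPU one.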

yumeihong avatar Mar 20 '20 02:03 yumeihong

Hi, I don't think I understand the problem you are facing. Specifically, you wrote: "But I can use 2 gpus to train." If you can train with 2 GPUs, then what is the issue?

mathmanu avatar Mar 20 '20 04:03 mathmanu

Hi, I get the error message when I use 3 or 4 GPUs, but training with 2 GPUs works.

yumeihong avatar Mar 20 '20 06:03 yumeihong