
run_ssd_live_demo.py: "RuntimeError: expected device cpu but got device cuda:0"

Open Jaftem opened this issue 5 years ago • 7 comments

Hi,

I trained a mb2-ssd-lite model with a subset (just 1 class) of Open Images on just 20 epochs. I'm now attempting to run the live demo with this model:

$ python run_ssd_live_demo.py mb2-ssd-lite models/mb2-ssd-lite-Epoch-19-Loss-3.6359732536622036.pth models/open-images-model-labels.txt 

And I get the runtime error

Traceback (most recent call last):
  File "run_ssd_live_demo.py", line 65, in <module>
    boxes, labels, probs = predictor.predict(image, 10, 0.4)
  File "/ml/playground/pytorch-ssd/vision/ssd/predictor.py", line 37, in predict
    scores, boxes = self.net.forward(images)
  File "/ml/playground/pytorch-ssd/vision/ssd/ssd.py", line 93, in forward
    locations, self.priors, self.config.center_variance, self.config.size_variance
  File "/ml/playground/pytorch-ssd/vision/utils/box_utils.py", line 104, in convert_locations_to_boxes
    locations[..., :2] * center_variance * priors[..., 2:] + priors[..., :2],
RuntimeError: expected device cpu but got device cuda:0

I can run the live demo on the pretrained model, as per the README's instructions, without error. Any ideas?

Jaftem avatar Dec 06 '19 17:12 Jaftem

So it looks like the issue is the locations tensor being a CPU tensor and priors being a CUDA tensor. On line 93 of vision/ssd/ssd.py I made the following change:

    locations.to(self.device), self.priors, self.config.center_variance, self.config.size_variance

This gets the live demo working. But because mb1-ssd works fine, I believe the issue occurs at some point before this line and that the change above is more of a workaround than a proper fix. I haven't reviewed the entire code base closely enough to know if there is a better one.
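A minimal sketch of the mismatch, with made-up shapes and values (not the repo's real data): `locations` comes back from the network on the CPU while `priors` was built on `cuda:0`, so the elementwise ops in `convert_locations_to_boxes` raise the `RuntimeError`. Moving one operand onto the other's device, as in the workaround above, resolves it:

```python
import torch

locations = torch.zeros(1, 4)      # CPU tensor, as in the failing demo
priors = torch.ones(1, 4)
if torch.cuda.is_available():
    priors = priors.cuda()         # reproduces the cuda:0 side of the mismatch

# Equivalent of the patched expression in convert_locations_to_boxes:
center_variance = 0.1
boxes_xy = (locations.to(priors.device)[..., :2]
            * center_variance * priors[..., 2:]
            + priors[..., :2])
```

Both operands now live on `priors`' device, so the arithmetic succeeds regardless of whether CUDA is present.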

Jaftem avatar Dec 06 '19 18:12 Jaftem

I ran into the same issue as @Jaftem. His workaround solved my problem. It seems that trying to convert models trained on GPUs does not work with the current code base.

hsahovic avatar Dec 12 '19 08:12 hsahovic

Since the demo runs inference on the CPU, you either want to pass map_location='cpu' here https://github.com/qfgaohao/pytorch-ssd/blob/7174f33aa2a1540f90d827d48dea681ec1a2856c/run_ssd_live_demo.py#L41 or explicitly move the model to the GPU somewhere in run_ssd_live_demo.py.
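For illustration, here is what the map_location approach looks like in isolation (the checkpoint path is made up for this sketch): torch.load with map_location='cpu' remaps tensors saved from a GPU onto the CPU, which is what the CPU demo expects.

```python
import os
import tempfile
import torch

# Create a throwaway checkpoint to stand in for the trained model file.
ckpt_path = os.path.join(tempfile.gettempdir(), "demo_ckpt.pth")
torch.save({"weight": torch.zeros(3)}, ckpt_path)

# map_location='cpu' forces every tensor in the checkpoint onto the CPU,
# even if it was originally saved from cuda:0.
state = torch.load(ckpt_path, map_location="cpu")
```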

vladserkoff avatar Dec 12 '19 12:12 vladserkoff

@Jaftem try changing line 50 of run_ssd_live_demo.py from this:

predictor = create_mobilenetv2_ssd_lite_predictor(net, candidate_size=200)

to this:

predictor = create_mobilenetv2_ssd_lite_predictor(net, candidate_size=200, device=torch.device('cuda'))

Initializing the Predictor class this way solves the issue without touching the SSD class. I don't know why CPU is the default device for the mb2-ssd-lite Predictor, but this line of the README may be a clue:

You may notice MobileNetV2 SSD/SSD-Lite is slower than MobileNetV1 SSD/Lite on PC. However, MobileNetV2 is faster on mobile devices.

So, this model may be intended for use on mobile devices.

TheCamilovisk avatar Dec 12 '19 15:12 TheCamilovisk

As @Jaftem pointed out, this error is reproducible when running on a node with GPUs; PyTorch seems to load the model onto the GPU by default.

Running on a CPU-only node works.

MADHAVAN001 avatar Dec 16 '19 20:12 MADHAVAN001

Yep, exactly. The issue comes from https://github.com/qfgaohao/pytorch-ssd/blob/master/vision/ssd/ssd.py#L35: the SSD model is loaded onto the cuda device whenever one is available. The SSD constructor has a device parameter that defaults to None, and when the mobilenet_v2_ssd_lite network is created, that constructor is called without any device at all, so it falls back to cuda if available. https://github.com/qfgaohao/pytorch-ssd/blob/master/vision/ssd/mobilenet_v2_ssd_lite.py#L58

On my side I just forced the device to be cpu in ssd, because that is what I need. However, a better solution would be to add a device parameter to mobilenet_v2_ssd_lite so that callers can specify a device.
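A hypothetical sketch of that proposal, using stand-in names (TinySSD and create_tiny_ssd are illustrative, not the repo's actual classes): the factory accepts a device argument and forwards it, instead of letting the model class default to cuda when one is available.

```python
import torch
from torch import nn

class TinySSD(nn.Module):
    """Stand-in for the SSD class; mirrors its device fallback."""
    def __init__(self, num_classes, device=None):
        super().__init__()
        # Like ssd.py#L35: fall back to cuda only when no device is given.
        self.device = device or torch.device(
            "cuda" if torch.cuda.is_available() else "cpu")
        # Priors are built on the chosen device, matching the model's tensors.
        self.priors = torch.zeros(num_classes, 4, device=self.device)

def create_tiny_ssd(num_classes, device=torch.device("cpu")):
    # The factory now forwards the caller's device choice down to the model.
    return TinySSD(num_classes, device=device)

net = create_tiny_ssd(2)  # priors stay on the CPU even when a GPU is present
```

With this change, the live demo could simply request a CPU model, while training scripts pass torch.device('cuda').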

YaYaB avatar Apr 10 '20 16:04 YaYaB

I got a similar error when running convert_to_caffe2_models.py to convert a mobilenet_v2_ssd_lite model to ONNX. In that case, @Jaftem's solution let me produce the ONNX file, but I'm not sure it is okay.

AiueoABC avatar Jul 29 '20 06:07 AiueoABC