pytorch-ssd
run_ssd_live_demo.py: "RuntimeError: expected device cpu but got device cuda:0"
Hi,
I trained an mb2-ssd-lite model on a subset (just 1 class) of Open Images for just 20 epochs. I'm now attempting to run the live demo with this model:
$ python run_ssd_live_demo.py mb2-ssd-lite models/mb2-ssd-lite-Epoch-19-Loss-3.6359732536622036.pth models/open-images-model-labels.txt
And I get this runtime error:
Traceback (most recent call last):
  File "run_ssd_live_demo.py", line 65, in <module>
    boxes, labels, probs = predictor.predict(image, 10, 0.4)
  File "/ml/playground/pytorch-ssd/vision/ssd/predictor.py", line 37, in predict
    scores, boxes = self.net.forward(images)
  File "/ml/playground/pytorch-ssd/vision/ssd/ssd.py", line 93, in forward
    locations, self.priors, self.config.center_variance, self.config.size_variance
  File "/ml/playground/pytorch-ssd/vision/utils/box_utils.py", line 104, in convert_locations_to_boxes
    locations[..., :2] * center_variance * priors[..., 2:] + priors[..., :2],
RuntimeError: expected device cpu but got device cuda:0
I can run the live demo on the pretrained model as per the README's instructions without error. Any ideas?
So it looks like the issue is the locations tensor being a CPU tensor and priors being a CUDA tensor. On line 93 of vision/ssd/ssd.py I made the following change:
locations.to(self.device), self.priors, self.config.center_variance, self.config.size_variance
Which gets the live demo to work. But because mb1-ssd seems to work fine, I believe the issue occurs at some point prior to this and that the above fix is more of a workaround. I haven't fully reviewed the entire code base to know if there is a better fix.
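To illustrate the mismatch described above, here is a minimal sketch of the decoding step with both operands placed on the same device before any arithmetic. It is not the repository's code: decode_boxes is a hypothetical name, the center expression is taken from the traceback, the size term follows the usual SSD decoding and is an assumption, and the variance defaults are illustrative. Note that it moves priors to the device of locations, the opposite direction from the workaround above; either direction avoids the error.

```python
import torch


def decode_boxes(locations, priors, center_variance=0.1, size_variance=0.2):
    """Hypothetical stand-in for the box decoding step in the traceback."""
    # Put the priors on the same device as the network output so the
    # element-wise operations below never mix CPU and CUDA tensors.
    priors = priors.to(locations.device)
    if priors.dim() + 1 == locations.dim():
        priors = priors.unsqueeze(0)  # add a batch dimension for broadcasting
    # Center expression as shown in the traceback.
    centers = locations[..., :2] * center_variance * priors[..., 2:] + priors[..., :2]
    # Size term: assumed here, following the usual SSD formulation.
    sizes = torch.exp(locations[..., 2:] * size_variance) * priors[..., 2:]
    return torch.cat([centers, sizes], dim=-1)


# Reproduce the situation from the thread: priors on CUDA (when available),
# locations on the CPU.
priors = torch.rand(3, 4)
if torch.cuda.is_available():
    priors = priors.cuda()
locations = torch.zeros(1, 3, 4)

boxes = decode_boxes(locations, priors)
```

With all-zero locations the decoded boxes are simply the priors, which makes the sketch easy to sanity-check on a machine with or without a GPU.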
I ran into the same issue as @Jaftem. His workaround solved my problem. It seems that trying to convert models trained on GPUs does not work with the current code base.
Since the demo runs inference on the CPU, you either want to pass map_location='cpu'
here: https://github.com/qfgaohao/pytorch-ssd/blob/7174f33aa2a1540f90d827d48dea681ec1a2856c/run_ssd_live_demo.py#L41
or explicitly move the model to the GPU somewhere in run_ssd_live_demo.py.
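For reference, a minimal sketch of what map_location='cpu' does when loading a checkpoint. It assumes the checkpoint is a plain state_dict; nn.Linear is only a stand-in for the SSD network, and an in-memory buffer replaces the .pth file so the example is self-contained.

```python
import io

import torch
import torch.nn as nn

# Stand-in model (assumption): any nn.Module with parameters behaves the same.
net = nn.Linear(4, 2)

# Save the weights to an in-memory buffer instead of a .pth file on disk.
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)
buffer.seek(0)

# map_location="cpu" remaps every stored tensor to the CPU, regardless of
# the device it was saved from.
state_dict = torch.load(buffer, map_location="cpu")
net.load_state_dict(state_dict)

devices = {tensor.device.type for tensor in state_dict.values()}
```

After loading, every tensor in state_dict reports a CPU device, so the weights themselves cannot be the source of a CUDA tensor.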
@Jaftem try changing line 50 on run_ssd_live_demo.py from this:
predictor = create_mobilenetv2_ssd_lite_predictor(net, candidate_size=200)
to this:
predictor = create_mobilenetv2_ssd_lite_predictor(net, candidate_size=200, device=torch.device('cuda'))
Initializing the Predictor class this way solves the issue without touching the SSD class. I don't know why the CPU is the default device for the mb2-ssd-lite Predictor, but this line of the README may be a clue:
You may notice MobileNetV2 SSD/SSD-Lite is slower than MobileNetV1 SSD/Lite on PC. However, MobileNetV2 is faster on mobile devices.
So this model may be intended for use on mobile devices.
As @Jaftem pointed out, this error is reproducible when running on a node with GPUs. PyTorch seems to load onto the GPU by default.
Running this on a CPU-only node works.
Yep, exactly. The issue comes from https://github.com/qfgaohao/pytorch-ssd/blob/master/vision/ssd/ssd.py#L35: the SSD model uses the CUDA device if one is available. SSD's constructor has a device parameter that is None by default. When the mobilenet_v2_ssd_lite network is created, SSD's constructor is called without any device, so it uses CUDA if available: https://github.com/qfgaohao/pytorch-ssd/blob/master/vision/ssd/mobilenet_v2_ssd_lite.py#L58
On my side I just forced the device to be cpu in ssd because that is what I need. However, a better solution would be to add a device parameter to mobilenet_v2_ssd_lite so that the caller can specify a device.
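A rough sketch of that suggestion, with hypothetical names throughout (resolve_device, TinyDetector and create_tiny_detector are not part of pytorch-ssd). It only illustrates the pattern: fall back to CUDA when no device is given, and let the factory function forward an explicit device to the constructor.

```python
import torch
import torch.nn as nn


def resolve_device(device=None):
    """Return the requested device, or CUDA if available, else the CPU."""
    if device is not None:
        return torch.device(device)
    return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


class TinyDetector(nn.Module):
    """Hypothetical stand-in for a detector that keeps its priors on a device."""

    def __init__(self, num_priors=3, device=None):
        super().__init__()
        self.device = resolve_device(device)
        # Priors live on whichever device was resolved above.
        self.priors = torch.rand(num_priors, 4).to(self.device)


def create_tiny_detector(num_priors=3, device=None):
    # The factory forwards the device instead of silently dropping it,
    # so callers can force the CPU even on a machine with GPUs.
    return TinyDetector(num_priors, device=device)


cpu_net = create_tiny_detector(device="cpu")
default_net = create_tiny_detector()
```

Here cpu_net keeps its priors on the CPU even when a GPU is present, while default_net reproduces the fall-back behaviour described above.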
I got a similar error when I ran convert_to_caffe2_models.py to convert a mobilenet_v2_ssd_lite model to ONNX. In this case, @Jaftem's solution helped me get the ONNX model, but I'm not sure whether this is okay.