RuntimeError: CUDA error: invalid device ordinal with starry_stanford.sh
Hi, I'm trying to run the starry_stanford.sh script from the title to see if my system can handle creating larger images based on your script.
I added -optimizer adam and I'm using the NIN model for lower-memory GPUs.
Here is my output, which eventually fails:
NIN Architecture Detected
Successfully loaded models/nin_imagenet.pth
conv1: 96 3 11 11
cccp1: 96 96 1 1
cccp2: 96 96 1 1
conv2: 256 96 5 5
cccp3: 256 256 1 1
cccp4: 256 256 1 1
conv3: 384 256 3 3
cccp5: 384 384 1 1
cccp6: 384 384 1 1
conv4-1024: 1024 384 3 3
cccp7-1024: 1024 1024 1 1
cccp8-1024: 1000 1024 1 1
Traceback (most recent call last):
File "/home/gateway/work/neural-style-software/neural-style-pt/neural_style.py", line 468, in <module>
main()
File "/home/gateway/work/neural-style-software/neural-style-pt/neural_style.py", line 157, in main
net = setup_multi_device(net)
File "/home/gateway/work/neural-style-software/neural-style-pt/neural_style.py", line 328, in setup_multi_device
new_net = ModelParallel(net, params.gpu, params.multidevice_strategy)
File "/home/gateway/work/neural-style-software/neural-style-pt/CaffeLoader.py", line 110, in __init__
self.chunks = self.chunks_to_devices(self.split_net(net, device_splits.split(',')))
File "/home/gateway/work/neural-style-software/neural-style-pt/CaffeLoader.py", line 134, in chunks_to_devices
chunk.to(self.device_list[i])
File "/home/gateway/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 426, in to
return self._apply(convert)
File "/home/gateway/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 202, in _apply
module._apply(fn)
File "/home/gateway/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 224, in _apply
param_applied = fn(param)
File "/home/gateway/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 424, in convert
return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
https://github.com/ProGamerGov/neural-style-pt/blob/master/examples/scripts/starry_stanford.sh
Nvidia info
btw, I'm using GPU 1 since it has the most memory and isn't driving the primary display.
(base) gateway@gateway-media:~/work/neural-style-software/neural-style-pt$ nvidia-smi
Tue May 5 14:34:21 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:01:00.0 Off | N/A |
| 0% 54C P8 4W / 120W | 195MiB / 6078MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 00000000:02:00.0 Off | N/A |
| 21% 56C P8 6W / 180W | 2MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2161 G /usr/lib/xorg/Xorg 101MiB |
| 0 2661 G 11MiB |
| 0 2856 G /usr/bin/gnome-shell 77MiB |
+-----------------------------------------------------------------------------+
(base) gateway@gateway-media:~/work/neural-style-software/neural-style-pt$
thoughts?
First you should check if PyTorch sees your devices correctly and that CUDA works. Try running this in the Python interpreter and seeing what it shows:
import torch
torch.__version__ # Get PyTorch and CUDA version
torch.cuda.is_available() # Check that CUDA works
torch.cuda.device_count() # Check how many CUDA capable devices you have
# Print device human readable names
torch.cuda.get_device_name(0)
torch.cuda.get_device_name(1)
# Add more lines like get_device_name(2), get_device_name(3) if you have more devices.
If the devices exist and CUDA works, then it's probably just an issue with the ID you are using. CUDA can sometimes be a bit weird with how it sets GPU IDs: https://stackoverflow.com/questions/13781738/how-does-cuda-assign-device-ids-to-gpus
You can fix the GPU device order by putting CUDA_DEVICE_ORDER=PCI_BUS_ID before the command:
CUDA_DEVICE_ORDER=PCI_BUS_ID python3 neural_style.py
You can also use CUDA_VISIBLE_DEVICES before the command to make sure that PyTorch can only see the specified device:
# Only make GPU ID 1 visible to PyTorch
CUDA_VISIBLE_DEVICES=1 python3 neural_style.py
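As a quick way to confirm what PyTorch actually sees under these variables, here's a minimal sketch (assuming the same two-GPU setup as above). The variables have to be set before CUDA is initialized, which is why they go in front of the command, or, as here, before torch is imported:
import os
# Must be set before CUDA is initialized (i.e. before importing torch / the first torch.cuda call)
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # order GPUs by PCI bus, like nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "1"         # expose only nvidia-smi's GPU 1

import torch
print(torch.cuda.device_count())                 # should now print 1
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))      # the remaining GPU shows up as ID 0 (the GTX 1080 in your case)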
ahh, I never knew that about PyTorch; it seems the device IDs are swapped compared to what nvidia-smi shows.
>>> torch.cuda.get_device_name(0)
'GeForce GTX 1080'
>>> torch.cuda.get_device_name(1)
'GeForce GTX 1060 6GB'
hmm, so in my case would CUDA_DEVICE_ORDER=0 python3 neural_style.py give me the 1060, and CUDA_DEVICE_ORDER=1 python3 neural_style.py the 1080?
Should I make any changes to the GPU value in the script? Thanks for your timely response. Btw, has anyone used your version of style transfer for video?
The invalid device ordinal error is normally given when you specify a nonexistent GPU ID.
The GPU value in the script should be set to the PyTorch GPU ID that you want to use; in your case, PyTorch shows the device you want (the GTX 1080) as having an ID of 0. The order and GPU IDs available to PyTorch will change based on the CUDA environment variables you specify.
CUDA_DEVICE_ORDER=PCI_BUS_ID will swap the GPU order if the existing order is not based on the PCI Bus order.
CUDA_VISIBLE_DEVICES=1 will make GPU 0 in PyTorch be your second GPU.
CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 will only give PyTorch the second GPU device based on the PCI bus order, but that second GPU will be listed as GPU 0, so you'll need to use -gpu 0.
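So in your case, something like the following should target the GTX 1080 (a sketch using only the flags already mentioned in this thread; add whatever other options you normally pass):
CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 python3 neural_style.py -gpu 0 -optimizer adam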
btw has anyone used your version of style transfer for video?
Yes, but those individuals tend to use techniques like rotoscoping to create video and avoid the flickering effect. I'm not knowledgeable enough yet to translate artistic-videos to PyTorch, but it should be easier for someone who better understands the video aspect of the code, as both artistic-videos and neural-style-pt are based on the same original code (neural-style).
Basically this is what neural-style-pt does with GPU IDs (example with the Python Interpreter):
import torch
a = torch.randn(3)
a.to('cpu') # Returns a copy of 'a' on the CPU (.to() returns a new tensor rather than modifying 'a')
a.to('cuda:0') # Returns a copy of 'a' on device 0
a.to('cuda:1') # Returns a copy of 'a' on device 1
When I specify a valid GPU, I get something like this:
>>> a.to('cuda:0')
tensor([ 0.8459, -0.2027, 0.6153], device='cuda:0')
And when I specify a GPU that doesn't exist on my computer, I get this:
>>> a.to('cuda:1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: invalid device ordinal
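If you want to guard against that in your own code, a minimal sketch (not something neural-style-pt itself does) would be to check the ID against torch.cuda.device_count() before moving anything:
import torch
gpu_id = 1  # the PyTorch device ID you intend to use
if torch.cuda.is_available() and gpu_id < torch.cuda.device_count():
    a = torch.randn(3).to('cuda:' + str(gpu_id))  # safe: the device exists
    print(a)
else:
    print('Device cuda:%d is not visible to PyTorch' % gpu_id)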