Errors when testing on CPU

Open bryanbocao opened this issue 2 years ago • 11 comments

layer_name: <class 'torch.nn.modules.conv.Conv2d'>, total_params: 15121584, total_traina_params: 15121584, n_layers: 39
device:  cpu
Traceback (most recent call last):
  File "main.py", line 208, in <module>
    test(epoch)
  File "main.py", line 189, in test
    outputs = net(inputs)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/brcao/Repos/pytorch-cifar/models/dla_simple.py", line 106, in forward
    out = self.base(x)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/brcao/Apps/anaconda3/envs/yolo/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _thnn_conv2d_forward

bryanbocao avatar Jun 24 '22 20:06 bryanbocao

@bryanbocao Your device is not being set to the GPU. Can you check that your CUDA drivers are properly installed and that all models and datasets are being sent to GPU memory?

logan-mo avatar Aug 06 '22 17:08 logan-mo

@Phillibob55 That's a different problem. The current issue is not about CUDA driver installation or configuration. I can run it on GPU, but I intentionally wanted to test the CPU runtime, which requires both the model and the data to be in CPU memory instead of GPU memory.

The model was trained and saved from GPU memory, so the checkpoint has to be loaded with the map_location=device argument, where device='cpu', in order to run the model on CPU. I solved this issue with:

parser.add_argument('--select_device', type=str, default='gpu', help='gpu | cpu')
...
device = 'cuda' if torch.cuda.is_available() and args.select_device == 'gpu' else 'cpu'
...
checkpoint = torch.load('./checkpoint/{}_ckpt.pth'.format(args.net), map_location=device)
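A minimal sketch of the same idea, not the repository's actual code. The helper names resolve_device and strip_module_prefix are hypothetical, and the checkpoint key "net" is an assumption. The second helper only matters if the checkpoint was saved from a model wrapped in torch.nn.DataParallel, whose state-dict keys carry a "module." prefix that an unwrapped CPU model would not expect.

```python
import argparse
from collections import OrderedDict


def build_parser():
    parser = argparse.ArgumentParser(description="device selection sketch")
    # choices= rejects typos such as "--select_device cup" at parse time
    parser.add_argument("--select_device", choices=["gpu", "cpu"],
                        default="gpu", help="gpu | cpu")
    return parser


def resolve_device(select_device, cuda_available):
    """Return 'cuda' only when the GPU is both requested and available."""
    return "cuda" if cuda_available and select_device == "gpu" else "cpu"


def strip_module_prefix(state_dict, prefix="module."):
    """Drop a leading DataParallel prefix from every key; other keys pass through."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


# Intended use with PyTorch (kept as comments so the sketch runs without it;
# `path` is a placeholder and the "net" key is an assumption):
#   device = resolve_device(args.select_device, torch.cuda.is_available())
#   checkpoint = torch.load(path, map_location=device)
#   net.load_state_dict(strip_module_prefix(checkpoint["net"]))
args = build_parser().parse_args(["--select_device", "cpu"])
device = resolve_device(args.select_device, cuda_available=True)
print(device)  # cpu
```

Passing cuda_available in as a parameter keeps the selection logic testable on machines without a GPU.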

bryanbocao avatar Aug 06 '22 19:08 bryanbocao

@Phillibob55 Check this out https://github.com/kuangliu/pytorch-cifar/pull/152 https://github.com/kuangliu/pytorch-cifar/pull/152/commits/b782bba05821e2a31871b46adf9d33da3b00e036

bryanbocao avatar Aug 06 '22 19:08 bryanbocao

Ooh, yeah, that makes sense. OC seems to be inactive, so I'm working on my own version of this, which runs on any image dataset and doesn't have these problems.

logan-mo avatar Aug 07 '22 03:08 logan-mo

@bryanbocao Can you guide me on making these models work with image sizes other than 32x32?

logan-mo avatar Aug 07 '22 04:08 logan-mo

@Phillibob55 I am happy to work on that. Do you mean (1) simply resizing arbitrary images to 32x32 resolution and feeding them into these models? That can be done by adding one more command-line argument and a resize step in the code. Or (2) preparing a set of models whose native input shape is different from 32x32?

bryanbocao avatar Aug 07 '22 05:08 bryanbocao

@bryanbocao I started off with the first approach and just added a resize transform, but that loses a lot of information. For datasets like ImageNet, this doesn't give accuracy above 50%. So I was thinking that if I make the models accept images of any size, it might give me better results, at the cost of longer training, of course.

The codebase in the repo has a lot of hard-coded elements. I worked around the hard-coded 10 output classes by adding a number-of-classes argument to every model class. But I don't understand the more complex models well enough to modify them to accept images of any size.

P.S. I'm using a Jupyter notebook instead of main.py.
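To see why a fixed-size classifier head ties a model to 32x32 inputs, the arithmetic can be sketched in plain Python. The layout below is an assumption modeled on a CIFAR-style ResNet (three stride-2 stages, 512 final channels, a 4x4 average pool, then a linear layer expecting 512 features). It is not taken from this thread, and the function names are hypothetical.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size, channels=512, pool_kernel=4, n_downsamples=3):
    """Feature count reaching the linear layer under the assumed layout."""
    size = input_size
    for _ in range(n_downsamples):
        # assumed: a 3x3 conv with stride 2 and padding 1 halves the resolution
        size = conv_out_size(size, kernel=3, stride=2, padding=1)
    # assumed: average pooling with a fixed kernel and stride equal to the kernel
    size = conv_out_size(size, kernel=pool_kernel, stride=pool_kernel, padding=0)
    return channels * size * size


for side in (32, 64, 224):
    print(side, flattened_features(side))
```

Under these assumptions only the 32x32 input yields the 512 features the linear layer expects; 64x64 yields 2048 and 224x224 yields 25088. Replacing the fixed pool with a global one (for example torch.nn.AdaptiveAvgPool2d(1)) always produces a 1x1 map, so the feature count stays at 512 regardless of input size.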

logan-mo avatar Aug 07 '22 08:08 logan-mo

I've created a repo for it here

logan-mo avatar Aug 07 '22 11:08 logan-mo

@Phillibob55 Sounds good. If you would like to create an easy-to-use repo where we can just change a few arguments to train and test many different models, I am happy to contribute in my spare time. I have forked your repo to https://github.com/bryanbocao/image-classification

bryanbocao avatar Aug 22 '22 21:08 bryanbocao

Hello, I would like to ask you what caused the following error after running, and how to deal with it:

D:\anaconda3\envs\datudui\python.exe C:/Users/52254/Desktop/pytorch-cifar-master/main.py
'stty' is not recognized as an internal or external command, operable program or batch file.
Traceback (most recent call last):
  File "C:/Users/52254/Desktop/pytorch-cifar-master/main.py", line 15, in <module>
    from utils import progress_bar
  File "C:\Users\52254\Desktop\pytorch-cifar-master\utils.py", line 45, in <module>
    _, term_width = os.popen('stty size', 'r').read().split()
ValueError: not enough values to unpack (expected 2, got 0)
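The stty command is a Unix utility, so on Windows os.popen('stty size', 'r').read() returns an empty string and the two-value unpacking fails. One possible workaround, a sketch rather than the repository's fix, is to ask Python's standard library for the terminal width instead. The helper name get_term_width is hypothetical.

```python
import shutil


def get_term_width(default=80):
    """Terminal width in columns, or `default` when it cannot be determined."""
    # shutil.get_terminal_size works on Windows and Unix and uses the
    # fallback when no terminal is attached (e.g. output piped to a file).
    width = shutil.get_terminal_size(fallback=(default, 24)).columns
    return width if width > 0 else default


# Would replace: _, term_width = os.popen('stty size', 'r').read().split()
term_width = get_term_width()
print(term_width)
```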

CopyABCs avatar Mar 03 '23 12:03 CopyABCs

I have the same problem as you.

Selfpline6 avatar Mar 09 '23 02:03 Selfpline6