nnest
CUDA capabilities
Hi again,
I was wondering if you tested this code with CUDA? In the examples it seems GPUs are explicitly disabled via an environment variable. When I remove this restriction, I get torch errors related to whether an object lives on the GPU or CPU.
Some examples:
Traceback (most recent call last):
  File "examples/mcmc/rosenbrock.py", line 55, in <module>
    main(args)
  File "examples/mcmc/rosenbrock.py", line 30, in main
    bootstrap_iters=args.burnin_iters, bootstrap_mcmc_steps=5000 + 1000 * args.x_dim)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/mcmc.py", line 169, in run
    self.trainer.train(samples, max_iters=train_iters, noise=-1)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/multitrainer.py", line 171, in train
    epoch, self.netG, train_loader, noise=training_noise)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/trainer.py", line 380, in _train
    loss = -model.log_probs(data, cond_data).mean()
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/networks.py", line 202, in log_probs
    return self.net.log_probs(inputs, cond_inputs=None)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/networks.py", line 165, in log_probs
    u, log_jacob = self(inputs, cond_inputs)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/networks.py", line 155, in forward
    inputs, logdet = module(inputs, cond_inputs, mode)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/networks.py", line 119, in forward
    masked_inputs = inputs * mask
RuntimeError: expected backend CUDA and dtype Float but got backend CPU and dtype Float
Traceback (most recent call last):
  File "examples/mcmc/rosenbrock.py", line 55, in <module>
    main(args)
  File "examples/mcmc/rosenbrock.py", line 30, in main
    bootstrap_iters=args.burnin_iters, bootstrap_mcmc_steps=5000 + 1000 * args.x_dim)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/mcmc.py", line 169, in run
    self.trainer.train(samples, max_iters=train_iters, noise=-1)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/multitrainer.py", line 171, in train
    epoch, self.netG, train_loader, noise=training_noise)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/trainer.py", line 380, in _train
    loss = -model.log_probs(data, cond_data).mean()
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/networks.py", line 202, in log_probs
    return self.net.log_probs(inputs, cond_inputs=None)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/networks.py", line 165, in log_probs
    u, log_jacob = self(inputs, cond_inputs)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/networks.py", line 155, in forward
    inputs, logdet = module(inputs, cond_inputs, mode)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/data/daten/PostDoc2/home/Downloads/nnest/nnest/networks.py", line 124, in forward
    log_s = self.scale_net(masked_inputs) * (1 - mask)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 92, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1406, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #4 'mat1'
I wanted to test with CUDA to see how much faster the training is.
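(For context on the first traceback: it fails at `masked_inputs = inputs * mask`, which is the classic symptom of a module storing a tensor, here the mask, as a plain attribute, so `model.to(device)` moves the parameters but not the mask. This is a hedged sketch of that failure mode and the usual fix via `register_buffer`; the class and tensor names are illustrative, not nnest's actual code.)

```python
import torch
import torch.nn as nn

class MaskedLayer(nn.Module):
    def __init__(self, mask):
        super().__init__()
        # A plain attribute (self.mask = mask) would stay on the CPU when the
        # module is moved to CUDA, causing a CPU/CUDA backend mismatch in
        # `inputs * self.mask`. Registering it as a buffer makes
        # model.to(device) move it along with the parameters.
        self.register_buffer('mask', mask)

    def forward(self, inputs):
        # Both tensors now live on the same device, so no mismatch.
        return inputs * self.mask

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
layer = MaskedLayer(torch.tensor([1.0, 0.0, 1.0])).to(device)
out = layer(torch.ones(2, 3, device=device))
```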
Thanks Johannes. This is indeed a bug and I'm currently working on a fix to make it CUDA compatible. Thanks also for your other PR - will be taking a look at that very soon.
Cheers, Adam
Hi @JohannesBuchner, this should now be fixed in https://github.com/adammoss/nnest/commit/a4fca887c7f69a3cc18819ba5632925a02a0ab36. I'm actually getting better performance on the CPU. My GPU isn't the latest (a 1080), but it could be that this workload isn't well suited to a GPU, as things are frequently moved back and forth between the CPU and GPU. I'd be interested to hear if you get similar results.
Cheers, Adam
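(If anyone wants to check the CPU-vs-GPU comparison themselves, here is a rough per-step timing sketch. It uses a standalone toy MLP rather than nnest's trainer, so the numbers are only indicative; note that `torch.cuda.synchronize()` is needed because CUDA kernels run asynchronously and the clock would otherwise stop before the work finishes.)

```python
import time
import torch
import torch.nn as nn

def time_training_step(device, n_steps=20, batch=512, dim=64):
    """Roughly time one optimiser step of a small MLP on the given device."""
    model = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, dim)).to(device)
    opt = torch.optim.Adam(model.parameters())
    x = torch.randn(batch, dim, device=device)
    # Warm-up step so one-off CUDA initialisation is not included in the timing.
    opt.zero_grad(); model(x).pow(2).mean().backward(); opt.step()
    if device.type == 'cuda':
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_steps):
        opt.zero_grad()
        loss = model(x).pow(2).mean()
        loss.backward()
        opt.step()
    if device.type == 'cuda':
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / n_steps

cpu_t = time_training_step(torch.device('cpu'))
print(f'CPU: {cpu_t * 1e3:.2f} ms/step')
if torch.cuda.is_available():
    gpu_t = time_training_step(torch.device('cuda'))
    print(f'GPU: {gpu_t * 1e3:.2f} ms/step')
```

For small networks and batch sizes, the host-to-device transfer and kernel-launch overhead can indeed outweigh the GPU's throughput advantage, which would be consistent with the CPU coming out ahead here.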
Hi @adammoss, I tried to test the difference between running on the CPU and the GPU, but I keep getting this error during training when I set 'use_gpu=True':
Traceback (most recent call last):
  File "run.py", line 81, in
Is it necessary to do something extra to use the GPU?
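(One thing worth checking before setting `use_gpu=True` is whether PyTorch can see the GPU at all; if the build is CPU-only or the driver is missing, no library option can enable it. A quick diagnostic sketch:)

```python
import torch

# If any of these checks fail, use_gpu=True cannot work, regardless of
# what the library does internally.
print('torch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('device name:', torch.cuda.get_device_name(0))
    # A tiny round-trip confirms tensors can actually be placed on the GPU.
    x = torch.ones(3, device='cuda')
    print('round-trip ok:', bool((x.cpu() == 1).all()))
```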