collection-of-notebooks
Text2PixelArt - radix_sort bug
I have successfully installed all the required components, but when I run the generation, I get an error:
runtime error: radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function
I tried to fix the error by downgrading PyTorch (from 1.9 to 1.8, then to 1.7), but it didn't help.
Here is the full error log:
Oops: runtime error: radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function
Try reducing --num-cuts to save memory
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-54661c00350f> in <module>()
56 clipit.do_init(settings)
57 clear_output()
---> 58 clipit.do_run(settings)
/content/clipit/clipit.py in do_run(args)
900 print("Oops: runtime error: ", e)
901 print("Try reducing --num-cuts to save memory")
--> 902 raise e
903 except KeyboardInterrupt:
904 pass
/content/clipit/clipit.py in do_run(args)
892 while True:
893 try:
--> 894 train(args, cur_iteration)
895 if cur_iteration == args.iterations:
896 break
/content/clipit/clipit.py in train(args, cur_it)
812
813 loss = sum(lossAll)
--> 814 loss.backward()
815 for opt in opts:
816 opt.step()
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
253 create_graph=create_graph,
254 inputs=inputs)
--> 255 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
256
257 def register_hook(self, hook):
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
147 Variable._execution_engine.run_backward(
148 tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149 allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
150
151
/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py in apply(self, *args)
85 def apply(self, *args):
86 # _forward_cls is defined by derived class
---> 87 return self._forward_cls.backward(self, *args) # type: ignore[attr-defined]
88
89
/usr/local/lib/python3.7/dist-packages/diffvg-0.0.1-py3.7-linux-x86_64.egg/pydiffvg/render_pytorch.py in backward(ctx, grad_img)
707 use_prefiltering,
708 diffvg.float_ptr(eval_positions.data_ptr()),
--> 709 eval_positions.shape[0])
710 time_elapsed = time.time() - start
711 global print_timing
RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function
Hi @DenCoder618, same error here on today's run, though the notebook was running fine last night.
It seems to depend on which GPU Colab assigns. The K80 does not work.
@DenCoder618 I've tested it today, and it is indeed a K80 problem. To check which GPU is in use, run !nvidia-smi -L in Colab (thanks @tg-bomze for the hint).
It worked for me on a T4, and on a P100 and a V100 for @tg-bomze.
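For anyone who wants to run the GPU check described above automatically at the top of the notebook, here is a minimal sketch. It parses `nvidia-smi -L` style output and compares the GPU name against the cards reported in this thread. The helper names (`parse_gpu_names`, `classify_gpu`, `query_gpu_listing`) are made up for illustration and are not part of clipit or diffvg, and the two lists only reflect what people reported here, not a full compatibility table.

```python
import re
import shutil
import subprocess

# GPUs as reported in this thread only; an assumption, not an exhaustive list.
REPORTED_FAILING = ("K80",)
REPORTED_WORKING = ("T4", "P100", "V100")


def parse_gpu_names(listing: str) -> list:
    """Extract GPU names from `nvidia-smi -L` style lines such as
    'GPU 0: Tesla K80 (UUID: GPU-...)'."""
    pattern = re.compile(r"^GPU \d+: (.+?) \(UUID: .*\)\s*$")
    names = []
    for line in listing.splitlines():
        match = pattern.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def classify_gpu(name: str) -> str:
    """Return 'failing', 'working', or 'unknown' based on the thread's reports."""
    if any(tag in name for tag in REPORTED_FAILING):
        return "failing"
    if any(tag in name for tag in REPORTED_WORKING):
        return "working"
    return "unknown"


def query_gpu_listing() -> str:
    """Run `nvidia-smi -L`; return '' if the tool is missing or the call fails."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    # Print each detected GPU with its status according to this thread.
    for gpu in parse_gpu_names(query_gpu_listing()):
        print(gpu, "->", classify_gpu(gpu))
```

If the status comes back as "failing", it is probably quicker to reset the runtime right away than to wait for the setup to finish and hit the radix_sort error during the backward pass.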
Is there any way to make sure I don't get a K80? !nvidia-smi -L returns GPU 0: Tesla K80 (UUID: GPU-adea173e-18bb-0a5a-ac56-ad8ee38d38e0). Is Colab Pro the only way?
Is Colab Pro the only way?
If you do not have a Colab Pro subscription, you can simply reset the runtime; if you are lucky, you will be assigned a supported GPU next time. This has worked for me several times, but I had to wait for half an hour.