
Text2PixelArt - radix_sort bug

[Open] potat-dev opened this issue 3 years ago • 5 comments

I have successfully installed all the required components, but when I run the generation, I get an error:

runtime error:  radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function

I tried to fix the error by downgrading PyTorch (1.9 -> 1.8, then 1.7), but it didn't help.

Here is the full error log:

Oops: runtime error:  radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function
Try reducing --num-cuts to save memory
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-54661c00350f> in <module>()
     56 clipit.do_init(settings)
     57 clear_output()
---> 58 clipit.do_run(settings)

/content/clipit/clipit.py in do_run(args)
    900                         print("Oops: runtime error: ", e)
    901                         print("Try reducing --num-cuts to save memory")
--> 902                         raise e
    903         except KeyboardInterrupt:
    904             pass

/content/clipit/clipit.py in do_run(args)
    892                 while True:
    893                     try:
--> 894                         train(args, cur_iteration)
    895                         if cur_iteration == args.iterations:
    896                             break

/content/clipit/clipit.py in train(args, cur_it)
    812 
    813     loss = sum(lossAll)
--> 814     loss.backward()
    815     for opt in opts:
    816         opt.step()

/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    253                 create_graph=create_graph,
    254                 inputs=inputs)
--> 255         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    256 
    257     def register_hook(self, hook):

/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    150 
    151 

/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py in apply(self, *args)
     85     def apply(self, *args):
     86         # _forward_cls is defined by derived class
---> 87         return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]
     88 
     89 

/usr/local/lib/python3.7/dist-packages/diffvg-0.0.1-py3.7-linux-x86_64.egg/pydiffvg/render_pytorch.py in backward(ctx, grad_img)
    707                       use_prefiltering,
    708                       diffvg.float_ptr(eval_positions.data_ptr()),
--> 709                       eval_positions.shape[0])
    710         time_elapsed = time.time() - start
    711         global print_timing

RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function

potat-dev · Aug 18 '21 18:08

Hi @DenCoder618, I get the same error on today's run, although the notebook was running fine last night.

dimithras · Aug 20 '21 15:08

It seems to depend on which GPU Colab assigns. The K80 does not work.

dimithras · Aug 20 '21 16:08
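One plausible reading of why the GPU model matters, offered as a hedged sketch rather than a confirmed diagnosis: CUDA reports `cudaErrorInvalidDeviceFunction` when a kernel has no machine code or PTX usable on the current device's architecture. If the diffvg build used in this notebook was compiled without code for the K80 (compute capability 3.7), its sort kernels would fail exactly this way, while the T4 (7.5), P100 (6.0) and V100 (7.0) would work. The `BUILT_SASS` / `BUILT_PTX` lists below are hypothetical, not the actual diffvg build configuration; `can_run` only encodes CUDA's standard compatibility rules (machine code runs on the same major version with an equal or higher minor version; PTX can be JIT-compiled for any equal or newer architecture).

```python
from typing import Iterable, Tuple

CC = Tuple[int, int]  # compute capability as (major, minor)

# Compute capabilities of the GPU models mentioned in this thread.
GPU_CC = {
    "Tesla K80": (3, 7),
    "Tesla P100": (6, 0),
    "Tesla V100": (7, 0),
    "Tesla T4": (7, 5),
}


def can_run(device: CC, sass: Iterable[CC], ptx: Iterable[CC]) -> bool:
    """Return True if a CUDA binary with the given embedded code can run on `device`.

    sass: architectures with precompiled machine code (cubin).
    ptx:  architectures with embedded PTX that the driver can JIT-compile.
    """
    major, minor = device
    # A cubin runs only on the same major version with an equal or higher minor version.
    if any(s_major == major and s_minor <= minor for s_major, s_minor in sass):
        return True
    # PTX can be JIT-compiled for any device at least as new as its target.
    return any(p <= device for p in ptx)


# Hypothetical build: machine code for sm_60/sm_70/sm_75 plus PTX for compute_75.
# This is an assumption for illustration, not the real diffvg build flags.
BUILT_SASS = [(6, 0), (7, 0), (7, 5)]
BUILT_PTX = [(7, 5)]

for gpu, cc in GPU_CC.items():
    result = "ok" if can_run(cc, BUILT_SASS, BUILT_PTX) else "invalid device function"
    print(gpu, cc, result)
```

Under these assumed build lists, only the K80 has neither compatible machine code nor PTX it can JIT from, which would match the pattern reported in this thread.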

@DenCoder618 I tested it today, and it is indeed a K80 problem. To check which GPU you have been assigned, run !nvidia-smi -L in Colab. Thanks @tg-bomze for the hint.

It worked for me on a T4, and on a P100 / V100 for @tg-bomze.

dimithras · Aug 21 '21 10:08
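The `!nvidia-smi -L` check can also be done from a regular Python cell. A minimal sketch, assuming `nvidia-smi` is on the PATH (as it is on Colab GPU runtimes); `parse_gpu_names`, `assigned_gpus` and `KNOWN_BAD` are hypothetical names, and the blocklist contains only the K80 reported in this thread.

```python
import re
import subprocess
from typing import List

# GPUs reported in this thread as failing with the radix_sort error.
KNOWN_BAD = {"Tesla K80"}

# Matches lines like "GPU 0: Tesla K80 (UUID: GPU-...)".
_GPU_LINE = re.compile(r"^GPU \d+: (.+?) \(UUID: [^)]+\)$")


def parse_gpu_names(smi_output: str) -> List[str]:
    """Extract GPU model names from `nvidia-smi -L` output."""
    names = []
    for line in smi_output.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def assigned_gpus() -> List[str]:
    """Run `nvidia-smi -L` and return the model names it lists."""
    out = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_gpu_names(out.stdout)
```

In a Colab cell, `[name for name in assigned_gpus() if name in KNOWN_BAD]` is non-empty when the runtime has been given a K80.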

Is there any way to make sure I don't get a K80? !nvidia-smi -L returns GPU 0: Tesla K80 (UUID: GPU-adea173e-18bb-0a5a-ac56-ad8ee38d38e0). Is Colab Pro the only way?

DennisGaida · Aug 26 '21 21:08

> Is Colab Pro the only way?

If you don't have a Colab Pro subscription, you can simply reset the runtime, and if you are lucky, you will be assigned a supported GPU next time. This worked for me several times, but I had to wait half an hour.

potat-dev · Aug 27 '21 07:08
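Since the only workaround in this thread is resetting the runtime until a different GPU is assigned, it may help to fail fast at the top of the notebook, before `clipit.do_init(settings)` spends time on setup. A hedged sketch: `require_supported_gpu` and `BLOCKLIST` are hypothetical names, the blocklist holds only the K80 reported here, and `torch.cuda.is_available` / `torch.cuda.get_device_name` are standard PyTorch calls. The optional `name` parameter exists only so the check can be exercised without a GPU.

```python
from typing import Optional

# GPU models reported in this thread as hitting the radix_sort error.
BLOCKLIST = ("K80",)


def require_supported_gpu(name: Optional[str] = None) -> str:
    """Raise early if the assigned GPU is one reported to fail; return its name otherwise."""
    if name is None:
        import torch  # imported lazily so the check can be tested without a GPU

        if not torch.cuda.is_available():
            raise RuntimeError("No CUDA GPU assigned; switch the runtime type to GPU.")
        name = torch.cuda.get_device_name(0)
    if any(bad in name for bad in BLOCKLIST):
        raise RuntimeError(
            f"Assigned GPU is {name}, which fails with the radix_sort error. "
            "Reset the runtime and run this cell again to try for a different GPU."
        )
    return name
```

Calling `require_supported_gpu()` as the first cell surfaces the problem within seconds instead of at the first `loss.backward()`.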