Runtime error in cudaGraphExecUpdate() from tiny-cuda-nn

Open morsingher opened this issue 2 years ago • 1 comments

Hi, I always get a strange error after a few thousand iterations when running this example, or the examples from this other repository:

terminate called after throwing an instance of 'std::runtime_error'
  what(): /tmp/pip-req-build-z4954kz1/include/tiny-cuda-nn/cuda_graph.h:124 cudaGraphExecUpdate(m_graph_instance, m_graph, &error_node, &update_result) failed with error the graph update was not performed because it included changes which violated constraints specific to instantiated graph update
Aborted

After some debugging, I can say that it is not related to tiny-cuda-nn itself, since their training example runs without problems. The error also disappears if I replace the RGB output of your rendering function with random values. I'm using PyTorch 1.13 with CUDA 11.6 on V100 cards. Another odd thing is that the error does not show up on Titan Xp cards with the same PyTorch/CUDA versions.

Do you have any idea why this happens and how to solve it? Thank you in advance!

morsingher, Nov 30 '22 14:11

Seems like it's a hardware-related issue?

I have no idea off the top of my head what could be causing it. But I believe that if you replace the tiny-cuda-nn network with a plain PyTorch MLP, the issue should go away. If that's the case, it would still be somewhat related to tiny-cuda-nn.
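To make the suggested experiment concrete, here is a minimal sketch of a plain PyTorch MLP that could stand in for a tiny-cuda-nn network while debugging. The class name `PlainMLP` and its default sizes are assumptions made up for illustration; they are not part of nerfacc or tiny-cuda-nn, and the module is not numerically equivalent to a fused tiny-cuda-nn network. It only shows a network of the same overall shape that runs through regular PyTorch ops instead of tiny-cuda-nn's kernels.

```python
import torch
from torch import nn


class PlainMLP(nn.Module):
    """Pure-PyTorch MLP (hypothetical helper, not from nerfacc or tiny-cuda-nn).

    Maps (N, n_input_dims) tensors to (N, n_output_dims) tensors using
    standard nn.Linear layers with ReLU activations in between.
    """

    def __init__(self, n_input_dims, n_output_dims, n_neurons=64, n_hidden_layers=2):
        super().__init__()
        layers = []
        in_dim = n_input_dims
        # Hidden layers: Linear followed by ReLU.
        for _ in range(n_hidden_layers):
            layers += [nn.Linear(in_dim, n_neurons), nn.ReLU()]
            in_dim = n_neurons
        # Output layer with no activation.
        layers.append(nn.Linear(in_dim, n_output_dims))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


# Example: 32 input features (e.g. an encoded position), 4 outputs (e.g. RGB + density).
mlp = PlainMLP(n_input_dims=32, n_output_dims=4)
out = mlp(torch.rand(8, 32))
```

If training with a module like this runs past the point where the crash normally occurs on the V100, that would point toward the tiny-cuda-nn code path rather than the rendering code. Expect it to be noticeably slower than the fused implementation.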

Help needed.

liruilong940607, Dec 08 '22 02:12