instant-ngp

Reloading a new dataset after training on a first one yields a RuntimeError: illegal memory access

Open vincentcartillier opened this issue 2 years ago • 0 comments

Hello again,

I am trying to train a NeRF model on two datasets that show the same scene with different images. I would like to load the first dataset, train the model, then load the second dataset (without re-initializing the model) and continue training on it.

The following code snippet returns a RuntimeError:

import numpy as np
import pyngp as ngp # noqa

# -- create model
testbed = ngp.Testbed(ngp.TestbedMode.Nerf)
network = "configs/nerf/base.json"
testbed.reload_network_from_file(network)
aabb_scale=4
scale=0.33333
fx=100
fy=100
cx=100
cy=100
k1=0.0
k2=0.0
p1=0.0
p2=0.0

rgb = np.ones((200,200,4), dtype=np.uint8)
depth = np.ones((200,200), dtype=np.float32)
c2w = np.eye(4)

testbed.create_empty_nerf_dataset(1, aabb_scale, False)
testbed.nerf.training.set_image(0, rgb, depth, scale)
testbed.nerf.training.set_camera_intrinsics(0, fx, fy, cx, cy, k1, k2, p1, p2)
testbed.nerf.training.set_camera_extrinsics(0, c2w[:3,:], True)

testbed.nerf.training.n_images_for_training = 1
testbed.shall_train = True
batch_size=256000
for i in range(200):
    testbed.train(batch_size)

testbed.clear_training_data()

# load second dataset
testbed.create_empty_nerf_dataset(2, aabb_scale, False)
for i in range(2):
    testbed.nerf.training.set_image(i, rgb, depth, scale)
    testbed.nerf.training.set_camera_intrinsics(i, fx, fy, cx, cy, k1, k2, p1, p2)
    testbed.nerf.training.set_camera_extrinsics(i, c2w[:3,:], True)

testbed.nerf.training.n_images_for_training = 2
testbed.shall_train = True
batch_size=256000
for i in range(200):
    testbed.train(batch_size)

It returns:

Traceback (most recent call last):
  File "debug_snippet.py", line 47, in <module>
    testbed.train(batch_size)
RuntimeError: ~/instant-ngp/dependencies/tiny-cuda-nn/include/tiny-cuda-nn/gpu_memory.h:285 cudaMemcpy(host_data, data(), num_elements * sizeof(T), cudaMemcpyDeviceToHost) failed with error an illegal memory access was encountered

Could not free memory: ~/instant-ngp/dependencies/tiny-cuda-nn/include/tiny-cuda-nn/gpu_memory.h:142 cudaFree(rawptr) failed with error an illegal memory access was encountered

After some debugging, I found that the issue may originate at this line.

I also found that if I call reset() before training the second time, things work. However, I do not want to reset my entire NeRF model (I hope to fine-tune it on the second dataset).
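For reference, the workaround can be sketched as a small helper. This is only a sketch of the sequence described above, not a fix: `switch_dataset` and the layout of the `frames` dictionaries are hypothetical names introduced here for illustration, and the testbed calls are the ones already used in the snippet, plus `reset()`. Calling `reset()` right before training resumes is an assumption about ordering.

```python
def switch_dataset(testbed, frames, aabb_scale, reset_model=False):
    """Swap the training images of an existing testbed for `frames`.

    `frames` (hypothetical layout) is a list of dicts with the keys
    "rgb", "depth", "scale", "intrinsics" (fx, fy, cx, cy, k1, k2, p1, p2)
    and "c2w" (a 4x4 camera-to-world matrix).
    """
    # Drop the first dataset, then allocate slots for the new images.
    testbed.clear_training_data()
    testbed.create_empty_nerf_dataset(len(frames), aabb_scale, False)

    training = testbed.nerf.training
    for i, frame in enumerate(frames):
        training.set_image(i, frame["rgb"], frame["depth"], frame["scale"])
        training.set_camera_intrinsics(i, *frame["intrinsics"])
        # Only the top three rows of the 4x4 matrix are passed on.
        training.set_camera_extrinsics(i, frame["c2w"][:3], True)
    training.n_images_for_training = len(frames)

    if reset_model:
        # Workaround: avoids the crash but discards the trained weights.
        testbed.reset()
    testbed.shall_train = True
```

With `reset_model=True` the second training loop runs without the error, but the model starts from scratch. With `reset_model=False` the helper reproduces the failing sequence, which is the case I would like to get working.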

Do you have any pointers on how I can achieve this?

vincentcartillier, Oct 05 '22 17:10