DrJit fills up all available memory when creating arrays from NumPy
Hi,
I encountered an edge case in DrJit's memory allocation routines: when repeatedly creating DrJit arrays from NumPy arrays, DrJit keeps growing its memory pool instead of reusing previously allocated memory.
Here is a minimal reproducer:
```python
import numpy as np
import drjit as dr

np_array = np.random.rand(2 ** 20)
for i in range(10000):
    drjit_array = dr.llvm.Float(np_array)
    dr.eval(drjit_array)
    del drjit_array

# Print the tail of DrJit's memory usage report
print('\n'.join(dr.whos_str().split('\n')[-30:]))
```
The issue doesn't happen when using `dr.llvm.Float(np_array.tolist())` instead.
Looking at the code, the problem seems related to `jitc_malloc_migrate`, which is invoked in `jitc_var_mem_copy`. DrJit initially allocates the memory as `AllocType::Host`, but then converts it to `AllocType::HostAsync`. This means that when `jitc_free` is called, the memory is returned to the pool for `AllocType::HostAsync` allocations. In the next loop iteration, however, a new variable is again created using `AllocType::Host`; since no memory was ever returned to that pool, fresh memory is allocated and then once more migrated to the `HostAsync` pool.
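To make the failure mode concrete, here is a minimal Python model of that pool mismatch (a sketch of the mechanism only, not DrJit's actual allocator): buffers are requested from the Host pool, but after migration they are released into the HostAsync pool, so the Host pool never sees a reusable block and the total reserved memory grows on every iteration.

```python
# Hypothetical model of the pool mismatch, not DrJit code. Buffers are
# requested from 'Host' but released into 'HostAsync', so the 'Host'
# pool never gets anything back and reserved memory grows unboundedly.
pools = {'Host': [], 'HostAsync': []}
total_reserved = 0

def alloc(alloc_type, size):
    global total_reserved
    if pools[alloc_type]:              # reuse a cached buffer if available
        return pools[alloc_type].pop()
    total_reserved += size             # otherwise reserve new memory
    return bytearray(size)

def migrate(buf):
    # Models jitc_malloc_migrate: the buffer is now treated as a
    # 'HostAsync' allocation; only its pool membership changes.
    return buf

def free(alloc_type, buf):
    pools[alloc_type].append(buf)      # returned to the *migrated* pool

for i in range(5):
    buf = alloc('Host', 2 ** 20)       # always misses the 'Host' pool
    buf = migrate(buf)                 # converted to 'HostAsync'
    free('HostAsync', buf)             # cached where 'Host' never looks
    print(f"iteration {i}: reserved {total_reserved >> 20} MiB, "
          f"cached in 'HostAsync': {len(pools['HostAsync'])}")
```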
In principle, DrJit will eventually free this memory under pressure, but it is still undesirable for it to needlessly reserve all available system memory.
Having the same issue; bumping for visibility. I have a situation where I cannot use a DrJit loop, so I run a normal Python loop, and the `dr.cuda.ad.Float` arrays (specifically created using `dr.meshgrid` from NumPy objects) keep growing each iteration rather than being released and overwritten. Using multiple memory profiling tools, I have traced the growth to the DrJit arrays; a sketch of the pattern is shown below.
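The following sketch reproduces the pattern described above (the grid sizes and iteration count are made up for illustration):

```python
import numpy as np
import drjit as dr

x = np.linspace(0.0, 1.0, 1024)
y = np.linspace(0.0, 1.0, 1024)

for i in range(1000):
    # Rebuild the grid every iteration; the reported memory usage
    # keeps growing instead of being reclaimed.
    gx, gy = dr.meshgrid(dr.cuda.ad.Float(x), dr.cuda.ad.Float(y))
    dr.eval(gx, gy)
    del gx, gy
```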
I think the original issue above has been fixed in Dr.Jit 1.0. If I run the above code with the latest Dr.Jit, it correctly frees any allocations in both LLVM and CUDA mode. More precisely, LLVM mode does not allocate anything at all, since it directly accesses the NumPy array's memory. CUDA mode does allocate a buffer, but that one is correctly deallocated in every iteration.