Memory leak problem

Open chpark1111 opened this issue 2 years ago • 19 comments

Hi, I was using this library to do the following operations via the Python bindings provided by this repository.

I think memory leaks when the mesh triangulation fails. Is there a way to fix this?

chpark1111 avatar Oct 15 '22 06:10 chpark1111

Hi, it would be nice if you could give a minimal example showing the leak.

pca006132 avatar Oct 16 '22 00:10 pca006132

import os
import time

import numpy as np
import psutil
from memory_profiler import profile

# The bindings' module name may differ depending on how they were built.
from pymanifold import Manifold

manifolds = []
for i in range(50):
    pts = abs(np.random.normal(0, 0.1, 3))
    length = np.random.normal(0, 90, 3)
    manifolds.append(
        Manifold.cube(pts[0], pts[1], pts[2]).rotate(length[0], length[1], length[2])
    )

@profile(precision=5)
def my_func():
    c = manifolds[0]
    for i in range(1, len(manifolds)):
        c = c + manifolds[i]
    st = time.time()
    process = psutil.Process(os.getpid())
    print("1", process.memory_info().rss)
    for i in range(50):
        try:
            mesh = c.to_mesh()
        except RuntimeError:
            pass
    process = psutil.Process(os.getpid())
    print("2", process.memory_info().rss)
    print("mesh vol", time.time() - st)

for k in range(10000):
    my_func()

When I run this code (with the library compiled with debug=OFF), the printed mesh vol time and the memory usage keep increasing, so I thought this was because of a memory leak. (I checked that the memory increase comes from the to_mesh function.) Do you know why the time/memory increases?

chpark1111 avatar Oct 16 '22 04:10 chpark1111

Thanks for the repro! Our python bindings are still a bit alpha, so I imagine we have a few things to fix yet. Since I'm not so familiar with python, where would you expect the memory to be cleaned up? Given your recursive function, it seems like a new c would stay in memory for each invocation of my_func until termination (I'm assuming here that each invocation of my_func gets its own execution context, which is how it works in other languages). Does the behavior change if my_func is called in a for loop instead?

elalish avatar Oct 16 '22 15:10 elalish

Nope! It shows the same behavior; I only wrapped the code in a function (instead of a direct for loop) so I could use memory_profiler, one of the Python profiling libraries.

I debugged and found that if we don't call c.to_mesh(), the memory usage does not increase, so I suspect the C++ implementation behind GetMesh has a leak. I also don't think it's the fault of the pybind module, because if I call other things or return other variables without going through manifold::GetMesh, it does not behave like this. When I comment out the CsgOpNode::BatchUnion part of the std::shared_ptr<CsgLeafNode> CsgOpNode::ToLeafNode() function, the memory leak does not happen. So I guess these parts plus the triangulation part could be the problem.

Also, when I compile with CUDA support and run test/manifold provided by the repo along with my code, it leaves some unfreed memory on the GPU (in the parts that can use the GPU), so maybe that is related to my issue.

And I'm curious how the program behaves when the triangulation fails (it raises a RuntimeError with the DEBUG option on). If we turn off the debug option and run the same thing, do the skipped vertices get freed? Or how are the skipped vertices handled?

chpark1111 avatar Oct 16 '22 15:10 chpark1111

I think the skipped vertices should be freed automatically, as they are stored in a std::list and other standard containers. It might be possible that my ManagedVec implementation leaks some memory, but I am not entirely sure. Maybe we can try to reproduce this directly in C++ and use valgrind to have a look at the leak.

pca006132 avatar Oct 16 '22 16:10 pca006132

Thanks for the suggestion! I'll try to reproduce this in C++ and check whether it shows the same leaks. If it does, I'll track it down in the C++ code; if not, the problem is in the Python bindings, which I'll try to solve.

chpark1111 avatar Oct 16 '22 16:10 chpark1111

I think I found the problem! It's not a leak but incorrect behavior of the library.

#include <random>
#include <vector>

#include "manifold.h"  // adjust the include path to match your build

using namespace std;

int main(int argc, char **argv) {
    random_device rd;
    default_random_engine gen{rd()};

    uniform_real_distribution<double> len_dist{0, 0.1};
    uniform_int_distribution<int> rot_dist{-180, 180};

    vector<manifold::Manifold> manifolds;
    for (int i = 0; i < 50; i++) {
        double len1 = len_dist(gen), len2 = len_dist(gen), len3 = len_dist(gen);
        int rot1 = rot_dist(gen), rot2 = rot_dist(gen), rot3 = rot_dist(gen);

        manifolds.push_back(manifold::Manifold::Cube(glm::vec3(len1, len2, len3))
                                .Rotate(rot1, rot2, rot3));
    }

    for (int k = 0; k < 200; k++) {
        manifold::Manifold c = manifolds[0];
        for (int i = 1; i < 50; i++) c = c + manifolds[i];

        for (int i = 1; i < 100; i++) {
            manifold::Mesh nw = c.GetMesh();  // destroyed at the end of each iteration
        }

        // cout << k << endl;
    }

    return 0;
}

It uses a different amount of memory depending on how many iterations it does. Below are the valgrind reports for the (k<3) case and the (k<200) case.

(k<3)

==2559273== HEAP SUMMARY:
==2559273==     in use at exit: 0 bytes in 0 blocks
==2559273==   total heap usage: 161,038 allocs, 161,038 frees, 40,932,731 bytes allocated
==2559273== 
==2559273== All heap blocks were freed -- no leaks are possible
==2559273== 
==2559273== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

(k<200)

==2542646== HEAP SUMMARY:
==2542646==     in use at exit: 0 bytes in 0 blocks
==2542646==   total heap usage: 9,003,669 allocs, 9,003,669 frees, 3,765,234,371 bytes allocated
==2542646== 
==2542646== All heap blocks were freed -- no leaks are possible
==2542646== 
==2542646== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

From my perspective, the memory usage should not depend on the total number of k iterations, because we are only using local variables. Is this not true, or is this behavior intended? (Of course, I think it shouldn't be...) The library does not seem to free local objects once they are no longer referenced, and I think this behavior is what made my Python program's time and memory usage grow. (Also, as the valgrind results show, the total heap usage keeps increasing as k increases.)

Any suspicious parts? I'll start debugging, but any advice would be helpful. Do you see any way of fixing this behavior?

chpark1111 avatar Oct 18 '22 10:10 chpark1111

Yes, this behavior is weird. The C++ standard says that local objects are destroyed at the end of their scope, so the manifold and mesh objects should be deallocated at the end of each loop iteration. As there is no leak at the end of program execution, we don't have any cycles in the data (for the std::shared_ptrs). I guess the csg_tree of manifolds may somehow retain references to c and cause high memory usage. It would be nice if you could look at the csg_tree and check its sizes recursively after each loop iteration to see if anything is retained. I can't do so myself as I've been a bit busy recently.
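
For instance, something like this could be a starting point (a hypothetical sketch; the Node type here is illustrative, not the actual csg_tree interface):

// Hypothetical sketch for checking what the tree retains: recursively walk
// the nodes after each loop iteration and sum the data they hold. The Node
// type here is illustrative only, not the real csg_tree interface.
#include <cstddef>
#include <cstdio>
#include <memory>
#include <vector>

struct Node {
    std::vector<std::shared_ptr<Node>> children;  // empty for leaf nodes
    std::size_t numVert = 0;                      // data held by this node
};

std::size_t RetainedVerts(const Node& node) {
    std::size_t total = node.numVert;
    for (const auto& child : node.children) total += RetainedVerts(*child);
    return total;
}

// e.g. after each iteration of the outer loop:
//   std::printf("retained verts: %zu\n", RetainedVerts(*root));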

pca006132 avatar Oct 18 '22 14:10 pca006132

Your example looks like it is accumulating complexity with c = c + manifolds[i].

Try printing out the number of vertices and triangles after each accumulation.

pentacular avatar Oct 18 '22 14:10 pentacular

I checked that the meshes are consistent (they have the same number of vertices in all iterations). I also tried copying the manifold with AsOriginal() when assigning to c, but that didn't help. And I checked with valgrind that the memory increase comes from the GetMesh() function. So I don't think that's the case.

chpark1111 avatar Oct 18 '22 14:10 chpark1111

We perform lazy evaluation in csg_tree: it will not evaluate the mesh until you run GetMesh(), so it makes sense that memory only increases after running GetMesh(). I'm afraid you still need to dump the contents of the csg_tree to check whether anything retains a reference to c.
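
To illustrate the idea (a minimal sketch of the pattern, not the actual csg_tree code):

// Minimal sketch of lazy evaluation: Union() only records the operands in a
// tree node; the expensive boolean work is deferred until Evaluate() runs,
// which is why memory use only jumps once GetMesh() is called.
#include <memory>
#include <vector>

struct LazyShape {
    std::vector<std::shared_ptr<const LazyShape>> operands;  // deferred ops

    static std::shared_ptr<const LazyShape> Union(
            std::shared_ptr<const LazyShape> a,
            std::shared_ptr<const LazyShape> b) {
        auto node = std::make_shared<LazyShape>();
        node->operands = {a, b};  // no boolean computed yet
        return node;
    }

    void Evaluate() const {  // analogous to GetMesh(): forces deferred work
        for (const auto& op : operands) op->Evaluate();
        // a real implementation would compute and cache the result mesh here
    }
};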

pca006132 avatar Oct 19 '22 02:10 pca006132

Unless c and manifolds[i] are identical, you should expect c = c + manifolds[i] to increase the complexity of c.

So if the number of vertices does not change in this case, you should be suspicious of your measurement.

pentacular avatar Oct 19 '22 03:10 pentacular

I think you misunderstood the point here. Of course c's complexity increases when doing the c = c + manifolds[i] operation. But the point is that when we exit the outer for loop (over k), c is not completely freed.

And I think I need to check the csg_tree too. I'll work on that!

chpark1111 avatar Oct 19 '22 04:10 chpark1111

You shouldn't expect the C++ heap size to be reduced by freeing memory -- generally you should just expect it to become available for reuse.

The valgrind output above supports this case -- it reports no leaks.
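
For example (a minimal, allocator-dependent illustration, not a guarantee):

// Illustration only: many small allocations freed back to the allocator
// typically stay in its arena rather than being returned to the OS, so the
// process RSS stays flat while later allocations reuse the same chunks.
// Exact behavior depends on the allocator and allocation sizes.
#include <list>

int main() {
    {
        std::list<int> nodes;
        for (int i = 0; i < 1000000; i++) nodes.push_back(i);
    }  // nodes destroyed here: chunks go back to the malloc arena, not the OS
    std::list<int> reused;  // these allocations can reuse those same chunks
    for (int i = 0; i < 1000000; i++) reused.push_back(i);
    return 0;
}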

pentacular avatar Oct 19 '22 04:10 pentacular

I found the problem! The memory increase comes from the large vector includesMeshID.

In the function IncrementMeshIDs(), the includesMeshID array size keeps growing, meaning it never shrinks or reuses the mesh IDs of meshes that have been freed! (The mesh IDs keep increasing, making the program allocate a huge includesMeshID array.) Note that includesMeshID's size is Manifold::Impl::meshIDCounter_ (which is almost the same as meshIDstart) * 2, and if we print meshIDstart, it keeps increasing as the loop progresses. I think the current implementation of mesh IDs fails to track the freed meshes.

So this is what kept making my program use a large amount of memory, and since includesMeshID is freed right after use, there is no actual leak.

For my case, I don't track the mesh IDs, so I can simply skip IncrementMeshIDs(). But in the longer term I think this should be fixed to handle cases like mine, where it makes the program use a lot of unnecessary memory.
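
To sketch the pattern I'm seeing (a simplified illustration based on my reading, not the exact code):

// Simplified illustration of why includesMeshID keeps growing: the table is
// sized by the global ID counter, which only ever increases, rather than by
// the IDs actually present in the mesh being processed.
#include <cstdint>
#include <vector>

static uint32_t meshIDCounter_ = 0;  // every new mesh gets a fresh ID

uint32_t ReserveMeshID() { return meshIDCounter_++; }

void IncrementMeshIDs(/* mesh data */) {
    // Proportional to every ID ever handed out, so after many boolean
    // operations this allocation is huge even for a small mesh. It is freed
    // on return, so there is no leak -- just a lot of transient memory.
    std::vector<uint32_t> includesMeshID(meshIDCounter_ * 2);
    // ... remap the mesh's IDs through includesMeshID ...
}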

I think I can fix this fairly easily, so I'll start working on it and make a pull request.

chpark1111 avatar Oct 19 '22 10:10 chpark1111

Nice finding. I think it should be fixed by using a map instead of a vector to map the IDs; presumably most of the IDs ever allocated are not present in the current mesh.
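
Roughly like this (an untested sketch with illustrative names, not the actual implementation):

// Untested sketch: remap only the IDs that actually occur in the current
// mesh, so the table size is bounded by the mesh contents rather than by
// the ever-growing global counter.
#include <cstdint>
#include <unordered_map>
#include <vector>

static uint32_t meshIDCounter_ = 0;  // global, monotonically increasing

void IncrementMeshIDs(std::vector<uint32_t>& meshIDs) {
    std::unordered_map<uint32_t, uint32_t> oldToNew;
    for (uint32_t& id : meshIDs) {
        auto it = oldToNew.find(id);
        if (it == oldToNew.end())
            it = oldToNew.emplace(id, meshIDCounter_++).first;
        id = it->second;  // rewrite in place with the fresh ID
    }
}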

@elalish what do you think?

pca006132 avatar Oct 19 '22 14:10 pca006132

Yes, that sounds reasonable to me. I'm a little surprised this was noticeable; wouldn't includesMeshID have a length roughly equal to the number of meshes? Still, agreed that a map would be better.

elalish avatar Oct 19 '22 14:10 elalish

Oh, I think a map is the right choice. It's hard to track all the mesh IDs with the vector design when meshes are randomly allocated and freed.

chpark1111 avatar Oct 19 '22 15:10 chpark1111

That mesh ID counter is a global static one.

pca006132 avatar Oct 19 '22 15:10 pca006132

@chpark1111 Any luck making a PR?

elalish avatar Nov 11 '22 15:11 elalish

I can work on this and also make the hashtable more general.

pca006132 avatar Nov 11 '22 16:11 pca006132

Oh, I got distracted by another project while working on it. Since I haven't made any meaningful changes yet, maybe @pca006132 can work on it!

chpark1111 avatar Nov 11 '22 16:11 chpark1111