
memory leak when TBB and PSTL are enabled

Open rob-rb opened this issue 10 months ago • 27 comments

Great library! I love it.

The following code shows how memory is consumed. You need to `pip install psutil` first.

```python
from manifold3d import *
import random


def get_boolean_difference(n):
    sph = Manifold.sphere(1.5, n)
    tet = Manifold.tetrahedron()
    m = sph - tet
    mesh = m.to_mesh()
    return mesh.vert_properties, mesh.tri_verts


def memory():
    import psutil
    process = psutil.Process()
    return int(process.memory_info().rss * 1e-6)  # resident set size in MB


def test():
    for j in range(100):
        points, faces = get_boolean_difference(2500 - j)
        m1 = memory()
        points = None
        faces = None
        print(memory(), m1)


if __name__ == "__main__":
    test()
```

rob-rb avatar Apr 11 '24 09:04 rob-rb

Maybe you can try to force-trigger the Python GC and see if the memory consumption is the same. It is hard to tell whether we are leaking memory or the Python interpreter is holding the references.

pca006132 avatar Apr 11 '24 11:04 pca006132

I introduced some gc.collect() calls, but no change. I'm providing the output of this script below as well:

```python
from manifold3d import *
import random
import gc


def get_boolean_difference(n):
    sph = Manifold.sphere(1.5, n)
    tet = Manifold.tetrahedron()
    m = sph - tet
    mesh = m.to_mesh()
    return mesh.vert_properties, mesh.tri_verts


def memory():
    import psutil
    process = psutil.Process()
    return int(process.memory_info().rss * 1e-6)  # resident set size in MB


def test():
    for j in range(100):
        points, faces = get_boolean_difference(2500 - j)
        m1 = memory()
        points = None
        faces = None
        m2 = memory()
        gc.collect()
        gc.collect()
        print(memory(), m2, m1)


if __name__ == "__main__":
    test()
```

```
52 52 120
53 53 122
54 54 122
54 54 122
63 63 131
59 59 128
59 59 128
62 62 131
66 66 134
68 68 136
70 70 138
69 69 137
75 75 143
75 75 143
73 73 141
76 76 144
84 84 151
85 85 152
86 86 154
81 81 148
91 91 158
92 92 159
91 91 158
91 91 158
98 98 166
99 99 166
98 98 165
102 102 169
106 106 173
106 106 173
107 107 174
113 113 180
112 112 179
117 117 184
115 115 182
114 114 181
121 121 188
122 122 188
122 122 189
120 120 187
131 131 197
130 130 196
129 129 195
129 129 195
136 136 202
138 138 204
135 135 201
137 137 203
144 144 210
146 146 212
145 145 211
142 142 208
150 150 216
151 151 217
150 150 216
152 152 217
159 159 224
159 159 225
160 160 225
160 160 226
166 166 232
168 168 234
169 169 235
164 164 230
177 177 242
173 173 238
171 171 236
172 172 237
180 180 244
179 179 244
184 184 249
179 179 243
189 189 254
189 189 253
187 187 252
188 188 253
191 191 255
196 196 260
191 191 255
191 191 255
200 200 264
203 203 267
197 197 261
199 199 264
208 208 272
205 205 269
206 206 270
206 206 270
216 216 280
213 213 277
213 213 277
213 213 277
222 222 286
221 221 285
221 221 285
219 219 283
229 229 292
229 229 293
228 228 291
231 231 294
```

rob-rb avatar Apr 11 '24 20:04 rob-rb

Interesting, will definitely look into this, probably related to nanobind as well. Btw, please use markdown code block syntax, or the indentation will be lost and I have to guess your indentation :)

pca006132 avatar Apr 12 '24 08:04 pca006132

It seems that the issue is that TBB is leaking memory, not a Python binding issue. Disabling parallelism fixes it. Disabling the use of concurrent_map (https://github.com/oneapi-src/oneTBB/issues/1133) does not.

pca006132 avatar Apr 25 '24 14:04 pca006132

Try this patch:

```diff
diff --git a/src/utilities/include/par.h b/src/utilities/include/par.h
index 2049a501..788b6713 100644
--- a/src/utilities/include/par.h
+++ b/src/utilities/include/par.h
@@ -85,7 +85,7 @@ inline constexpr ExecutionPolicy autoPolicy(size_t size) {
   }
 
 #if MANIFOLD_PAR != 'T' || \
-    (TBB_INTERFACE_VERSION >= 10000 && __has_include(<pstl/glue_execution_defs.h>))
+    (TBB_INTERFACE_VERSION >= 10000 && 0)
 #if MANIFOLD_PAR == 'T'
 #define STL_DYNAMIC_BACKEND(NAME, RET)                        \
   template <typename Ret = RET, typename... Args>             \
```

On my system, it seems that the problem is due to PSTL. Switching to a newer compiler version does not fix the issue. I guess the only way to fix it is to avoid using PSTL for now; maybe leave it as an option that defaults to OFF?
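
As a rough sketch of that opt-in idea (the macro name MANIFOLD_USE_PSTL is hypothetical and would have to be wired up from CMake; this is not the actual patch), the guard in par.h could additionally require an explicit opt-in before taking the PSTL path:

```cpp
// Hypothetical opt-in guard sketch for par.h. MANIFOLD_USE_PSTL is an invented
// name; it defaults to OFF, so the PSTL code path is only compiled in when the
// user explicitly asks for it. The non-TBB path is unchanged.
#ifndef MANIFOLD_USE_PSTL
#define MANIFOLD_USE_PSTL 0
#endif

#if MANIFOLD_PAR != 'T' ||                                  \
    (MANIFOLD_USE_PSTL && TBB_INTERFACE_VERSION >= 10000 && \
     __has_include(<pstl/glue_execution_defs.h>))
// ... existing STL_DYNAMIC_BACKEND definitions ...
#endif
```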

pca006132 avatar Apr 25 '24 15:04 pca006132

It seems that with libc++ it is fine. @elalish can you test it on macOS? I think macOS uses libc++ by default.

pca006132 avatar Apr 25 '24 15:04 pca006132

Caught one memory leak in our C binding tests as well:

```diff
diff --git a/test/manifoldc_test.cpp b/test/manifoldc_test.cpp
index d6c3697b..366feb3e 100644
--- a/test/manifoldc_test.cpp
+++ b/test/manifoldc_test.cpp
@@ -101,6 +101,7 @@ TEST(CBIND, extrude) {
   manifold_delete_manifold(diff);
   manifold_delete_simple_polygon(sq[0]);
   manifold_delete_polygons(polys);
+  manifold_delete_cross_section(cross);
 }
 
 TEST(CBIND, compose_decompose) {
```

pca006132 avatar Apr 25 '24 15:04 pca006132

> @elalish can you test it on macOS?

Sure, but what do you want me to test? Python? Or did you find a way to check it in C++?

elalish avatar Apr 25 '24 16:04 elalish

Python should be enough. Note that you cannot compile with ASan when using Python.

pca006132 avatar Apr 25 '24 16:04 pca006132

It's just going up and down for me, looks reasonable for a GC'ed language:

```
509 547
646 685
711 853
739 881
758 895
765 809
802 1001
820 1020
834 878
779 912
818 952
838 972
831 875
879 1012
904 1037
934 1067
923 966
953 1086
968 1101
911 1044
940 984
967 1099
1003 1135
1015 1147
1018 1061
1069 1201
990 1122
977 1109
961 1004
954 1086
984 1115
984 1115
983 1026
1008 1139
1018 1149
841 972
876 919
909 1040
952 1082
962 1093
969 1011
975 1105
994 1124
1003 1133
1025 1067
1037 1167
1047 1177
1070 1200
1087 1130
1063 1193
953 1082
988 1117
955 997
976 1105
995 1124
968 1097
946 988
963 1092
950 1079
964 1092
991 1032
1003 1132
1004 1132
980 1108
925 966
933 1060
956 1083
872 1000
922 964
959 1087
941 1068
925 1052
950 991
969 1096
973 1100
993 1120
1005 1046
997 1123
898 1024
856 982
869 910
869 995
911 1037
927 1053
947 988
956 1082
973 1099
977 1103
989 1030
1000 1125
983 1108
983 1108
989 1029
1004 1129
1016 1141
1033 1158
1040 1080
1053 1177
1066 1190
1066 1190
```

elalish avatar Apr 25 '24 17:04 elalish

Yeah, I think libstdc++'s PSTL is the culprit here...

pca006132 avatar Apr 25 '24 17:04 pca006132

Strong words - have you filed a bug yet? Any kind of minimal repro?

elalish avatar Apr 25 '24 17:04 elalish

No, but I can try to get a minimal repro later. I think functions like the parallel uninitialized_fill may already cause some issues.
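
A minimal candidate repro along those lines could be a loop that only calls the parallel uninitialized_fill and watches RSS grow; the sketch below assumes a Linux-only RSS helper and a build roughly like `g++ -std=c++17 -O2 repro.cpp -ltbb`:

```cpp
// Candidate minimal repro sketch: repeatedly run a parallel uninitialized_fill
// through libstdc++'s PSTL (TBB backend) and print the resident set size.
// If RSS keeps climbing across iterations, the leak reproduces without Manifold.
#include <cstdio>
#include <execution>
#include <memory>
#include <unistd.h>

// Resident set size in MB, read from /proc/self/statm (Linux only).
static long RssMB() {
  long pages = 0, resident = 0;
  std::FILE* f = std::fopen("/proc/self/statm", "r");
  if (f) {
    std::fscanf(f, "%ld %ld", &pages, &resident);
    std::fclose(f);
  }
  return resident * sysconf(_SC_PAGESIZE) / (1024 * 1024);
}

int main() {
  constexpr std::size_t kN = std::size_t(1) << 24;  // 16M doubles per iteration
  for (int i = 0; i < 100; ++i) {
    std::unique_ptr<double[]> buf(new double[kN]);
    std::uninitialized_fill(std::execution::par_unseq, buf.get(),
                            buf.get() + kN, 1.0);
    std::printf("iteration %d: %ld MB\n", i, RssMB());
  }
  return 0;
}
```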

pca006132 avatar Apr 25 '24 17:04 pca006132

Ah, found the reason:

> Parallel STL is not supported by oneTBB. if you want to use PSTL, you can revert to an older TBB version.
>
> You can use oneAPI DPC++ Library (oneDPL) instead of Parallel STL.
>
> Please refer to the below links for more information.
>
> https://www.intel.com/content/www/us/en/docs/onedpl/developer-guide/2022-1/execution-policies.html
>
> https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-threading-building-blocks-release-notes.html

And tested with an older version, it works correctly. This is annoying...

pca006132 avatar Apr 25 '24 18:04 pca006132

Wow, I'm so confused! So we have Thrust, OpenMP, TBB, OneDPL, PSTL, oneAPI, CUDA...

I wonder if anyone has made a diagram showing how they all call each other? I'm beginning to wonder if that graph will have cycles...

elalish avatar Apr 25 '24 19:04 elalish

I think we should probably ask the oneTBB people about this... It feels weird that they stopped supporting PSTL, considering there may be other libraries that use PSTL and want to link against TBB. This makes oneTBB not a true replacement for the old TBB, even though they share the same shared library name...

pca006132 avatar Apr 26 '24 02:04 pca006132

Agreed. Still, it has the sound of everyone wanting to be the "one, true, overarching parallelism API". I wonder if they realized oneTBB wouldn't quite get there, so maybe they built oneAPI on top? I wonder if PSTL slots in there somehow?

elalish avatar Apr 26 '24 04:04 elalish

Update: GCC 13 works fine. I think the leak is probably due to a PSTL version issue.

pca006132 avatar May 01 '24 18:05 pca006132

So, do we think there's anything actionable here for us, or should we just close this?

elalish avatar May 01 '24 21:05 elalish

Probably make a CMake option for the user to decide, and by default disable it unless GCC 13 or newer is present, or we are on macOS?

I'm not sure if there is a good way to check the PSTL version.
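
One possibility worth verifying (an assumption, not something the build currently does) is that libstdc++'s bundled PSTL exposes a `_PSTL_VERSION` macro in `<pstl/pstl_config.h>`, which a configure-time probe could print and gate on:

```cpp
// Probe sketch: report the bundled PSTL version, if any.
// Assumes _PSTL_VERSION is defined by <pstl/pstl_config.h> (libstdc++'s
// vendored PSTL); whether gating on it is reliable across compilers is open.
#include <cstdio>

#if __has_include(<pstl/pstl_config.h>)
#include <pstl/pstl_config.h>
#endif

int main() {
#ifdef _PSTL_VERSION
  std::printf("_PSTL_VERSION = %d\n", _PSTL_VERSION);
#else
  std::printf("no bundled PSTL headers found\n");
#endif
  return 0;
}
```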

pca006132 avatar May 02 '24 05:05 pca006132

In the context of removing our Thrust dependency entirely, what do we think is a good path forward? I'm pretty comfortable requiring the latest compiler versions - I think we're pretty close to the bleeding edge to get all the parallelism support we need. The main thing is that there is at least one compiler that works for each architecture.

elalish avatar May 10 '24 20:05 elalish

According to the Godot Engine documentation, it is limited to C++17. How new a C++ standard does Manifold want to require? Manifold could trade off features for compatibility.

https://docs.godotengine.org/en/stable/contributing/development/cpp_usage_guidelines.html

fire avatar May 10 '24 22:05 fire

Well, PSTL was introduced in C++17, so hopefully that's adequate. I mean more that we might need to require GCC 13 rather than 12, given the parallelism bugs we've found.

elalish avatar May 10 '24 23:05 elalish

@elalish The main issue, though, is how to make distro maintainers happy about this. IIRC distros provide their own libtbb.so, so we don't get to choose the TBB version, and distros use the same compiler for all package builds, so we don't get to choose the compiler version either. And I think GCC 12 will stick around for at least several years on some distros...

pca006132 avatar May 11 '24 11:05 pca006132

My current thought for fixing this issue is to fall back to the Thrust implementations by default, unless the user explicitly enables PSTL, or we are certain that the PSTL and TBB versions are compatible. E.g. we know that GCC 13 or a sufficiently recent libc++ is fine with oneTBB, and the older TBB (2020 or older) is fine with older versions of PSTL; a rough compile-time sketch of such a gate is below.

But this makes getting rid of Thrust even harder...
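
The sketch leans on `_GLIBCXX_RELEASE` (libstdc++) and `_LIBCPP_VERSION` (libc++); treating those two checks as a complete description of the known-good configurations is an assumption, and `MANIFOLD_PSTL_OK` is an invented name:

```cpp
// Sketch of a compile-time gate for "PSTL is known to be safe here".
// Assumption: the known-good setups are libstdc++ from GCC 13+ and libc++;
// everything else would fall back to the non-PSTL code path.
#include <cstddef>  // any standard header pulls in the library's config macros

#if defined(_GLIBCXX_RELEASE) && _GLIBCXX_RELEASE >= 13
  #define MANIFOLD_PSTL_OK 1  // libstdc++ shipped with GCC 13 or newer
#elif defined(_LIBCPP_VERSION)
  #define MANIFOLD_PSTL_OK 1  // libc++ (e.g. macOS): no leak observed
#else
  #define MANIFOLD_PSTL_OK 0  // older libstdc++ or unknown library: play it safe
#endif
```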

pca006132 avatar May 11 '24 11:05 pca006132

Yeah, I think getting rid of Thrust is pretty important. I might just say fall back to single-threaded. Remind me how bad these bugs are? They're outside of our code base, so I feel like we should just have some documentation saying "we verify these compilers - if you have problems on others, try disabling parallelism".

elalish avatar May 12 '24 02:05 elalish

It probably depends on the application. If things are long-running, that can present a problem.

Another problem is that our Python packages are using oneTBB and an old GCC. Maybe I can look into using an older version of TBB to avoid this issue...

pca006132 avatar May 12 '24 02:05 pca006132