
Worker exceeded 95% memory budget

Open beyucel opened this issue 4 years ago • 6 comments

I just wanted to discuss the memory usage issue with this notebook. When the chunk size is above 25 (> 250 MB), a single worker reaches 6.3 GB of memory usage and the kernel restarts. When the chunk size is 25 or below, there is no problem.

My question is: why do 300 MB chunks cause such high memory usage?
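For reference, chunk sizes like the ones above can be sanity-checked with simple arithmetic. A sketch, where the sample shape `(108, 108, 108)` is a hypothetical stand-in (not the notebook's actual dimensions) chosen so that 25 samples land near 250 MB:

```python
# Back-of-envelope chunk footprint for an array of float64 samples.
# The sample shape used below is hypothetical; substitute the real one.
def chunk_mb(n_samples, sample_shape, itemsize=8):
    """Memory (MB) of one chunk holding n_samples samples."""
    n_elements = n_samples
    for dim in sample_shape:
        n_elements *= dim
    return n_elements * itemsize / 1e6

print(chunk_mb(25, (108, 108, 108)))  # ~252 MB per chunk
```

Note that a worker's resident memory can be many multiples of one chunk: intermediates from each transform, the task graph, and any concatenation for the PCA fit may all hold copies at once.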

beyucel avatar Jun 02 '20 14:06 beyucel

We need to profile the memory usage. Let's check the delta first to see if that makes sense.

  • https://pypi.org/project/memprof/
  • https://pypi.org/project/memory-profiler/
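In addition to the packages above, the standard library's `tracemalloc` can give a quick per-step peak-allocation number with no extra installs. A minimal sketch (the helper name `measure` is ours, not part of either package):

```python
import tracemalloc

def measure(fn, *args, **kwargs):
    """Run fn and return (result, peak_bytes) of Python-level allocations."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

# Example: peak tracks the 10 MB buffer allocated inside the call.
buf, peak = measure(bytearray, 10_000_000)
print(peak // 1_000_000)  # -> roughly 10
```

One caveat: `tracemalloc` only sees traffic through Python's allocator, which can differ from the resident memory htop reports.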

wd15 avatar Jun 02 '20 14:06 wd15

I initially tried memory_profiler and the %memit magic function. I'm not sure whether it does what we want, or whether I used it correctly (I am investigating that), because it reports (for the notebook) peak memory: 219.80 MiB, increment: 14.20 MiB, which does not seem reasonable. I am following htop, and the memory usage for those lines is a lot higher. I will try the other two memory profilers as well.

beyucel avatar Jun 03 '20 04:06 beyucel

I would do all the memory profiling outside of the notebook for starters, as the notebook can confuse things. Also, start with only one process to get a good benchmark and make sure you understand the delta between each step in the code. Furthermore, breaking the code down into imperative steps might help.
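The "imperative steps" suggestion can be sketched like this, using the stdlib `resource` module to read peak RSS (the number closest to what htop shows). `run_steps` and the step callables are placeholder names for illustration, not PyMKS API:

```python
import resource

def peak_rss_mb():
    """Peak resident set size in MB (ru_maxrss is kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

def run_steps(steps, x):
    """Apply (name, fn) steps in order, recording the peak-RSS delta each adds."""
    deltas = {}
    for name, fn in steps:
        before = peak_rss_mb()
        x = fn(x)
        deltas[name] = peak_rss_mb() - before
    return x, deltas
```

Because `ru_maxrss` is a high-water mark, a step's delta is nonzero only when it sets a new peak, which is exactly the signal needed to pinpoint the step that blows the budget.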

wd15 avatar Jun 03 '20 16:06 wd15

Thanks, Daniel. That is what I am trying to do right now. I will share the delta values of each step.

beyucel avatar Jun 03 '20 16:06 beyucel

```
Filename: memory_try.py

Line #    Mem usage    Increment   Line Contents
================================================
    39   183.129 MiB   183.129 MiB   @profile
    40                               def HomogenizationPipeline(x):
    41   183.215 MiB     0.086 MiB       a1=PrimitiveTransformer(n_state=2, min_=0.0, max_=1.0).transform(x)
    42   183.762 MiB     0.547 MiB       a2=TwoPointCorrelation(periodic_boundary=True, cutoff=15, correlations=[(1,1)]).transform(a1)
    43   183.762 MiB     0.000 MiB       a3=FlattenTransformer().transform(a2)
    44 10015.367 MiB  9831.605 MiB       a4=PCA(n_components=3).fit_transform(a3)
    45 10015.367 MiB     0.000 MiB       return a4
```

This is the non-compute version; it does not tell us much because the first three lines are lazy and all the computation happens in the PCA fit_transform. I will add the compute version as well for discussion. This still uses the same notebook as above (I just wrote the notebook out as a separate .py file and used the notebook as a shell).
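The pattern in the profile above (everything attributed to the PCA line) is characteristic of lazy evaluation, and can be reproduced in miniature with stdlib generators: in a lazy pipeline the intermediate stages allocate almost nothing and all the memory shows up at the terminal step. The stages below are toy stand-ins, not the notebook's transformers:

```python
import tracemalloc

def lazy_pipeline(n):
    xs = range(n)                 # stage 1: lazy, no data materialized
    ys = (x * 2 for x in xs)      # stage 2: lazy generator
    return sum(ys)                # terminal step does all the work

def eager_pipeline(n):
    xs = list(range(n))           # stage 1: materializes n ints
    ys = [x * 2 for x in xs]      # stage 2: materializes again
    return sum(ys)

tracemalloc.start()
lazy_pipeline(1_000_000)
_, lazy_peak = tracemalloc.get_traced_memory()
tracemalloc.reset_peak()
eager_pipeline(1_000_000)
_, eager_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(lazy_peak < eager_peak)  # -> True: lazy peak is far smaller
```

The same reasoning applies here: a per-line profile of a lazy dask pipeline mostly measures graph construction, so forcing each stage (the "compute version") is what makes the per-step deltas meaningful.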

beyucel avatar Jun 04 '20 06:06 beyucel

@beyucel is this still an issue? Can this be closed? Please close it if you think this isn't something we can act on.

wd15 avatar Aug 03 '21 00:08 wd15