
Worker exceeded 95% memory budget

Open beyucel opened this issue 4 years ago • 6 comments

I just wanted to discuss the memory usage issue with this notebook. When the chunk size is above 25 (> 250 MB), a single worker reaches 6.3 GB of memory usage and the kernel restarts. When the chunk size is 25 or below, there is no problem.

My question is: why do 300 MB chunks cause such high memory usage?
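For reference, chunk sizes like the ones above can be sanity-checked with simple arithmetic. A sketch, where the sample shape `(108, 108, 108)` is a hypothetical stand-in (not the notebook's actual dimensions) chosen so that 25 samples land near 250 MB:

```python
# Back-of-envelope chunk footprint for an array of float64 samples.
# The sample shape used below is hypothetical; substitute the real one.
def chunk_mb(n_samples, sample_shape, itemsize=8):
    """Memory (MB) of one chunk holding n_samples samples."""
    n_elements = n_samples
    for dim in sample_shape:
        n_elements *= dim
    return n_elements * itemsize / 1e6

print(chunk_mb(25, (108, 108, 108)))  # ~252 MB per chunk
```

Note that a worker's resident memory can be many multiples of one chunk: intermediates from each transform, the task graph, and any concatenation for the PCA fit may all hold copies at once.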

beyucel avatar Jun 02 '20 14:06 beyucel

We need to profile the memory usage. Let's check the delta first to see if that makes sense.

  • https://pypi.org/project/memprof/
  • https://pypi.org/project/memory-profiler/
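In addition to the packages above, the standard library's `tracemalloc` can give a quick per-step peak-allocation number with no extra installs. A minimal sketch (the helper name `measure` is ours, not part of either package):

```python
import tracemalloc

def measure(fn, *args, **kwargs):
    """Run fn and return (result, peak_bytes) of Python-level allocations."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

# Example: peak tracks the 10 MB buffer allocated inside the call.
buf, peak = measure(bytearray, 10_000_000)
print(peak // 1_000_000)  # -> roughly 10
```

One caveat: `tracemalloc` only sees traffic through Python's allocator, which can differ from the resident memory htop reports.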

wd15 avatar Jun 02 '20 14:06 wd15

I initially tried memory_profiler and the %memit magic function. I'm not sure whether it does what we want, or whether I used it correctly (I am investigating that), because it reports (for the notebook) peak memory: 219.80 MiB, increment: 14.20 MiB, which does not seem reasonable. I am following htop, and the memory usage for those lines is a lot higher. I will try the other two memory profilers as well.

beyucel avatar Jun 03 '20 04:06 beyucel

I would do all the memory profiling outside of the notebook for starters, as the notebook can confuse things. Also, start with only one process to get a good benchmark and make sure you understand the delta between each step in the code. Furthermore, breaking the code down into imperative steps might help.
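The "imperative steps" suggestion can be sketched like this, using the stdlib `resource` module to read peak RSS (the number closest to what htop shows). `run_steps` and the step callables are placeholder names for illustration, not PyMKS API:

```python
import resource

def peak_rss_mb():
    """Peak resident set size in MB (ru_maxrss is kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

def run_steps(steps, x):
    """Apply (name, fn) steps in order, recording the peak-RSS delta each adds."""
    deltas = {}
    for name, fn in steps:
        before = peak_rss_mb()
        x = fn(x)
        deltas[name] = peak_rss_mb() - before
    return x, deltas
```

Because `ru_maxrss` is a high-water mark, a step's delta is nonzero only when it sets a new peak, which is exactly the signal needed to pinpoint the step that blows the budget.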

wd15 avatar Jun 03 '20 16:06 wd15

Thanks, Daniel. That is what I am trying to do right now. I will share the delta values of each step.

beyucel avatar Jun 03 '20 16:06 beyucel

```
Filename: memory_try.py

Line #    Mem usage    Increment   Line Contents
================================================
    39   183.129 MiB   183.129 MiB   @profile
    40                               def HomogenizationPipeline(x):
    41   183.215 MiB     0.086 MiB       a1=PrimitiveTransformer(n_state=2, min_=0.0, max_=1.0).transform(x)
    42   183.762 MiB     0.547 MiB       a2=TwoPointCorrelation(periodic_boundary=True, cutoff=15, correlations=[(1,1)]).transform(a1)
    43   183.762 MiB     0.000 MiB       a3=FlattenTransformer().transform(a2)
    44 10015.367 MiB  9831.605 MiB       a4=PCA(n_components=3).fit_transform(a3)
    45 10015.367 MiB     0.000 MiB       return a4
```

This is the non-compute version; it does not tell us much because the first three lines are lazy and all the computation happens in the PCA fit_transform. I will add the compute version as well for discussion. This still uses the same notebook as above (I just wrote the notebook out as a separate .py file and used the notebook as a shell).
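The pattern in the profile above (everything attributed to the PCA line) is characteristic of lazy evaluation, and can be reproduced in miniature with stdlib generators: in a lazy pipeline the intermediate stages allocate almost nothing and all the memory shows up at the terminal step. The stages below are toy stand-ins, not the notebook's transformers:

```python
import tracemalloc

def lazy_pipeline(n):
    xs = range(n)                 # stage 1: lazy, no data materialized
    ys = (x * 2 for x in xs)      # stage 2: lazy generator
    return sum(ys)                # terminal step does all the work

def eager_pipeline(n):
    xs = list(range(n))           # stage 1: materializes n ints
    ys = [x * 2 for x in xs]      # stage 2: materializes again
    return sum(ys)

tracemalloc.start()
lazy_pipeline(1_000_000)
_, lazy_peak = tracemalloc.get_traced_memory()
tracemalloc.reset_peak()
eager_pipeline(1_000_000)
_, eager_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(lazy_peak < eager_peak)  # -> True: lazy peak is far smaller
```

The same reasoning applies here: a per-line profile of a lazy dask pipeline mostly measures graph construction, so forcing each stage (the "compute version") is what makes the per-step deltas meaningful.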

beyucel avatar Jun 04 '20 06:06 beyucel

@beyucel is this still an issue? Can this be closed? Please close it if you think this isn't something we can act on.

wd15 avatar Aug 03 '21 00:08 wd15