pymks
Worker exceeded 95% memory budget
I just wanted to discuss the memory usage issue with this notebook. When the chunk size is above 25 (>250 MB per chunk), a single worker reaches 6.3 GB of memory usage and restarts the kernel. When the chunk size is 25 or below, there is no problem.
My question is: why do ~300 MB chunks cause this high memory usage?
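For context, here is a minimal sketch of how I read the setup (the array shape, dtype, and voxel grid are assumptions for illustration, not values from the notebook): a dask array of microstructures chunked along the sample axis, where "chunk size 25" means 25 samples per chunk.

```python
# Hypothetical reconstruction of the chunked input; shapes and dtype are
# assumptions, chosen only to show how chunk size maps to chunk bytes.
import dask.array as da
import numpy as np

n_samples, nx = 1000, 100                      # assumed sizes
chunk_samples = 25                             # the threshold mentioned above
x = da.random.randint(0, 2, size=(n_samples, nx, nx, nx),
                      chunks=(chunk_samples, nx, nx, nx))

chunk_mb = np.prod(x.chunksize) * x.dtype.itemsize / 1e6
print(f"{chunk_mb:.0f} MB per chunk")          # ~200 MB with these assumptions
```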
We need to profile the memory usage. Let's check the delta first to see if that makes sense (a minimal usage sketch follows the links below).
- https://pypi.org/project/memprof/
- https://pypi.org/project/memory-profiler/
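As a starting point, here is a minimal sketch of the memory-profiler workflow (the filename and the function body are placeholders, not the real pipeline): decorate the function of interest with @profile and run the script outside the notebook.

```python
# profile_sketch.py (hypothetical filename) -- line-by-line profiling with
# memory-profiler; the body of pipeline() is a placeholder for the real steps.
from memory_profiler import profile
import numpy as np

@profile
def pipeline(x):
    y = x * 2              # placeholder step; swap in the real transforms
    return y

if __name__ == "__main__":
    pipeline(np.ones((1000, 1000)))
```

Run it with `python -m memory_profiler profile_sketch.py` for per-line deltas, or with `mprof run profile_sketch.py` followed by `mprof plot` to see memory over time.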
I initially tried memory_profiler and the %memit magic function. I am not sure whether it does what we want, or whether I used it incorrectly (I am investigating that), because it reports (for the notebook) peak memory: 219.80 MiB, increment: 14.20 MiB,
which does not seem reasonable. I am following htop, and the memory usage for those lines is a lot higher. I will try the other memory profilers as well.
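For reference, the in-notebook measurement was presumably something like the following (a sketch; the exact cell contents are not shown in this thread). Note that %memit reports the memory of the notebook's own process, so memory held by separate worker processes will not appear in its numbers, which may be why it looks much lower than what htop shows.

```python
# Sketch of the in-notebook measurement; run inside Jupyter/IPython.
%load_ext memory_profiler
%memit HomogenizationPipeline(x)   # peak memory / increment of this process only
```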
I would do all the memory profiling outside of the notebook for starters, as the notebook can confuse things. Also, start with only one process to get a good benchmark and make sure you understand the delta between each step in the code. Furthermore, breaking the code down into imperative steps might help.
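A minimal sketch of that suggestion, assuming the pipeline runs on dask: switch to the single-threaded, in-process scheduler so all of the work (and all of the memory) stays in the one process being profiled.

```python
# Sketch: run everything in a single process with dask's synchronous
# scheduler so a profiler or htop sees the whole pipeline in one place.
import dask

dask.config.set(scheduler="synchronous")   # single-threaded, in-process

# ...now run the pipeline steps one at a time and record the memory delta
# after each step.
```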
Thanks, Daniel. That is what I am trying to do right now. I will share the delta values of each process.
Filename: memory_try.py

Line #    Mem usage        Increment     Line Contents
================================================
    39     183.129 MiB     183.129 MiB   @profile
    40                                   def HomogenizationPipeline(x):
    41     183.215 MiB       0.086 MiB       a1=PrimitiveTransformer(n_state=2, min_=0.0, max_=1.0).transform(x)
    42     183.762 MiB       0.547 MiB       a2=TwoPointCorrelation(periodic_boundary=True, cutoff=15, correlations=[(1,1)]).transform(a1)
    43     183.762 MiB       0.000 MiB       a3=FlattenTransformer().transform(a2)
    44   10015.367 MiB    9831.605 MiB       a4=PCA(n_components=3).fit_transform(a3)
    45   10015.367 MiB       0.000 MiB       return a4
This is the non-compute version, and it does not tell us much because the first three lines are lazy and all of the computation happens in the PCA fit_transform. I will add the compute version as well for discussion. This still uses the same notebook as above (I just wrote a separate .py file from the notebook code and used the notebook as a shell).
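For discussion, a hedged sketch of what that compute version could look like (the imports and the use of .persist() are assumptions; adjust to however the notebook actually builds the pipeline): forcing each lazy dask step so that memory_profiler can attribute a delta to every transform instead of lumping everything into PCA.fit_transform.

```python
# Sketch only: each .persist() materializes the dask graph up to that point
# while keeping the result a dask array, so the per-line deltas become
# meaningful. The pymks imports are assumed to be available at top level.
from memory_profiler import profile
from sklearn.decomposition import PCA
from pymks import PrimitiveTransformer, TwoPointCorrelation, FlattenTransformer

@profile
def homogenization_pipeline_eager(x):
    a1 = PrimitiveTransformer(n_state=2, min_=0.0, max_=1.0).transform(x).persist()
    a2 = TwoPointCorrelation(periodic_boundary=True, cutoff=15,
                             correlations=[(1, 1)]).transform(a1).persist()
    a3 = FlattenTransformer().transform(a2).persist()
    a4 = PCA(n_components=3).fit_transform(a3)   # sklearn pulls the data into memory here
    return a4
```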
@beyucel is this still an issue? Can this be closed? Please close it if you think this isn't something we can act on.