plotnine
Memory leak using ggsave
I was recently using plotnine to create a series of plots showing how a plot develops through a training sequence. I noticed quite quickly that my computer would freeze because all the memory was used up. Some digging put the blame on ggsave. The following code reproduces the issue most of the time:
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
from plotnine import *

plot_data = pd.DataFrame(np.random.uniform(size=[200, 2]), columns=['a', 'b'])
res = (
    ggplot(plot_data) + geom_point(aes(x='a', y='b')) + theme(figure_size=[20, 20])
)
while True:
    ggsave(res, filename='test.png')
For me this causes my RAM to fill up steadily.
I'm running Anaconda (Python 3.8.5) under WSL with the most recent development version of plotnine, and I run the code from Jupyter Lab. Additional information can be found in the pip freeze file I attached to the post.
I think it is just a version of this matplotlib issue.
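For context, here is a minimal matplotlib-only sketch of the mechanism I assume is at play (my illustration, not code from the issue): pyplot keeps a global registry of every figure created through it, so figures produced in a loop are never garbage-collected until they are explicitly closed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs outside a notebook
import matplotlib.pyplot as plt

# Each plt.figure() call registers the new figure in pyplot's global
# registry, so the figures stay alive even with no variable referencing them.
for _ in range(3):
    plt.figure()

print(len(plt.get_fignums()))  # 3 figures are still being tracked

plt.close('all')               # the registry only shrinks on an explicit close
print(len(plt.get_fignums()))  # 0
```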
I can only reproduce the issue when running the code in Jupyter Lab. Setting matplotlib's interactive mode to off:

import matplotlib.pyplot as plt
plt.ioff()

also eliminates the memory issue as far as I can see. The workaround I used is to create a separate Python script that generates all the PNG files of the plots; run that way, the issue does not appear and there is no need to call plt.ioff().
I used this code to test this:

import matplotlib.pyplot as plt
plt.ioff()  # <<<---- comment this line to get the memory issue in Jupyter Lab
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
from plotnine import *
import psutil
import os

# Functions based on https://www.geeksforgeeks.org/how-to-get-current-cpu-and-ram-usage-in-python/
def get_mem_usage():
    return psutil.virtual_memory()[3]

def get_cpu_usage():
    load1, load5, load15 = psutil.getloadavg()
    return (load15 / os.cpu_count()) * 100

def test_mem(dot_each=10, total=100, verbose=False):
    plot_data = pd.DataFrame(np.random.uniform(size=[200, 2]), columns=['a', 'b'])
    res = (
        ggplot(plot_data) + geom_point(aes(x='a', y='b')) + theme(figure_size=[20, 20])
    )
    it = 0
    mem_usage = []
    cpu_usage = []
    while True:
        ggsave(res, filename='test.png')
        it += 1
        if it % dot_each == 0:
            mem_usage.append(get_mem_usage())
            cpu_usage.append(get_cpu_usage())
        if it == total:
            break
    return (max(mem_usage) - min(mem_usage)) / 1e9

#gb_increase = test_mem()
for i in range(5):
    print('Memory increase', test_mem(), 'GB')
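If switching off interactive mode is not an option, another mitigation (a sketch of mine, not code from the issue — save_scatter_batch is a made-up helper) is to close each figure explicitly after saving. plotnine delegates rendering to matplotlib, so the same registry behaviour applies:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

def save_scatter_batch(n_iters, path):
    """Save the same random scatter plot n_iters times, closing each figure."""
    data = np.random.uniform(size=(200, 2))
    for _ in range(n_iters):
        fig, ax = plt.subplots(figsize=(4, 4))
        ax.scatter(data[:, 0], data[:, 1])
        fig.savefig(path)
        plt.close(fig)  # drop pyplot's reference so the figure can be freed
    return len(plt.get_fignums())  # figures still tracked afterwards

path = os.path.join(tempfile.mkdtemp(), 'test.png')
print(save_scatter_batch(5, path))  # 0: nothing accumulates
```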
Just answering this old issue to make life easier for people who are still experiencing this (and arrive here from a Google search) when running batches of plotnine ggsave() calls.
You need to import matplotlib and switch its backend:
import matplotlib as mpl
import plotnine as gg
# this is important to avoid memory leak when doing batches of plotnine ggplot (that use matplotlib)
# see https://github.com/has2k1/plotnine/issues/498
# see https://github.com/matplotlib/matplotlib/issues/8519#issuecomment-793047367
mpl.use("Agg")
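As a quick sanity check (my addition, not part of the original comment), you can confirm the switch took effect. Note that matplotlib.use() must run before any figure is created:

```python
import matplotlib
# Switch the backend before plotnine (and hence pyplot) draws anything.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Agg is a non-interactive backend, so no GUI event loop holds on to figures.
print(matplotlib.get_backend())  # reports the Agg backend
print(plt.isinteractive())       # False
```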
Closing this as I cannot reproduce it anymore.