holoviews
Memory leak when producing Sankey plots with Bokeh
Software version info
OS: Windows/Ubuntu, Python 3.7, bokeh 2.4.3, holoviews 1.14.9
Description of expected behavior and the observed behavior
We have a standalone script that produces Sankey plots at regular intervals, rendered with the Bokeh backend. The script's memory usage grows continuously, so we have to restart it every 3 to 4 days before it exhausts the machine's memory.
According to my analysis, the memory leak is caused by Bokeh objects not being released. When HoloViews is not involved, the Bokeh objects are released.
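One way to put a number on "memory increasing continuously" is to measure how much traced memory survives a batch of repeated calls. The sketch below uses the standard-library `tracemalloc` module. `memory_growth`, `leaky` and `not_leaky` are hypothetical names for illustration, and a module-level list stands in for whatever keeps objects alive. `tracemalloc` only sees allocations made through Python's allocators, so it can under-report memory held by native code.

```python
import gc
import tracemalloc
from typing import Callable


def memory_growth(fn: Callable[[], object], iterations: int = 100) -> int:
    """Return the bytes of traced memory still allocated after calling fn repeatedly."""
    fn()  # warm-up call so one-time caches and imports are not counted as growth
    gc.collect()
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        gc.collect()  # free anything that was only kept alive by reference cycles
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


_retained = []  # hypothetical global that simulates a leak


def leaky() -> None:
    _retained.append(bytearray(10_000))  # every call keeps 10 kB alive


def not_leaky() -> None:
    bytearray(10_000)  # freed as soon as the call returns


print(memory_growth(leaky))      # at least 1_000_000: 100 retained 10 kB buffers
print(memory_growth(not_leaky))  # small: nothing is retained
```

Assuming HoloViews is installed, something like `memory_growth(lambda: hv.render(hv.Sankey(train_counts)))` could then be compared against a version that skips `hv.render`. This mirrors the with/without comparison made with memray later in this thread.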
Complete, minimal, self-contained example code that reproduces the issue
```python
import sys
import holoviews as hv
from collections import defaultdict
from bokeh.plotting import Figure
from bokeh.io import output_file, save
from pathlib import Path

DEFAULT_PLOT_WIDTH = 1200  # pixels
DEFAULT_PLOT_HEIGHT = 700  # pixels
TITLE_FONT_SIZE = "16pt"
LABEL_FONT_SIZE = "10pt"


def plot_port_targets(
    port_targets: dict,
    plot_width: int = DEFAULT_PLOT_WIDTH,
    plot_height: int = DEFAULT_PLOT_HEIGHT,
) -> Figure:
    hv.extension("bokeh")
    # Count trains per (mine, port) pair; each pair becomes one Sankey edge.
    train_counts = defaultdict(int)
    for mine_name, port_names in port_targets.items():
        for port_name in port_names:
            train_counts[(mine_name, port_name)] += 1
    sankey = hv.Sankey(train_counts)
    sankey.opts(width=plot_width, height=plot_height)
    # Convert the HoloViews element into a Bokeh model.
    return hv.render(sankey)


def generate_html_file():
    output_file_path = Path('holoviews_memory_leak.html')
    planned_port_targets = {
        "Mine 1": ["Port 1", "Port 1", "Port 2", "Port 1",
                   "Port 1", "Port 2", "Port 2", "Port 1"],
        "Mine 2": ["Port 2", "Port 2", "Port 2"],
        "Mine 3": ["Port 2", "Port 2", "Port 2", "Port 2", "Port 2"],
        "Mine 4": ["Port 2", "Port 2", "Port 2", "Port 2", "Port 2",
                   "Port 1", "Port 2", "Port 1", "Port 2", "Port 2"],
        "Mine 5": ["Port 2", "Port 2", "Port 2", "Port 1", "Port 2", "Port 2"],
        "Mine 6": ["Port 2", "Port 2"],
        "Mine 7": ["Port 2", "Port 2", "Port 2", "Port 2", "Port 2"],
        "Mine 8": ["Port 2", "Port 2", "Port 2", "Port 2", "Port 2",
                   "Port 2", "Port 2", "Port 2", "Port 2", "Port 2"],
    }
    planned_port_targets_plot = plot_port_targets(
        port_targets=planned_port_targets
    )
    output_file(output_file_path, title="Train Targets Visualisations", mode="inline")
    save(planned_port_targets_plot)


def main(argv):
    input('Press ENTER to start')
    for i in range(100):
        print('.', end='')
        generate_html_file()
    print('')
    input('Press ENTER to continue')
    for i in range(1000):
        # Start a new line of dots every 100 iterations.
        if i % 100 or i == 0:
            print('.', end='')
        else:
            print('')
            print('.', end='')
        generate_html_file()
    print('')
    input('Press ENTER to continue')
    generate_html_file()
    input('Press ENTER to end')


if __name__ == '__main__':
    main(sys.argv)
```
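The `defaultdict` loop in `plot_port_targets` turns the per-train lists into edge weights keyed by `(mine, port)`, which is the dict shape passed to `hv.Sankey`. As a sketch, the same aggregation can be written with `collections.Counter`. `count_edges` is a hypothetical helper. Since `Counter` is a `dict` subclass, `hv.Sankey` should accept it in the same way, but that is an assumption not tested here. This only changes how the input is built and is not expected to affect the leak.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_edges(port_targets: Dict[str, List[str]]) -> "Counter[Tuple[str, str]]":
    """Count trains per (mine, port) pair; each pair becomes one Sankey edge."""
    return Counter(
        (mine_name, port_name)
        for mine_name, port_names in port_targets.items()
        for port_name in port_names
    )


edges = count_edges({
    "Mine 1": ["Port 1", "Port 1", "Port 2", "Port 1",
               "Port 1", "Port 2", "Port 2", "Port 1"],
    "Mine 2": ["Port 2", "Port 2", "Port 2"],
})
print(edges[("Mine 1", "Port 1")])  # 5
print(edges[("Mine 1", "Port 2")])  # 3
print(edges[("Mine 2", "Port 2")])  # 3
```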
I can recreate the problem, and I think the culprit is related to `hv.render`. If I change the original code to the version below and run it with `memray run file.py`, I get these results:
```python
from collections import defaultdict

import holoviews as hv
from tqdm import tqdm

hv.extension("bokeh")


def main():
    port_targets = {
        1: [1, 1, 2, 1, 1, 2, 2, 1],
        2: [2, 2, 2],
        3: [2, 2, 2, 2, 2],
        4: [2, 2, 2, 2, 2, 1, 2, 1, 2, 2],
        5: [2, 2, 2, 1, 2, 2],
        6: [2, 2],
        7: [2, 2, 2, 2, 2],
        8: [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    }
    train_counts = defaultdict(int)
    for mine_name, port_names in port_targets.items():
        for port_name in port_names:
            train_counts[(f"Mine {mine_name}", f"Port {port_name}")] += 1
    sankey = hv.Sankey(train_counts)
    hv.render(sankey)


if __name__ == "__main__":
    for _ in tqdm(range(1000)):
        main()
```
With `hv.render(sankey)`: (memray plot)

Without `hv.render(sankey)`: (memray plot)
I then tried a simple Curve plot and I could see the same problem:
```python
import holoviews as hv
from tqdm import tqdm

hv.extension("bokeh")


def main():
    curve = hv.Curve((range(10_000), range(10_000)))
    hv.render(curve)


if __name__ == "__main__":
    for _ in tqdm(range(1000)):
        main()
```
With `hv.render(curve)`: (memray plot)

Without `hv.render(curve)`: (memray plot)
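To check directly whether rendered objects are being released, one can count how many instances of a class are still alive after each batch of renders. The minimal sketch below uses `gc.get_objects()`. `Model`, `_registry` and `build` are hypothetical stand-ins. With Bokeh installed, one would presumably pass `bokeh.model.Model` as the class instead.

```python
import gc


def live_instances(cls: type) -> int:
    """Count live, GC-tracked objects whose type is cls or a subclass of it."""
    gc.collect()  # collect unreachable cycles first so only live objects remain
    # issubclass(type(obj), ...) avoids isinstance's __class__ lookup, which can
    # raise on some proxy objects returned by gc.get_objects().
    return sum(1 for obj in gc.get_objects() if issubclass(type(obj), cls))


class Model:
    """Hypothetical stand-in for a Bokeh model class."""


_registry = []  # hypothetical global that accidentally keeps models alive


def build(keep: bool) -> None:
    model = Model()
    if keep:
        _registry.append(model)


start = live_instances(Model)
for _ in range(10):
    build(keep=False)
print(live_instances(Model) - start)  # 0: every model was freed
for _ in range(10):
    build(keep=True)
print(live_instances(Model) - start)  # 10: the registry keeps them alive
```

If the count for the real model class keeps rising between batches of `hv.render` calls, some reference path is keeping those objects alive.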
@philippjfr, this was automatically closed. Can you open it again?
Any news regarding this issue?
@philippjfr Could a solution to this be to use a `weakref.ref` in this function?
https://github.com/holoviz/holoviews/blob/a780aebb3375fa6b3eae8607372382db885e913a/holoviews/plotting/bokeh/renderer.py#L64-L75
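For context on this proposal: a `weakref.ref` does not keep its referent alive, and it has to be called to get the object back. The stdlib sketch below uses a hypothetical `Document` stand-in class rather than Bokeh's. It suggests that a freshly created document referenced only through a weakref could be collected as soon as the function that created it returns, unless something else holds a strong reference to it.

```python
import gc
import weakref


class Document:
    """Hypothetical stand-in for a Bokeh Document."""


doc = Document()
ref = weakref.ref(doc)

print(ref() is doc)   # True: calling the ref returns the object while it is alive
del doc               # drop the only strong reference
gc.collect()
print(ref() is None)  # True: the weak reference did not keep the object alive
```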
I tried the following:
```python
plot = super().get_plot(obj, doc, renderer, **kwargs)
if plot.document is None:
    document = Document() if self_or_cls.notebook_context else curdoc()
else:
    # Not sure about this part
    document = plot.document
if self_or_cls.theme:
    document.theme = self_or_cls.theme
plot.document = weakref.ref(document)
return plot
```
And got the following results:

| | Sankey | Curve |
|---|---|---|
| Without weakref | (memray plot) | (memray plot) |
| With weakref | (memray plot) | (memray plot) |
Edit: The implementation above does not work; it raises errors when the tests are run.
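One plausible cause of those test errors, not verified against the actual failures, is that code reading `plot.document` expects a `Document` and instead receives a weakref object. An access such as `plot.document.theme` would then fail. One hypothetical way to keep the attribute's type unchanged is a property that stores the weakref privately and dereferences it on access. `Plot` and `Document` here are illustrative stand-ins, not HoloViews or Bokeh classes.

```python
import gc
import weakref
from typing import Optional


class Document:
    """Hypothetical stand-in for a Bokeh Document."""


class Plot:
    """Hypothetical plot that refers to its document only weakly."""

    def __init__(self) -> None:
        self._document_ref: Optional[weakref.ReferenceType] = None

    @property
    def document(self) -> Optional[Document]:
        # Dereference on every access, so callers get a Document (or None once
        # it has been collected), never the weakref object itself.
        if self._document_ref is None:
            return None
        return self._document_ref()

    @document.setter
    def document(self, doc: Optional[Document]) -> None:
        # Store only a weak reference; the caller must keep doc alive.
        self._document_ref = None if doc is None else weakref.ref(doc)


doc = Document()
plot = Plot()
plot.document = doc
print(plot.document is doc)   # True
del doc
gc.collect()
print(plot.document is None)  # True: the plot did not keep the document alive
```

The trade-off is that the plot no longer keeps its document alive. Once the last strong reference elsewhere disappears, `plot.document` becomes `None`, and any code that relied on the plot owning its document would need to hold that reference itself.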