Memory leak when producing Sankey plots with Bokeh

Open GillesFa opened this issue 2 years ago • 4 comments

ALL software version info

OS Windows/Ubuntu Python 3.7 bokeh 2.4.3 holoviews 1.14.9

Description of expected behavior and the observed behavior

We have a standalone script that regularly produces Sankey plots, rendered with Bokeh. The script's memory usage increases continuously, and we have to restart it every 3 to 4 days before it exhausts the machine's memory.

According to my analysis, the memory leak is caused by Bokeh objects that are never released. The same Bokeh objects are released correctly when they are not used by HoloViews.
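
One way to check a claim like this is to count live objects per class before and after a batch of renders. The sketch below uses only the standard library. The helper names count_live_objects and growth are illustrative and are not part of Bokeh or HoloViews, and gc.get_objects() only sees objects tracked by the garbage collector, so the counts are a rough indicator rather than an exact measure.

```python
import gc
from collections import Counter


def count_live_objects(module_prefix: str) -> Counter:
    """Count live gc-tracked objects whose class is defined under module_prefix.

    Hypothetical helper for illustration; not a Bokeh or HoloViews API.
    """
    gc.collect()  # drop unreachable cycles first so only retained objects remain
    counts = Counter()
    for obj in gc.get_objects():
        cls = type(obj)
        module = getattr(cls, "__module__", None)
        if not isinstance(module, str):
            continue
        if module == module_prefix or module.startswith(module_prefix + "."):
            counts[cls.__qualname__] += 1
    return counts


def growth(before: Counter, after: Counter) -> dict:
    """Return the classes whose live-instance count increased, with the increase."""
    return {
        name: after[name] - before[name]
        for name in after
        if after[name] > before[name]
    }
```

For example, taking a snapshot with count_live_objects("bokeh") before and after a few hundred calls to generate_html_file() from the script below, then printing growth(before, after), should show which Bokeh classes keep accumulating, if any.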

Complete, minimal, self-contained example code that reproduces the issue

import sys
import holoviews as hv
from collections import defaultdict
from bokeh.plotting import Figure
from bokeh.io import output_file
from bokeh.io import save
from pathlib import Path


DEFAULT_PLOT_WIDTH = 1200  # pixels
DEFAULT_PLOT_HEIGHT = 700  # pixels
TITLE_FONT_SIZE = "16pt"
LABEL_FONT_SIZE = "10pt"


def plot_port_targets(
        port_targets: dict,
        plot_width: int = DEFAULT_PLOT_WIDTH,
        plot_height: int = DEFAULT_PLOT_HEIGHT) -> Figure:
    hv.extension("bokeh")
    train_counts = defaultdict(int)
    for mine_name, port_names in port_targets.items():
        for port_name in port_names:
            train_counts[(mine_name, port_name)] += 1

    sankey = hv.Sankey(train_counts)
    sankey.opts(width=plot_width, height=plot_height)
    return hv.render(sankey)


def generate_html_file():
    output_file_path = Path('holoviews_memory_leak.html')
    planned_port_targets = {
        "Mine 1": [
            "Port 1",
            "Port 1",
            "Port 2",
            "Port 1",
            "Port 1",
            "Port 2",
            "Port 2",
            "Port 1"
        ],
        "Mine 2": [
            "Port 2",
            "Port 2",
            "Port 2"
        ],
        "Mine 3": [
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2"
        ],
        "Mine 4": [
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 1",
            "Port 2",
            "Port 1",
            "Port 2",
            "Port 2"
        ],
        "Mine 5": [
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 1",
            "Port 2",
            "Port 2"
        ],
        "Mine 6": [
            "Port 2",
            "Port 2"
        ],
        "Mine 7": [
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2"
        ],
        "Mine 8": [
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2",
            "Port 2"
        ]
    }

    planned_port_targets_plot = plot_port_targets(
        port_targets=planned_port_targets
    )
    output_file(output_file_path, title="Train Targets Visualisations", mode="inline")
    save(planned_port_targets_plot)


def main(argv):
    input('Press ENTER to start')
    for i in range(100):
        print('.', end='')
        generate_html_file()
    print('')
    input('Press ENTER to continue')
    for i in range(1000):
        if i % 100 or i == 0:
            print('.', end='')
        else:
            print('')
            print('.', end='')
        generate_html_file()
    print('')
    input('Press ENTER to continue')
    generate_html_file()
    input('Press ENTER to end')


if __name__ == '__main__':
    main(sys.argv)

Screenshots or screencasts of the bug in action

[Image: holoview memory leak.png (upload did not complete)]

GillesFa avatar Jun 21 '22 05:06 GillesFa

I can reproduce the problem, and I think the culprit is related to hv.render. If I reduce the original code to the following and run it with memray run file.py, I get these results:

Code
from collections import defaultdict

import holoviews as hv
from tqdm import tqdm


hv.extension("bokeh")


def main():
    port_targets = {
        1: [1, 1, 2, 1, 1, 2, 2, 1],
        2: [2, 2, 2],
        3: [2, 2, 2, 2, 2],
        4: [2, 2, 2, 2, 2, 1, 2, 1, 2, 2],
        5: [2, 2, 2, 1, 2, 2],
        6: [2, 2],
        7: [2, 2, 2, 2, 2],
        8: [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    }

    train_counts = defaultdict(int)
    for mine_name, port_names in port_targets.items():
        for port_name in port_names:
            train_counts[(f"Mine {mine_name}", f"Port {port_name}")] += 1


    sankey = hv.Sankey(train_counts)
    hv.render(sankey)


if __name__ == "__main__":
    for _ in tqdm(range(1000)):
        main()

With hv.render(sankey): [image: newplot]

Without hv.render(sankey): [image: newplot(1)]

I then tried a simple Curve plot and I could see the same problem:

Code
import holoviews as hv
from tqdm import tqdm


hv.extension("bokeh")


def main():
    curve = hv.Curve((range(10_000), range(10_000)))
    hv.render(curve)


if __name__ == "__main__":
    for _ in tqdm(range(1000)):
        main()

With hv.render(curve): [image: newplot(3)]

Without hv.render(curve): [image: newplot(4)]
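
For anyone without memray, a coarser check is possible with the standard library's tracemalloc. This is only a sketch: traced_growth is a hypothetical helper, it sees only allocations made through Python's allocators, and it is not how the plots in this comment were produced.

```python
import gc
import tracemalloc
from typing import Callable


def traced_growth(fn: Callable[[], None], iterations: int) -> int:
    """Return the change in traced Python memory (bytes) after calling fn repeatedly.

    Hypothetical helper for illustration. It starts and stops tracemalloc
    itself, so it should not be used while another trace is in progress.
    """
    gc.collect()
    tracemalloc.start()
    try:
        fn()  # warm-up call so one-time caches are not counted as growth
        gc.collect()
        baseline, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        gc.collect()  # collect cycles so only retained memory is measured
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current - baseline
```

Calling traced_growth(main, 200) with either main() above, once with the hv.render call and once without it, should give a rough idea of whether retained memory scales with the number of renders.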

hoxbro avatar Jun 21 '22 09:06 hoxbro

@philippjfr, this was automatically closed. Can you open it again?

hoxbro avatar Jun 29 '22 12:06 hoxbro

Any news regarding this issue?

GillesFa avatar Jul 26 '22 08:07 GillesFa

@philippjfr Could a solution to this be to add a weakref.ref for this function?

https://github.com/holoviz/holoviews/blob/a780aebb3375fa6b3eae8607372382db885e913a/holoviews/plotting/bokeh/renderer.py#L64-L75

I tried the following:

        plot = super().get_plot(obj, doc, renderer, **kwargs)
        if plot.document is None:
            document = Document() if self_or_cls.notebook_context else curdoc()
        else:
            # Not sure about this part
            document = plot.document
        if self_or_cls.theme:
            document.theme = self_or_cls.theme
        plot.document = weakref.ref(document)
        return plot

And got the following results:

|                 | Sankey                          | Curve                          |
| Without weakref | [image: sankey_without_weakref] | [image: curve_without_weakref] |
| With weakref    | [image: sankey_with_weakref]    | [image: curve_with_weakref]    |

Edit: The above implementation does not work as it raises errors when running the tests.
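
For readers unfamiliar with weakref.ref, here is a minimal standard-library sketch of what a weak reference does. FakeDocument and FakePlot are hypothetical stand-ins, not the Bokeh or HoloViews classes, and the sketch says nothing about how the real renderer behaves.

```python
import gc
import weakref


class FakeDocument:
    """Stand-in for a document object; NOT bokeh.document.Document."""


class FakePlot:
    """Stand-in for a plot that keeps a reference to its document."""

    def __init__(self) -> None:
        self.document = None


def demo_weak_document() -> tuple:
    document = FakeDocument()
    plot = FakePlot()
    plot.document = weakref.ref(document)  # weak: does not keep document alive

    # A weakref.ref is a callable wrapper; calling it yields the referent.
    alive = plot.document() is document

    del document  # drop the only strong reference
    gc.collect()  # not required on CPython, but harmless on other runtimes
    dead = plot.document() is None  # the referent has been released
    return alive, dead
```

Note that the attribute now holds a weakref.ref object rather than the document itself, so any code that reads plot.document directly would have to call it first. Whether that is related to the test failures mentioned in the edit above is not something this sketch can show.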

hoxbro avatar Aug 23 '22 10:08 hoxbro