patchworklib
patchworklib copied to clipboard
Plotting gets slower as more plots are created
First, thanks for making this clearly much-needed package. It's kind of crazy this problem has not been solved in the MPL ecosystem before.
I have noticed that, with each Brick created and plotted, patchwork gets slower. Note, the problem is not that larger collections of Bricks are slower, it's that with each Brick created, every Brick plotted is slower:
import patchworklib as pw
import matplotlib.pyplot as plt
import timeit
def test_simple_plot():
ax = pw.Brick(figsize=(2,2))
ax.plot([1, 2, 3, 4], [1, 4, 2, 3]);
return ax.savefig()
print(timeit.timeit(test_simple_plot, number=10))
# 5.051288887858391
print(timeit.timeit(test_simple_plot, number=10))
# 13.497105043381453
print(timeit.timeit(test_simple_plot, number=10))
# 21.88392266072333
print(timeit.timeit(test_simple_plot, number=10))
# 30.55025789886713
This is just a toy example, the problem gets way worse for bigger figures.
I did some line profiling and discovered the slow part is ax.savefig. I learned that internally, patchworklib is adding axes for each plot to a single giant shared matplotlib.figure.Figure, which grows axes with each Brick added:
print(repr(pw.Brick._figure))
# <Figure size 72x72 with 40 Axes>
As I understand, to savefig on a given Brick, you deep-copy that figure using dill, then delete the axes you don't need. As this über-figure grows, the time spent dumping and loading in Brick.savefig increases.
As a hack, I have been running this function periodically; nuking that big figure speeds things back up. There is still a lot of overhead in the pickling/unpickling, but at least it doesn't get worse and worse.
def cleanup_patchwork():
import patchworklib as pw
plt.close(pw.Brick._figure)
pw.Brick._figure = plt.figure(figsize=(1,1))
cleanup_patchwork()
print(timeit.timeit(test_simple_plot, number=10))
# 5.08872677013278
I'm hoping we can figure out a better solution for this; unfortunately, this would seem to require reworking the internals of the library. Any ideas?
I should mention, this cleanup_patchwork() workaround obviously kills all previously-created Bricks since their backing Figure gets garbage collected. So not a great solution overall.
@caseygrun Thank you for your pointing, and I'm sorry that my dirty code wastes your valuable time...
Your understanding is precise. Patchworklib project all axis objects you created on a single figure.
So, increasing the number of axes objects, the savefig function needs more time. It is inevitable.
However, I re-implemented the savefig function without using dill by taking your point. The current savefig function temporarily removes the axes objects not needed from the current figure. Then, the removed axes objects would be back to the figure if needed. This implementation looks to be working well and running faster than the previous implementation.
However, I don't understand the detail of Axes.remove and 'Figure.add_axes' functions, so If I find some bugs in the current implementation, I will return it to the previous implementation.
In any case, Patchworklib is still under development and the number of users is probably still small. So, these pointing outs are very much appreciated. Thank you for your continued support.
On the contrary, your library has saved me lots of time by providing a solution to a problem I would have otherwise had to solve some other way. So thanks for that! And thanks for the quick patch, I will try it out. Just a few more thoughts after pondering this last night...
Patchworklib project all axis objects you created on a single figure. So, increasing the number of axes objects, the savefig function needs more time. It is inevitable.
I wonder if it is inevitable though. It's certainly the case that the more Bricks you combine into a single Figure, the more time it will take. The current architecture allows any Brick to be combined with any other Brick, and to accomplish that, all Bricks get plotted on axes on the same global singleton Figure. But maybe this global state (including _basefigure, _axes_dict, and now _removed_axes) could be encapsulated in a class (call it Quilt, for example). Each Quilt would contain one master Figure, and Bricks created from that Quilt would all have their axes added to the same Figure. There could be a default singleton Quilt, which would replicate the current behavior of the library, so most users would not need to create a Quilt directly. But if a user knew they were only going to combine three of four plots into a figure, they could create a Quilt for that purpose and avoid the overhead of an increasingly large Figure in the global Quilt.
But I'll have to test out your new version first, since if I understand correctly, you basically remove all the axes except the ones you need, then periodically add axes back when it's time to savefig. That might be a good enough and sustainable solution by itself.