import-export-tools-ng
PDF export gets slower and slower with many messages
I am exporting PDF from a folder with thousands of messages. The longer it proceeds, the slower it gets. Looking at the directory listing, the size of the directory I'm writing to seems to be growing to the point where the underlying filesystem is probably experiencing issues.
ls -lart pdf-export | nl | tail -n 3
2807 -rw-r--r-- 1 tripleee tripleee 29958 Nov 1 13:25 200711120-Re_whoa-2116.pdf
2808 -rw-r--r-- 1 tripleee tripleee 30877 Nov 1 13:25 200711120-Re_I_suppose-2115.pdf
2809 drw-r--r-- 2 tripleee tripleee 229386 Nov 1 13:25 .
This is in a fresh Linux Mint 18.1 install running in VirtualBox. When I started the export, it wrote on the order of 1200 messages per hour, but now it's down to about 100 per hour.
A semi-obvious fix would be to divide the exported messages into subfolders, say one subfolder per 500 messages (though the pain point probably depends crucially on the type of the underlying file system).
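For example, here is a minimal sketch of doing that split after the fact from a shell, assuming the export directory is named pdf-export as in the listing above (the chunk- prefix is just an illustrative name):

i=0
for f in pdf-export/*.pdf; do
    dir="pdf-export/chunk-$(( i / 500 ))"   # chunk-0, chunk-1, ... one per 500 files
    mkdir -p "$dir"
    mv "$f" "$dir/"
    i=$(( i + 1 ))
done

Whether this actually helps depends on the file system; lookups in one huge flat directory are the suspected bottleneck here.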
As a quick workaround, I think I managed to speed it up by moving the messages to a different folder bit by bit during the export. While exporting to the folder temp, I ran the following in a Bash terminal.
mkdir -p finaldestination # In case you forgot to create it before running this
while true; do
    date                                          # timestamp for this iteration
    mv temp/*.pdf finaldestination/               # move finished PDFs out of the busy export directory
    ls -lart finaldestination | nl | tail -n 3    # show the last few entries so far
    sleep 1200                                    # wait 20 minutes
done
This moves the exported files from temp to finaldestination every 20 minutes. (If the move operation takes a long time, the delay will be that much longer, but it should keep things reasonably smooth up to several thousand messages.)
The loop also prints a brief listing of the latest few entries in finaldestination on each iteration. The nl adds line numbers and the tail trims the listing to the last few lines. (One of them will be the directory itself.)
If you want to turn this into a one-liner you can easily recall from your Bash history, just replace each newline with a semicolon, except after do (simply remove the newline there).
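For reference, the result would look something like this (the comment on the mkdir line has to go, or it would swallow the rest of the line):

mkdir -p finaldestination; while true; do date; mv temp/*.pdf finaldestination/; ls -lart finaldestination | nl | tail -n 3; sleep 1200; done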
When you are done, Ctrl-C will break out of the loop.
I have not performed a controlled experiment but so far this seems to improve performance.
Some of my earlier runs suggested a memory leak in either Thunderbird itself or in this extension. In a virtual machine with 4 GB of memory, Thunderbird was consuming well over 3 GB just to display its main window at the end of an export. I expected to see the same symptoms after the export I ran last night (some 2000 messages in a folder), but if they occurred, Thunderbird seems to have recovered by now.
It turns out that giving the virtual machine more memory seems to have fixed things nicely. With 8 GB now allocated to the Linux Mint guest, Thunderbird races up to over 4 GB of memory almost instantly, but then exports much more quickly and without issues so far.
Before I increased the memory, I also had Thunderbird crash entirely during one export.
@tripleee Nothing like more memory for a virtual machine! I use them a lot, but performance varies tremendously with both the hardware and the type of operation. I imagine the issue here is more than one thing:
- VM resources
- VM hardware optimization
- Thrashing disk operations
- Possible asynchronous/synchronous Thunderbird or extension operations
- PDF image generation overhead
- Bottlenecks marshaling across the JavaScript/C++ boundary
I do not yet have direct experience with very large import or export operations since I took over the extension. I have communicated with several people who have done large imports successfully, although those were in different formats.
When I update PrintingTools next, I will probably learn a lot more about the PDF engine, which I really know nothing about currently since it's a bit of a black box.
Dividing the folder I wanted to export into smaller pieces by way of virtual folders (saved searches) allowed me to proceed with fewer crashes and more predictability. I needed to divide the messages by author anyway, but dividing them by date instead, or as well, should also be a useful workaround. Batches of about 2,000 messages seemed doable with my setup (see above for details), but not much more than that.
Over 30 hours, I saved 26,000 files from 10,000 messages, with a total size of 10 GB, mostly in PDF format. 100 Mbps internet (I don't know how many of these messages were already cached locally before starting the export), 16 GB of RAM (half in use), Intel Core i7-3770K, Win 10 x64. Is that slow or not?
That amounts to 333.33 messages per hour on average, which isn't exactly stellar. I had the messages in a local mbox file so it's not directly comparable.
@tripleee It is definitely slow. Part of the PDF engine changed in TB; perhaps that is causing problems. It sounds like a memory leak. I will have to investigate.