php-imap icon indicating copy to clipboard operation
php-imap copied to clipboard

Memory for fetched message attachments is never released which results in memory leak

Open robkleinee opened this issue 1 year ago • 7 comments

A long-running script which fetches and processes messages with attachments from a mailbox with this IMAP package results eventually in a 'memory size of ... bytes exhausted'.

I think this has something to do with the fetched attachments. It seems that these attachments somehow are never released from memory?

A workaround is to set manually the attachments to an empty collection after the attachments are processed.

Code to duplicate the bug (be sure to have emails with attachments in the mailbox)

Package version: 5.5.0 PHP: 8.1.x

$inboxFolder = $client->getFolder('inbox');

while (true) {
    echo 'Memory usage: ' . memory_get_usage() . PHP_EOL;

    $messages = $inboxFolder
        ->messages()
        ->all()
        ->limit(1)
        ->get();

    // Process the messages
}

Workaround (Extended on 2025-05-08)

$inboxFolder = $client->getFolder('inbox');

while (true) {
    echo 'Memory usage: ' . memory_get_usage() . PHP_EOL;

    $messages = $inboxFolder
        ->messages()
        ->all()
        ->limit(1)   // The workaround only works when only one message is retrieved and processed at at time
        ->get();

    $messages->each(
        function ($message) {
            // TODO: Process the message

            // Optionally move the message to another folder/mailbox
            $movedMessage = $message->move('name_of_mailbox');

            // Workaround to fix memory issue, for both message and the moved message
            $movedMessage->setAttachments(AttachmentCollection::make([]));
            $message->setAttachments(AttachmentCollection::make([]));
        }
    );
}

Am I doing something wrong? Or is there a memory leak?

Thanks in advance!

PS: It seems like this package is abandoned?

robkleinee avatar Nov 17 '24 16:11 robkleinee

I have the same issue! Package version: 4.1.2 PHP: 8.1.x

veroca88 avatar Dec 13 '24 16:12 veroca88

@robkleinee @veroca88 I also have the problem that Webklex uses a lot of memory. I have created an IMAP account for a test. It contains 7 e-mails, 3 of which have attachments with a total size of 20 MB (1x 10MB, 2x 5MB). This is enough to exceed a memory limit of 128 MB in PHP.

$manager = new ClientManager([]);
$transport = $manager->make([
    'host' => $settings->getIncomingHost(),
    'port' => $settings->getIncomingPort(),
    'encryption' => $encryption,
    'validate_cert' => true,
    'username' => $settings->getIncomingUser(),
    'password' => $settings->getIncomingPassword(),
    'protocol' => $settings->getInboxType(),
    'timeout' => self::TIMEOUT,
]);
@$transport->connect();

$folder = $transport->getFolderByPath('INBOX');
$messages = $folder->query()->whereAll()->get();
dd(memory_get_usage(), memory_get_peak_usage());

This returns 134MB for memory_get_usage() and 145MB for memory_get_peak_usage(). This is definitely too much. This is only a small mailbox. I have now spent many hours trying to find the cause. I found the suggestion on https://github.com/Webklex/php-imap/issues/531, but it only brings a minimal improvement.

Memory consumption only seems to be a problem with (larger) attachments, because the data is stored multiple times in memory (for example as property $raw and $content). I have found two approaches to massively reduce memory consumption.

At the end of the constructor of the Attachment class, I discard the local property $message and $part:

class Attachment {
    /**
     * Attachment constructor.
     * @param Message $message
     * @param Part $part
     * @throws DecoderNotFoundException
     */
    public function __construct(Message $message, Part $part) {
        [...]
        unset($this->message);
        unset($this->part);
    }
}

I know that this is not "clean" and can cause further problems. At least in my case it doesn't cause any problems. With this adjustment I get only 68MB for memory_get_usage() and 130MB for memory_get_peak_usage().

Furthermore, I do not retrieve messages all at once, but iterate over the messages with the following code:

$query = $folder->query();
$list = $folder->overview($sequence = "1:*");
foreach (array_keys($list) as $uid) {
    $message = $query->getMessageByUid($uid);
}

This results in 13MB for memory_get_usage() and 117MB for memory_get_peak_usage(). However, this only works in combination with the adaptation of the attachment class. Without adapting the class, nothing changes in terms of memory consumption. So, although each message is retrieved individually, there are probably so many cross-references between message and attachment that everything is still kept in memory.

I have no idea how best to simplify all this, but the library definitely needs to be customized and can no longer hold such large amounts of data in memory. And it's hard to apply the patches on my own (it requires extending a lot of classes).

wurst-hans avatar May 07 '25 14:05 wurst-hans

Hi @wurst-hans thanks for investigating and sharing what you've found. If you would like to tackle the core issue, please feel free to do so, even if it involves changes to several classes. I believe the benefits would be rather substantial - perhaps something for a major version jump to v7?

Webklex avatar May 07 '25 16:05 Webklex

I've been trying to understand the code better for a few hours now and I'm making progress, but slowly. I don't have much time at the moment so I'm focusing on optimizing my problems. I am very grateful for your library and of course try to give something back to the community. However, my customizations are very experimental and I still don't understand PHP IMAP well enough.

The recommended customization of https://github.com/Webklex/php-imap/issues/531 does more than I thought. However, it makes a decisive difference whether you retrieve all messages at once with

$messages = $folder->query()->whereAll()->get();

or whether you only fetch the headers and retrieve each message individually with

$list = $folder->overview($sequence = "1:*");
foreach (array_keys($list) as $uid) {
    $message = $query->getMessageByUid($uid);
    if ($message->hasAttachments()) {
        // do something
    }
    $message->setAttachments(AttachmentCollection::make());
}

This reduces memory consumption considerably. You only have to empty the attachments once after each access to an e-mail so that the memory consumption remains low.

My suggestion is to check whether all properties/methods are really being used. I could imagine that you don't really need a $message/getMessage() on the attachment. With regard to the large file attachments, the Part/Structure class is also a big problem because it stores $raw and $content, among other things. This class is also referenced in the attachment as $part. This means that a file attachment of an e-mail is stored at least three times in the memory and probably even a fourth time in the Message class as $structure.

Of course it's nice to have as much raw data as possible on the objects, but not in the case of large attachments (or hundreds of emails with small attachments, which in total also leads to problems). If I come up with something better, I'll get back to you.

wurst-hans avatar May 07 '25 18:05 wurst-hans

@wurst-hans Thanks for your investigation.

I think there are two seperate problems concerning memory usage:

  1. A high amount of memory is used when processing mails with attachments.
  2. Some objects regarding the attachments are never destroyed, which eventually leads to a crash due to the memory leak.

I opened this issue because of problem number 2. I think that's a bigger problem than problem number 1.

The workaround I suggested in my openings post is working for me. It's running for months now in a production environment, without crashes.

I think the problem has something to do with a circular dependency between some attachment objects, so the garbage collector can't destroy them when the message is destroyed. But at the moment I don't have the time to dive into it.

Some more information about the workaround:

  • The workaround only works when only one message is retrieved and processed at a time (that's why the limit(1) is there).
  • When optionally moving the message to another folder, the workaround should also be applied on the moved message.

So in total (I also editted my openings post):

$inboxFolder = $client->getFolder('inbox');

while (true) {
    echo 'Memory usage: ' . memory_get_usage() . PHP_EOL;

    $messages = $inboxFolder
        ->messages()
        ->all()
        ->limit(1)   // The workaround only works when only one message is retrieved and processed at at time
        ->get();

    $messages->each(
        function ($message) {
            // TODO: Process the message

            // Optionally move the message to another folder/mailbox
            $movedMessage = $message->move('name_of_mailbox');

            // Workaround to fix memory issue, for both message and the moved message
            $movedMessage->setAttachments(AttachmentCollection::make([]));
            $message->setAttachments(AttachmentCollection::make([]));
        }
    );
}

robkleinee avatar May 08 '25 08:05 robkleinee

Thanks for clarification. I still prefer using overview() now, because I'm not going to process each message. And even with option setFetchBody(false) the basic information retrieval via ->overview() is much faster than ->messages()->all().

I agree, that this might have to do with some circular references not being cleaned up. Nevertheless, the message data is kept at least 4 times in memory (sometimes as raw data, sometimes as already decoded data) for every message. But, yes, problem number two might have higher criticality. And I still haven't found the reason for that, really crazy.

wurst-hans avatar May 08 '25 08:05 wurst-hans

I think the most simple (lame?) solution could be to make a destructor on 'Message' class and clear the attachments of the message. That should fix the memory leak I think.

robkleinee avatar May 08 '25 09:05 robkleinee