Running parsedmarc with pypy causes "Too many open files" exception when reading gzipped reports from a mailbox

Open seanthegeek opened this issue 3 years ago • 4 comments

While running parsedmarc in a pypy3.9-7.3.9 virtualenv on Rocky Linux 8.4, a "Too many open files" exception occurs when attempting to parse gzipped DMARC aggregate reports retrieved from a Microsoft 365 mailbox via Microsoft Graph. This does not occur in a standard CPython virtualenv.
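One difference that could explain why only PyPy is affected, though nothing in this report confirms it: CPython closes a file object as soon as its last reference goes away, while PyPy's garbage collector may leave an unclosed file object open for much longer. A minimal sketch of reading a gzipped report so the descriptor is released deterministically on either interpreter follows. `read_gzipped_report` is a hypothetical helper for illustration, not parsedmarc's API.

```python
import gzip
import os
import tempfile


def read_gzipped_report(path):
    """Return the decompressed bytes of a gzipped file.

    Hypothetical helper for illustration; not part of parsedmarc.
    The with-block closes the file as soon as the read finishes,
    instead of relying on garbage collection to do it later.
    """
    with gzip.open(path, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    # Build a small gzipped sample and read it back many times.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "report.xml.gz")
        with gzip.open(sample, "wb") as out:
            out.write(b"<feedback></feedback>")
        for _ in range(5000):
            assert read_gzipped_report(sample) == b"<feedback></feedback>"
```

If the failure really comes from handles that are never closed explicitly, a loop like the one above should stay well under the descriptor limit on both CPython and PyPy.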

Install procedure

sudo dnf install libxml2-devel libxslt-devel python3-devel

wget https://downloads.python.org/pypy/pypy3.9-v7.3.9-linux64.tar.bz2
tar -pxf pypy3.9-v7.3.9-linux64.tar.bz2
mv pypy3.9-v7.3.9-linux64 pypy3

# virtualenv needs to be installed this way because the version of virtualenv included in RHEL/CentOS/Rocky Linux repositories fails to create a pypy virtualenv 
./pypy3/bin/pip3 install -U pip setuptools wheel virtualenv
sudo chown -R root:root pypy3
sudo mv pypy3 /opt
sudo ln -s /opt/pypy3/bin/pypy3 /usr/local/bin/pypy3

sudo -u parsedmarc /usr/local/bin/pypy3 -m venv --upgrade /opt/parsedmarc/venv
sudo -u parsedmarc /usr/local/bin/pip install -U parsedmarc

Log output

WARNING:__init__.py:1121:Message with subject "Report-ID: REDACTED" is not a valid aggregate DMARC report: Unexpected error: [Errno 24] Too many open files
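On Linux, errno 24 is EMFILE: the process has reached its per-process limit on open file descriptors. A rough diagnostic sketch, assuming a Linux host with `/proc` mounted, can show how close a process is to that limit. `fd_usage` is a hypothetical helper, not something parsedmarc provides.

```python
import os
import resource


def fd_usage():
    """Return (open_count, soft_limit, hard_limit) for this process.

    Hypothetical diagnostic helper; Linux-only because it reads /proc.
    Listing /proc/self/fd briefly opens a descriptor of its own, so the
    count may be one higher than the number of files the program holds.
    """
    open_count = len(os.listdir("/proc/self/fd"))
    soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_count, soft_limit, hard_limit


if __name__ == "__main__":
    count, soft, hard = fd_usage()
    print(f"open descriptors: {count}, soft limit: {soft}, hard limit: {hard}")
```

Calling this before and after processing a batch of messages would show whether the descriptor count keeps climbing, which would point to handles that are not being closed.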

seanthegeek, May 14 '22 16:05

@nathanthorpe Can you look into this a bit? If it's not a bug in pypy itself, I'd like to get this fixed in the same release as your #320 PR.

seanthegeek, May 14 '22 17:05

I'm not able to reproduce this with the same instance of pypy running on Ubuntu 20.04, but I could just have a smaller gzip file that doesn't trigger that.

nathanthorpe, May 14 '22 17:05

This just got weirder. I was able to reproduce this bug again, but this time I tried moving the exact same emails from the invalid folder back to the inbox without doing anything else, and all 200+ emails processed correctly.

seanthegeek, May 18 '22 05:05

Actually, I have my numbers wrong. Still happening. This is why I shouldn't do debugging in the middle of the night. 😅 Oh well, at least I have some samples that consistently fail now.

seanthegeek, May 18 '22 06:05