email-reply-parser
Regex causes the script to hang and CPU goes to 100%
Found an issue with this regex
/^>*(.+\son.*at.*wrote:)$/m
That pattern is causing a catastrophic backtracking issue. The problem is with (.+\son.*at.*wrote:): the overlapping .+ and .* quantifiers trigger excessive backtracking when they can't find a match and the email is pretty large.
Checking pattern: /^>*(20[0-9]{2}\/.+のメッセージ:)$/m
Checking pattern: /^>*(.+\s<.+>\sschrieb:)$/m
Checking pattern: /^>*(.+\son.*at.*wrote:)$/m
And now my script is stuck for several minutes at 100% CPU.
The email is a pretty big reply chain, so this is probably not an issue on smaller emails.
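To make the growth visible, here is a minimal sketch that times the pattern from this issue on near-miss lines of increasing length. The helper names `nearMissLine` and `timeMatchMs` are made up for illustration, and the sizes are kept small so the script finishes quickly. Absolute timings depend on the engine and the machine; the point is only that the time grows much faster than the input does, because each of the three quantifiers can stop at many positions and a backtracking engine tries the combinations before giving up.

```typescript
// The English "on ... at ... wrote:" pattern quoted in this issue.
const pattern = /^>*(.+\son.*at.*wrote:)$/m;

// Build a line that contains many " on " and " at " candidates but never
// ends in "wrote:", so the regex has to exhaust its options before failing.
function nearMissLine(repeats: number): string {
  return '>' + ' on x at y'.repeat(repeats) + ' wrote';
}

// Time a single failed match attempt, in milliseconds.
function timeMatchMs(repeats: number): number {
  const line = nearMissLine(repeats);
  const start = performance.now();
  pattern.test(line);
  return performance.now() - start;
}

// Doubling the input should increase the time by far more than 2x.
for (const repeats of [50, 100, 200]) {
  console.log(`repeats=${repeats} time=${timeMatchMs(repeats).toFixed(1)}ms`);
}
```

Scaling `repeats` up toward the size of a long reply chain is what turns this into a multi-minute hang with the built-in RegExp.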
You probably don't have re2 installed, it was made optional in this commit but the readme still implies it will be installed by default.
npm i re2
This is a minimal reproducible example for an input that causes hanging without re2:
import { describe, it, expect } from 'vitest';
import EmailReplyParser from 'email-reply-parser';

describe('email-reply-parser slow pattern cases', () => {
  function onAtChunk(xLen = 200, yLen = 200) {
    return ' on ' + 'x'.repeat(xLen) + ' at ' + 'y'.repeat(yLen);
  }

  it('near match missing colon can be very slow without re2', () => {
    const line = '>' + onAtChunk().repeat(20000) + ' wrote' + '\n';
    const input = line + 'Actual reply line above\n';
    const parsed = new EmailReplyParser().read(input);
    const text = parsed.getVisibleText();
    expect(typeof text).toBe('string');
  });
});
On my machine, with RegExp, it hung for several minutes before I killed it. With re2 it finishes in ~1.7s.
Why is this not fixed? This issue has been open since May. We just had our prod containers go completely down because we added this package without peer-installing re2. That means this package is completely unusable without re2. At the very least, this should be very clearly documented. But in my opinion, re2 should be required, otherwise this package just acts as ReDoS software.
You have a fork button right here.
Waiting for your PR.
Thank you Baptiste, we will pass on that. Would prefer not to contribute to a package that can ReDOS our application (maybe not yours or someone else's though!). Appreciate the suggestion though
I really don't appreciate this passive-aggressive shit. If you are not satisfied with this library, you may consider building/maintaining your own.
We've been open-sourcing this package because we needed it.
The best way to avoid any ReDoS is to use RE2, which guarantees linear-time matching and so avoids this class of problem.
RE2 was previously mandatory; however, a few users are using this package in environments that do not support native dependencies.
This package was made pre-AI. I think it won't be rocket science to update the affected regexes and test all regexes in CI.
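As one possible direction for such an update, here is a rough sketch of a linear-time check for the English "on ... at ... wrote:" header that avoids backtracking by scanning left to right. The function name `looksLikeEnglishQuoteHeader` is hypothetical and not part of the package. It assumes the input is a single line with no line terminators, so it does not cover the case where `\s` in the original pattern matches a newline.

```typescript
// Hypothetical linear-time stand-in for /^>*(.+\son.*at.*wrote:)$/ on ONE line.
// Assumption: `line` contains no line terminators.
function looksLikeEnglishQuoteHeader(line: string): boolean {
  const suffix = 'wrote:';
  if (!line.endsWith(suffix)) return false;
  const wroteStart = line.length - suffix.length;

  // Leftmost whitespace followed by "on", with at least one character
  // before it (the original pattern requires `.+` in front).
  const onMatch = /\son/.exec(line.slice(1));
  if (onMatch === null) return false;
  const afterOn = 1 + onMatch.index + 3;

  // Leftmost "at" after that, which must end before "wrote:" begins.
  const atIndex = line.indexOf('at', afterOn);
  return atIndex !== -1 && atIndex + 2 <= wroteStart;
}

console.log(looksLikeEnglishQuoteHeader('John on Monday at 10:00 wrote:')); // true
console.log(looksLikeEnglishQuoteHeader('John on Monday at 10:00 wrote'));  // false
```

Taking the leftmost "on" and then the leftmost "at" is enough here because the check only asks whether some valid split exists, so each step is a single forward scan. A CI test could feed both this function and the original regex the same sample lines, plus the large near-miss input from the reproduction above, and fail if the results differ or a time budget is exceeded.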