
Body RegExp Match -Filter doesn't work

Open V-H opened this issue 1 year ago • 9 comments

TB 115.12.0, FiltaQuilla 4.1

Processing this filter: e-Mail (fuke-VH)_2024-07-19_15-14.json produces this error:

      FiltaQuilla 10:23:7.273 [1359286 ms] filtaquilla-util.js:222:13
      Uncaught InternalError: allocation size overflow
        bodyMimeMatch chrome://filtaquilla/content/filtaquilla-util.js:596
        match chrome://filtaquilla/content/filtaquilla.js:1688
        runSelectedFilters chrome://messenger/content/FilterListDialog.js:758
        oncommand chrome://messenger/content/FilterListDialog.xhtml:1
      filtaquilla-util.js:596:43

in mails like RegEx-Filter_BeispielMailX.eml.txt

V-H avatar Jul 19 '24 13:07 V-H

I am not sure if I have bandwidth this weekend. I am just preparing for a holiday: I am live streaming on Sunday and leaving on Monday, so I may need a reminder after I return on the 31st!

RealRaven2000 avatar Jul 19 '24 23:07 RealRaven2000

Since you are still on Tb 115, can you check if rolling back to FiltaQuilla 4.0 resolves the issue?

RealRaven2000 avatar Jul 19 '24 23:07 RealRaven2000

Hi, I had the same issue. Rollback to FiltaQuilla 4.0 works for now.

shoneg avatar Jul 20 '24 06:07 shoneg

Also rollback to 4.0 — works for now.

V-H avatar Jul 20 '24 10:07 V-H

Tested with the wrong filter first. The JSON file name misled me into testing a different one. The filter's name is actually "Fritz: Datenrate (63 kbit Empfangen, 23 kbit Senden)", so I am testing with that one now.

Note to self: I should suggest a better default file name when saving a single filter.

RealRaven2000 avatar Jul 20 '24 11:07 RealRaven2000

I think I have to remove the "To" condition? It's redacted in your example mail.

RealRaven2000 avatar Jul 20 '24 11:07 RealRaven2000

I cannot reproduce the error, although matching doesn't work here either (the extracted body part, which is encoded as quoted-printable, seems to be truncated and probably needs some processing before being parsed by the regex). If you are using debug mode, can you also activate the debug switch extensions.filtaquilla.debug.mimeBody, like this:

  • Open the last tab in the FiltaQuilla settings (about FG/help)
  • activate debug mode
  • right click debug mode

image

find and toggle the setting:

image

it should give us additional info during body parsing

RealRaven2000 avatar Jul 20 '24 11:07 RealRaven2000

Part of the message was cut off, so I added code to download it completely. I also added some code for processing the "quoted-printable" format that the provider was using in your example email:

filtaquilla-4.2pre3.zip

The main problem is that it is not easily possible to access the plain text portion of the raw data (which is all I can access during filtering). Thunderbird itself uses C++ methods to do the parsing but these are not accessible to an Add-on.
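The quoted-printable processing mentioned above can be illustrated with a minimal sketch. This is not the code shipped in FiltaQuilla; `decodeQuotedPrintable` is a hypothetical helper, and it assumes the raw part has already been isolated and its charset is known:

```javascript
// Hypothetical helper, not FiltaQuilla's actual code: decode a
// quoted-printable body (RFC 2045) into a string before regex matching.
function decodeQuotedPrintable(input, charset = "utf-8") {
  const text = input
    .replace(/[ \t]+(\r?\n)/g, "$1") // trailing whitespace on a line is transport padding
    .replace(/=\r?\n/g, "");         // "=" at the end of a line is a soft line break
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // "=XX" encodes one raw byte
      i += 2;
    } else {
      bytes.push(text.charCodeAt(i) & 0xff); // literal characters are 7-bit ASCII
    }
  }
  // Interpret the collected bytes in the part's charset (assumed known here).
  return new TextDecoder(charset).decode(new Uint8Array(bytes));
}

console.log(decodeQuotedPrintable("Gr=C3=BC=\r\n=C3=9Fe"));
```

Without the soft-line-break step, a pattern that spans an encoded line wrap would fail to match, which is one plausible reason a regex can miss text in such a mail.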

RealRaven2000 avatar Jul 20 '24 14:07 RealRaven2000

v4.2pre3 still has same issue, just moved to a new line. 😁
Personally I'm happy to wait until whenever you can get to it.
Thanks for trying!

image

TonyGravagno avatar Jul 24 '24 22:07 TonyGravagno

Just released 4.2 (published 15/10/2024). If the problem still persists in this version, we will hopefully come up with a fix in 4.2.1.

RealRaven2000 avatar Oct 15 '24 12:10 RealRaven2000

Updated to 4.2 under TB 115.16.0esr, but the problem still persists with the same behaviour: the RegEx filter doesn't work, and when running the filters TB locked up for about one minute. Going back to 4.0.

V-H avatar Oct 19 '24 09:10 V-H

I think this is caused by this: https://github.com/RealRaven2000/FiltaQuilla/blob/a3554926c0f23908f2e17f67fec8135e4209b0ec/content/filtaquilla-util.js#L624

Galantha avatar Nov 04 '24 04:11 Galantha

I think that code snippet comes more or less directly from this MDN article, been a while since writing that:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

The function was designed to count the matched instances. I guess exec() starts searching after the last occurrence, but I think this only works if the global flag is set. So we need to check for the global flag first:

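      // Only loop when the global flag is set: without it, exec() does not
      // advance lastIndex and keeps returning the same first match.
      if (reg.global) {
        while ((results = reg.exec(msgBody)) !== null) {
          txtResults += `Match[${count}]: ${results[0]}\n`;
          count++;
        }
      }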

will have to test the code first and then commit a patch.
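For illustration, here is a self-contained sketch of a match-listing loop that cannot run away. It is not the patch that was committed; `listMatches` and `maxMatches` are hypothetical names. Without the `g` flag, `exec()` always restarts at index 0, so an unguarded `while` loop would append the same match forever, which would be consistent with the "allocation size overflow" error. A pattern that can match the empty string needs a second guard, because a zero-length match leaves `lastIndex` unchanged.

```javascript
// Hypothetical helper for illustration; names are not from FiltaQuilla.
function listMatches(reg, msgBody, maxMatches = 1000) {
  // Work on a copy that is guaranteed to be global, so exec() advances lastIndex.
  const flags = reg.flags.includes("g") ? reg.flags : reg.flags + "g";
  const globalReg = new RegExp(reg.source, flags);
  const found = [];
  let results;
  while (found.length < maxMatches && (results = globalReg.exec(msgBody)) !== null) {
    found.push(results[0]);
    // A zero-length match does not move lastIndex; step forward manually.
    if (results[0] === "") globalReg.lastIndex++;
  }
  return found;
}

console.log(listMatches(/\d+/, "a1 b22 c333"));
```

Copying the regex instead of skipping non-global patterns is a design choice of this sketch: it still reports all matches for a pattern the user wrote without `g`, and the `maxMatches` cap bounds memory use on very large bodies.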

RealRaven2000 avatar Nov 04 '24 12:11 RealRaven2000

In the HTML version of the function, this loop was triggering the overflow and the crash.

Galantha avatar Nov 04 '24 14:11 Galantha

For Thunderbird 128, I added my own MIME parsing routine, which is still a work in progress. You can examine the retrieved parts in the JS error console with the debug switch extensions.filtaquilla.debug.mimeBody = true

At the moment I am filtering out parts with contentType "attachment" and all "image/*" and "text/vcard". I am also considering removing "multipart/related", I don't think there is anything of value for the regex search there.
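The part filtering described above could be sketched roughly as follows. The part objects (`contentType`, `disposition`, `body`) and the helper name `isSearchablePart` are assumptions made for this example and do not reflect FiltaQuilla's internal structures; "attachment" is modelled here as a Content-Disposition value.

```javascript
// Hypothetical part objects ({ contentType, disposition, body }); the real
// structure inside FiltaQuilla may differ.
const SKIPPED_TYPES = [/^image\//i, /^text\/vcard$/i];

function isSearchablePart(part) {
  // Attachments are skipped regardless of their content type.
  if ((part.disposition || "").toLowerCase().startsWith("attachment")) return false;
  // Drop parameters such as "; charset=utf-8" before comparing.
  const type = (part.contentType || "").split(";")[0].trim();
  return !SKIPPED_TYPES.some((re) => re.test(type));
}

const parts = [
  { contentType: "text/plain; charset=utf-8", body: "Datenrate 63 kbit" },
  { contentType: "image/png", body: "..." },
  { contentType: "application/pdf", disposition: "attachment; filename=a.pdf", body: "..." },
];
console.log(parts.filter(isSearchablePart).map((p) => p.contentType));
```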

filtaquilla-4.2.1pre33.zip


To try out this version, download the zip file and then drag it into Thunderbird Add-ons Manager (without extracting)

RealRaven2000 avatar Nov 05 '24 00:11 RealRaven2000

Improved boundary detection between the parts, and also splitting for OpenPGP/MIME messages:

filtaquilla-4.2.1pre39.zip


To try out this version, download the zip file and then drag it into Thunderbird Add-ons Manager (without extracting)

RealRaven2000 avatar Nov 05 '24 01:11 RealRaven2000

Finally got hold of John at the Add-on developer meeting. I scrapped my own MIME parser and crafted a MIME emitter that works with the built-in MIME parser in Tb128. Hopefully it is also backwards compatible with 115 (I must test it). This version completely omits attachment parts and parses both text/plain and text/html parts, without any special processing. More features to come, see #313. This one should at least address the performance / hanging problems for now:

filtaquilla-4.2.1pre44.zip

  • Thunderbird's MIME parser automatically converts all parts into Unicode
  • I am currently not sure whether new lines are removed, so you might have to append the /m (multiline) switch for testing
  • to see the first matched string in the error console, use debug mode with extensions.filtaquilla.debug.regexBody=true. This will incur a slight performance hit, as it matches the complete mail part in which the regex finds the first match. (I imagine that RegExp.test() stops at the first find, whereas String.match() finds all matches and hence reads the complete part)
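The difference between the calls mentioned in the list above can be checked in any JavaScript console. The sample text below is made up for this sketch. Note that `match()` only collects all matches when the `g` flag is set, and that the `m` flag only changes where `^` and `$` anchor; it has no effect on patterns without those anchors.

```javascript
// Made-up sample text standing in for a decoded mail part.
const part = "Rate: 63 kbit\nRate: 23 kbit";

// test() only reports whether a match exists.
const hasMatch = /\d+ kbit/.test(part);

// match() without the g flag returns only the first match (with index and groups).
const first = part.match(/\d+ kbit/);

// match() with the g flag returns every matched string.
const all = part.match(/\d+ kbit/g);

// With m, ^ anchors at every line start; without it, only at the start of the text.
const perLine = part.match(/^Rate: \d+/gm);
const withoutM = part.match(/^Rate: \d+/g);

console.log(hasMatch, first[0], all, perLine.length, withoutM.length);
```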

To try out this version, download the zip file and then drag it into Thunderbird Add-ons Manager (without extracting)

RealRaven2000 avatar Nov 07 '24 23:11 RealRaven2000

I have tested 4.2.1pre44 and it is now running as usual.

However, the filter does not work. I went back to 4.0 and it works without any problems.

At the moment I have no free time for debugging...

V-H avatar Nov 09 '24 12:11 V-H

That's OK. If you can, just post the regular expression, unless you are paranoid about sharing it. (There are people who think that regular expressions are some sort of secret recipe that could be harvested by spammers. I don't like talking with those, as they tend to waste my time.) If your regex contains privacy-related stuff, fair enough.

I can easily debug a filter and an email for you, but you would have to forward me the "eml" file. It's really impossible for me to generate a ton of test data here.

RealRaven2000 avatar Nov 09 '24 19:11 RealRaven2000

Try the latest version:

filtaquilla-4.3pre66.zip

You can set up more refined filters by clicking the settings button:

regex-panel


To test this prerelease, please download the zip file and then drag it into Thunderbird Add-ons Manager (without extracting)

RealRaven2000 avatar Nov 17 '24 19:11 RealRaven2000

Fixed in 4.3 - Published 17/11/2024

RealRaven2000 avatar Nov 18 '24 09:11 RealRaven2000