mailparser

parsing can be extremely slow on macOS

Open zackschuster opened this issue 2 years ago • 2 comments

the message parsing section of the emailjs test suite runs very slowly, even on an M1 Pro; a single test can take 2+ seconds to complete depending on the size of the payload (up to 5.1 MB for a text file). this frequently causes test failures in CI, as parsing can take 10+ seconds to complete in GitHub's environment.
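One way to pin down numbers like these outside the test runner is to time a single parse call directly. The sketch below is only an illustration: `timeAsync` and `standInParse` are hypothetical names, and `standInParse` is a placeholder to be swapped for the real parser call (for example mailparser's `simpleParser`). The payload is synthetic, sized to roughly match the 5.1 MB file mentioned above.

```javascript
const { performance } = require('node:perf_hooks');

// Run one async function and report its result plus elapsed wall-clock milliseconds.
async function timeAsync(fn) {
  const start = performance.now();
  const result = await fn();
  return { result, ms: performance.now() - start };
}

// Hypothetical stand-in for the real parser; it only counts CRLF-separated lines.
// Replace this with the actual parse call when measuring for real.
async function standInParse(raw) {
  return { lineCount: raw.split('\r\n').length };
}

// Synthetic payload: 65,000 lines of 76 characters plus CRLF, about 5 MB in total.
const payload = ('x'.repeat(76) + '\r\n').repeat(65000);

timeAsync(() => standInParse(payload)).then(({ result, ms }) => {
  console.log(`parsed ${result.lineCount} lines in ${ms.toFixed(1)} ms`);
});
```

Running the same script on macOS and on Linux with the real parser swapped in would show whether the gap is in the parse call itself or in the surrounding test harness.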

according to profiling of the message parsing tests, the likely cause is this call (note the percentages):

 [C++]:
   ticks  total  nonlib   name
   1654   45.9%   47.2%  T _posix_spawnattr_setflags

obviously this is a very esoteric result, and i have no idea how or why such a low-level function is hammering / getting hammered by us; the code has been very stable for years & this slowness is not evident on ubuntu or windows. but, likewise, i've observed this slowness for years as well. i'm hopeful maybe there's just some weird one-line performance cliff? but i would have no idea how to start looking for such a thing.

zackschuster, May 07 '22 00:05

Can you isolate this by creating a test case for mailparser’s own test suite where parsing an email would take so long? From your example it’s hard to understand what’s going on exactly.

andris9, May 07 '22 01:05

profiling test/mail-parser-test.js produces this:

 [C++]:
   ticks  total  nonlib   name
  27835   92.1%   92.5%  T _posix_spawnattr_setflags
   1459    4.8%    4.8%  t __ZN2v88internalL53Builtin_Impl_RelativeTimeFormatPrototypeFormatToPartsENS0_16BuiltinArgumentsEPNS0_7IsolateE
     92    0.3%    0.3%  T __ZN4node10contextify17ContextifyContext15CompileFunctionERKN2v820FunctionCallbackInfoINS2_5ValueEEE

about half of those ticks can be traced to the "Out of memory error" case:

 [C++]:
   ticks  total  nonlib   name
  13519   90.8%   91.3%  T _posix_spawnattr_setflags
    669    4.5%    4.5%  t __ZN2v88internalL53Builtin_Impl_RelativeTimeFormatPrototypeFormatToPartsENS0_16BuiltinArgumentsEPNS0_7IsolateE
    136    0.9%    0.9%  t __ZN4node2fsL4ReadERKN2v820FunctionCallbackInfoINS1_5ValueEEE
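To check the "about half" estimate, the two tables can be parsed and compared. This is a rough sketch, assuming the row layout shown in this thread (ticks, total %, nonlib %, name), which looks like `node --prof-process` output. `parseProfileRow`, `parseProfileTable`, and `ticksFor` are hypothetical helper names, and rows with a blank column are simply skipped.

```javascript
// Parse one table row: ticks, total %, nonlib %, name. Returns null for headers and labels.
function parseProfileRow(line) {
  const match = /^\s*(\d+)\s+([\d.]+)%\s+([\d.]+)%\s+(.+)$/.exec(line);
  if (!match) return null;
  return {
    ticks: Number(match[1]),
    totalPct: Number(match[2]),
    nonlibPct: Number(match[3]),
    name: match[4].trim(),
  };
}

// Parse a whole table, dropping every line that is not a data row.
function parseProfileTable(text) {
  return text.split('\n').map(parseProfileRow).filter(Boolean);
}

// Ticks of the first row whose name contains `needle`, or 0 if there is none.
function ticksFor(rows, needle) {
  const row = rows.find((r) => r.name.includes(needle));
  return row ? row.ticks : 0;
}

// The two tables as posted in this thread.
const fullSuite = `
 [C++]:
   ticks  total  nonlib   name
  27835   92.1%   92.5%  T _posix_spawnattr_setflags
   1459    4.8%    4.8%  t __ZN2v88internalL53Builtin_Impl_RelativeTimeFormatPrototypeFormatToPartsENS0_16BuiltinArgumentsEPNS0_7IsolateE
     92    0.3%    0.3%  T __ZN4node10contextify17ContextifyContext15CompileFunctionERKN2v820FunctionCallbackInfoINS2_5ValueEEE
`;
const singleCase = `
 [C++]:
   ticks  total  nonlib   name
  13519   90.8%   91.3%  T _posix_spawnattr_setflags
    669    4.5%    4.5%  t __ZN2v88internalL53Builtin_Impl_RelativeTimeFormatPrototypeFormatToPartsENS0_16BuiltinArgumentsEPNS0_7IsolateE
    136    0.9%    0.9%  t __ZN4node2fsL4ReadERKN2v820FunctionCallbackInfoINS1_5ValueEEE
`;

const fullRows = parseProfileTable(fullSuite);
const caseRows = parseProfileTable(singleCase);
const hotSymbol = '_posix_spawnattr_setflags';
const share = ticksFor(caseRows, hotSymbol) / ticksFor(fullRows, hotSymbol);

console.log(`${(share * 100).toFixed(1)}% of the hot-symbol ticks come from the single case`);
// prints "48.6% of the hot-symbol ticks come from the single case"
```

The ratio 13519 / 27835 is about 0.486, which is consistent with "about half".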

now, we're using simpleParser with HTML parsing disabled, so it's not quite an even comparison, but the behavior is the same so i'm assuming it's a valid clue. also, i have to ask, is a 1 MB file size really the soft cap?


parenthetically, i'm hopeful that https://github.com/nodejs/node/issues/32226 might be the source of this headache. it makes as much sense to me as any other explanation, at least...

zackschuster, Aug 03 '22 17:08