NewPipeExtractor icon indicating copy to clipboard operation
NewPipeExtractor copied to clipboard

High number of memory allocations when parsing timeago dates

Open FireMasterK opened this issue 1 year ago • 2 comments

The cause is this line: https://github.com/TeamNewPipe/NewPipeExtractor/blob/5a9b6ed2e3306b9152cc6689dd61dbbe43483845/extractor/src/main/java/org/schabi/newpipe/extractor/localization/TimeAgoParser.java#L76

It looks like compiling a pattern makes a lot of memory allocations, and we should avoid it if possible.

Screenshot from profiler: image

FireMasterK avatar Mar 06 '23 03:03 FireMasterK

Also pinging @AudricV since he recently discussed about refactoring the parser in #1068.

@FireMasterK the code is highly inefficient. The pattern is re-compiled every time one wants to check for a match (even more than 20 times per match!) The easy fix would be to cache the Pattern and call multiple times the matches method on it. However, since the pattern in this case is very simple, regex could be an overkill. Take a look at my fork where I implemented the regex-free (yet locale-aware) parsing. Seems like we save 10x memory allocations. The source code is not yet very clean, but it is sufficient for tests :)

Some minor notes

  • I tried to not cheat and further optimize things just to make the comparison legit. Indeed, the code in the fork is right now semantically equivalent to the one in the dev branch
  • Note that I haven't modified parseTimeAgoAmount, which however relies on a regex behind the scenes. Right now, textualDate is parsed multiple times, but one could just parse it once and get the necessary information.

lrusso96 avatar Jul 02 '23 13:07 lrusso96

@FireMasterK did you have time to take a look at it? Also, recent changes in #1082 should share similar memory allocation issues.

lrusso96 avatar Oct 14 '23 06:10 lrusso96