NewPipeExtractor
NewPipeExtractor copied to clipboard
High number of memory allocations when parsing timeago dates
The cause is this line: https://github.com/TeamNewPipe/NewPipeExtractor/blob/5a9b6ed2e3306b9152cc6689dd61dbbe43483845/extractor/src/main/java/org/schabi/newpipe/extractor/localization/TimeAgoParser.java#L76
It looks like compiling a pattern makes a lot of memory allocations, and we should avoid it if possible.
Screenshot from profiler:
Also pinging @AudricV since he recently discussed about refactoring the parser in #1068.
@FireMasterK the code is highly inefficient. The pattern is re-compiled every time one wants to check for a match (even more than 20 times per match!)
The easy fix would be to cache the Pattern and call multiple times the matches
method on it.
However, since the pattern in this case is very simple, regex could be an overkill. Take a look at my fork where I implemented the regex-free (yet locale-aware) parsing. Seems like we save 10x memory allocations.
The source code is not yet very clean, but it is sufficient for tests :)
Some minor notes
- I tried to not cheat and further optimize things just to make the comparison legit. Indeed, the code in the fork is right now semantically equivalent to the one in the
dev
branch - Note that I haven't modified
parseTimeAgoAmount
, which however relies on a regex behind the scenes. Right now,textualDate
is parsed multiple times, but one could just parse it once and get the necessary information.
@FireMasterK did you have time to take a look at it? Also, recent changes in #1082 should share similar memory allocation issues.