https-everywhere icon indicating copy to clipboard operation
https-everywhere copied to clipboard

Fix WebExtension Excessive Memory Usage

Open cschanaj opened this issue 8 years ago • 11 comments

Type: other

Re-open of #1775, based on https://github.com/EFForg/https-everywhere/issues/1775#issuecomment-151655897 ordered by priority (personal)

GOAL

???

PROPOSAL

Proof of concept PRs, may or may not be get implemented for various reasons. Discussions should be made in their respective PRs.

  • Try using a Prefix Tree for RuleSets.targets (#12530)
  • ~Try using a LRU Cache for caching (#12591)~ See #3952

TODO

Core

  • [x] Use Map instead of Object for RuleSets.targets (#12160) (2017.8.31)
    • improve performance
  • [x] Use JSON for default.rulesets instead of XML (#12273) (2017.9.12)
    • reduce the startup time to below 1.5 seconds
    • reduce the memory usage to below 100mb (on Chromium task manager on startup after GC).
    • reduce the size of default.rulesets from 8.4mb to 6.4mb and thus smaller extension size
  • [x] Use Map instead of Object for redirectCounter and Fix Memory Leak (#12422) (2017.9.12)
    • fix a small memory leak for long lasting browsing session
    • improve performance
  • [x] Use null instead of Empty Set for in RuleSets.ruleCache (#12494, #12577) (2017.10.24)
    • avoid unnecessary memory allocation
  • [x] Merge exclusions into a single pattern in default.rulesets (#16092, related #15448) (2018.7.8)
    • reduce memory overhead
  • [x] Trivialize rulesets with static securecookie, thanks @RReverser (#16029, #16378) (2018.9.19)
    • removed 300+ wildcard targets, improve lookup speed and memory usage
    • related #16540, #16634
  • [x] Merge securecookies which match any cookie.name in each rulesets (2019.5.2)
    • Refactor and enhance trivialize-cookie-rules.js (#17438, #17578)
    • Update utils/merge-rulesets.py (#17492)
    • Remove/ Update some securecookie with no coverage (#17565)
  • [x] Drop support for middle wildcard targets in potentiallyApplicableRulesets (2019.5.2)
    • Forbid and remove wildcards in the middle (#4369, #12319 thanks @Bisaloo)
    • Remove middle wildcard support in rules.js (#17715)
    • reduce runtime overhead (lookup all_rules) significantly
  • [x] Triviailize rulesets with right-wildcard targets and trivialize rewrites (#17855, #18068) (2019.6.27)
    • Removed around 1600+ wildcards target, improve lookup speed and memory usage
  • [x] Simplify shouldSecureCookie and safeToSecureCookie (#17742) (2019.6.27)
  • [x] Refactor code for the browser session storage (#18153) (2019.11.7)
  • [ ] Async addFromJson, parseOneJsonRuleset
    • avoid blocking code to improve performance
  • [ ] Abstraction for Rule and Exclusion
    • avoid redundant array of similar objects
    • better API (?)

UI

  • [x] Improve popup page performance and reduce memory usage (#15532) (2018.6.21)
    • Avoid overhead with document.createElement
    • Avoid unnecessary listeners with event delegations

PENDING FIX

  • [ ] Point RuleSets.targets.get(host) to a Ruleset instead of Array of Ruleset
    • REQUIRE forbid and remove duplicated targets (#11440)
    • improve heap performance and simplify lookup operations
  • [ ] Use a Radix Trie instead of Map for RuleSets.targets with reversed hostname
    • REQUIRE forbid and remove wildcards at the right (#11203)
    • efficient memory structure

WON'T FIX

  • [x] Compile RegExp only when it is neccessary
    • RegExp are already lazily compiled on Chromium and Firefox ~improve browser startup overhead~
  • [x] Update rules.js to limit cache size to 256 entries (#13328)
    • Caching has a small memory overhead as compared to the extension memory usage
  • [x] Avoid using chrome.runtime.sendMessage() whenever possible
    • Explained in https://github.com/EFForg/https-everywhere/issues/12232#issuecomment-348048311
    • ~REF gorhill/httpswitchboard/issues/345~
    • REF https://bugs.chromium.org/p/chromium/issues/detail?id=320723
    • REF https://github.com/gorhill/uBlock/blob/master/src/js/messaging.js## MISC ISSUES
  • [x] Use ES6 generator functions to reduce run-time overhead in search & lookup (~#16951~)
    • Show only rulesets which performed an upgrade (#17047)

Bugs or unexpected features found when reading through the code

  • [x] Credentials in URLs are not restored upon rewrites (#12377, #12380)
  • [x] Clear domainBlacklist and urlBlacklist when all Incognito sessions are destroyed (#12452)
  • [x] Inconsistent behavior for .onion and .onion. (with tailing dot) hosts (#13568, #13569)
  • [x] Disabled site (with/ without tailing dots) are treated differently (#18451, #18452)

cschanaj avatar Aug 30 '17 13:08 cschanaj

@cschanaj See https://github.com/koops76/https-everywhere/projects/1 for more ideas.

ghost avatar Aug 30 '17 21:08 ghost

Compile RegExp only when it is neccessary

Not sure about SpiderMonkey, but in V8 regexps are anyway compiled lazily only on the first use and not when regexp literal is met or new RegExp is called.

RReverser avatar Sep 01 '17 09:09 RReverser

Using the constructor function provides runtime compilation of the regular expression.

Firefox only mention that new RegExp() is compiled in the runtime. Not really sure if it is lazily compiled or not. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

@RReverser ~do you have a source confirming for V8's behavior?~ Confirmed this behavior on Chromium browser using the console and taking heap snapshot. It also seems to me that Firefox does lazy compilation as well. thanks!!

cschanaj avatar Sep 01 '17 11:09 cschanaj

RE: radix trie with reversed hostnames. I've been wanting to use something like this for privacy badger's storage datastructure. I have a pending PR with an implementation here https://github.com/EFForg/privacybadger/pull/1568/files#diff-930ddd92cefc0f8bc3a8722e10b3e05e

I'm not using Map, but I probably should be.

cc @Hainish

cowlicks avatar Sep 02 '17 00:09 cowlicks

Several of those points will require a major rewrite of google rulesets (duplicated multiple times, right wildcards, etc.). I don't really like the way targets and rules are split right now anyways. I was already thinking of a major overhaul of these rulesets. I will probably open a new issue about it soon.

Bisaloo avatar Sep 02 '17 07:09 Bisaloo

@cschanaj The link you gave for " Avoid using chrome.runtime.sendMessage() whenever possible " no longer works

ghost avatar Oct 06 '17 14:10 ghost

It seems that Gorhill has disabled the "Issues" in his repo. Here is some comments made by Gorhill that we might be interested in

18 Jun 2014

The status of the issue:

https://code.google.com/p/chromium/issues/detail?id=320723

is ambiguous. It's not marked as fixed "verified", on the other hand it's labeled M32. Quite unclear whether the memory leak was solved or not.

In any case, there are many places in HTTPSB where using a long-lived connection makes more sense than using a one time message, which comes with a good overhead if I read correctly.

Note: It is marked as fixed "verified" AND labeled M32 now.

18 Jun 2014

Well that is interesting.. Not using chrome.runtime.sendMessage() reduces markedly memory footprint of the loaded web page when loading the vim test page, and it does appear it is easier on the CPU -- so finding is significantly less overhead using Port.postMessage() than chrome.runtime.sendMessage().

18 Jun 2014

Yes, but it's not as simple as replacing one string with another. I need to write my own sendMessage()-like code, but instead of creating a port every time like the chrome API does, I reuse the one instance.

20 Jun 2014

Finally got rid of chrome.runtime.sendMessage() from everywhere, now using ports. For each message sent, chrome.runtime.sendMessage() creates a port, so this is an improvement. I can't measure for sure though. If there are still memory leaks with using sendMessage() as reported in the above-linked chromium issue, then they should be fixed as far as HTTPSB is concerned.

I did notice in some previous memory tests using heap snapshot diffs that there seemed to be a lot of instances of objects which I believe might have been related to message event objects.

In any case, even if sendMessage() wasn't leaking, using a longer-lived port rather than one-per-message is saner. Currently ports are created when an extension page or a content script is created, and the same port is used until the page or content script is destroyed. Much less overhead.

cschanaj avatar Oct 06 '17 23:10 cschanaj

The most common use of chrome.runtime.sendMessage() in the codebase currently is to communicate which rulesets are active in a tab to the popup, upon creation. Other usages include communicating options toggles, importing a json settings blob from the options UX, as well as adding and deleting new rules. These currently all must necessarily go through sendMessage since addon HTML pages have to communicate with the background context.

Profiling the extension memory usage after subsequent clicks on the popup after visiting boingboing.net (which has lots of active rulesets communicated), I do see an increase overall, but I'm unsure what to look for in a heap analysis to see how much is due to this particular leak. And the overall memory usage in Chrome vacillates for me between 75 and 86M, settling at 72M.

I wonder if we can safely ignore this problem until the upstream leak is patched. It seems to me that any minimization we do with sendMessage will come at a cost of usability or presentation to the user of ruleset data.

Hainish avatar Nov 30 '17 01:11 Hainish

Any updates on this? A quick check by toggling this extension shows that it's using between 40 and 80 MB of memory (uncertainty because it's not the only extension).

jhpratt avatar Feb 04 '19 21:02 jhpratt

@Giltyhub2 I actually saw that pull elsewhere and reinstalled — I imagine wasm will help out as well. I'll certainly report back if I notice any issues.

jhpratt avatar Jul 06 '19 18:07 jhpratt