user.js
Socket lifetime and its effect on privacy
Continuing my research on browsers, as described in https://github.com/pyllyukko/user.js/issues/365, I found something interesting. The browsers I tested over the last 2 days (for some it was a re-test with newer versions) were Midori, Epiphany, qutebrowser, Chromium, Firefox, and Dooble.
The test procedure is fairly simple:
In 2 different consoles I run:
rcnetwork restart; tcpdump -i eth1 -tq 'src host pc and not dst host router and not dst host pc'
watch 'netstat -anpt'
then I start the browser, in which I have beforehand set the homepage to about:blank and tightened everything possible (disabled JS, cookies, plugins, etc.). Then I visit the URL of a simple text file, e.g. http://fsf.org/robots.txt, and look at the packets and connections.
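The procedure can be approximated in pure Python, with a throwaway local server standing in for fsf.org (all names here are made up for the sketch): fetch a tiny text file, then check whether the client socket is still connected after the body has arrived.

```python
# Sketch of the test idea: download a small text file from a local HTTP/1.1
# server and see whether the client-side socket stays open afterwards.
import http.client
import http.server
import threading

class RobotsHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive semantics, as real servers use
    def do_GET(self):
        body = b"User-agent: *\nDisallow:\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), RobotsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/robots.txt")
data = conn.getresponse().read()

# The body has been fully downloaded, yet the socket is still connected --
# this is the state netstat reports as ESTABLISHED in the test above.
still_open = conn.sock is not None
conn.close()
server.shutdown()
```

This only models the application-level keep-alive; the periodic zero-length packets in the captures below are a separate, TCP-level mechanism.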
Results:
In my test all browsers show some weird behavior. Although the simple text file takes less than a second to download, the browser continues to "chatter" with the remote host for several minutes. netstat also shows active connections. I see that as a privacy issue because it literally means the user is telling the remote host "I am still online, here are some more TCP packets".
I received an explanation from Dooble's developer that this is due to the underlying web engine:
https://github.com/textbrowser/dooble/issues/23
Regardless of my hope that testing browsers with different web engines might give different results, that doesn't seem to be the case: all of them keep sending TCP packets. The one and only browser which does not do that is lynx - it simply downloads the document and instantly closes the socket.
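What lynx effectively does can be sketched with "Connection: close" semantics, again against a hypothetical local server: one request, one response, and the socket is gone as soon as the body arrives.

```python
# One-shot HTTP request: the "Connection: close" header tears the connection
# down immediately after the response, leaving nothing for netstat to show.
import http.client
import http.server
import threading

class OneShotHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"
    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")  # tell the client we are done
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), OneShotHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/", headers={"Connection": "close"})
resp = conn.getresponse()
body = resp.read()

# Because the response carried "Connection: close", http.client has already
# dropped the socket -- no lingering ESTABLISHED entry.
closed = conn.sock is None
server.shutdown()
```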
Using my user.js (a modified version of pyllyukko's with some added settings which ensure zero packets are sent to Mozilla, etc.) I tested Firefox 59.0.3 too. What I noticed as a difference from the non-Firefox browsers is that FF sends the after-packets quite actively. Here is what happens:
Open http://fsf.org/robots.txt:
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 325
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 517
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 342
IP pc.37792 > www.fsf.org.https: tcp 373
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 517
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 342
IP pc.37794 > www.fsf.org.https: tcp 389
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 357
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 517
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 326
IP pc.33248 > svnweb.fsf.org.https: tcp 37
IP pc.33248 > svnweb.fsf.org.https: tcp 357
IP pc.33248 > svnweb.fsf.org.https: tcp 0
Page loaded. Waiting (touch nothing)... tcpdump shows:
IP pc.33248 > svnweb.fsf.org.https: tcp 37
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.33248 > svnweb.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 53
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.43500 > www.fsf.org.http: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37792 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 53
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
IP pc.37794 > www.fsf.org.https: tcp 0
All of the above are sent in groups of 3 lines every 8-10 seconds. netstat shows:
tcp 0 0 pc:37794 www.fsf.org:https ESTABLISHED 10346/firefox
tcp 0 0 pc:37792 www.fsf.org:https ESTABLISHED 10346/firefox
tcp 0 0 pc:43500 www.fsf.org:www-http ESTABLISHED 10346/firefox
After about 2-3 minutes all this chattering stops.
tcp 0 0 pc:43500 www.fsf.org:www-http TIME_WAIT -
Another minute and this socket disappears too.
In summary: several minutes of TCP chatter for a 6-line text file which loads in a few milliseconds. In different browsers this duration and the number of additional packets vary, as does the time until all sockets "die". In Firefox the number of additional packets is particularly high, although upon browser exit it closes them somewhat faster than others. Still, it is far from as good as lynx.
So I was wondering: is there a way to control this through about:config
settings? Or are all modern engine-based browsers already doomed?
The one and only browser which does not do that is lynx - it simply downloads the document and instantly closes the socket.
Have you ever tried this test with dillo? It is quite a strict browser, albeit somewhat spartan. I have just checked its dependencies on Debian against lynx's:
aramazan@torik:~$ LANG=C apt-cache depends lynx
lynx
Depends: libbsd0
Depends: libbz2-1.0
Depends: libc6
Depends: libgnutls30
Depends: libidn11
Depends: libncursesw5
Depends: libtinfo5
Depends: zlib1g
Depends: lynx-common
Conflicts: <lynx-ssl>
Breaks: <lynx-cur>
Breaks: <lynx-cur-wrapper>
Recommends: mime-support
Replaces: <lynx-cur>
Replaces: <lynx-cur-wrapper>
aramazan@torik:~$ LANG=C apt-cache depends dillo
dillo
Depends: wget
Depends: libc6
Depends: libfltk1.3
Depends: libgcc1
Depends: libjpeg62-turbo
Depends: libpng16-16
Depends: libssl1.1
Depends: libstdc++6
Depends: libx11-6
Depends: zlib1g
Recommends: perl
Recommends: <perl:any>
perl
Neither of them uses a web engine. The correlation between web engine usage and background chatter is noteworthy. (Assuming dillo behaves the way lynx does.)
The changelog you've linked to does not seem to have been updated for a long while. However, dillo is still packaged in Debian Sid, which suggests no problems. Had it been abandoned, Debian would have phased it out, as it did with Midori.
Also, I am beginning to wonder if these multiple connections exist for some feature or performance reason, e.g. keeping multiple open connections handy for parallel loading, in case multiple downloads are needed from the visited page. (I am no browser expert, so please take this with a grain of salt.)
Maybe a pipelining feature?
The changelog you've linked to seems to be not updated for a long while.
OK, I will try to look deeper.
some feature or performance reasons
I really don't know and I am not an expert either but approaching it logically:
- lynx's performance is excellent.
- There is no need for 3 sockets to download a text file of only a few bytes
- Maintaining open sockets does not reduce TTFB (e.g. if the page contains a link to another page). Only so-called HTTP prefetching does that (or rather, works toward it in the background), but that is turned off.
- It does not affect caching (e.g. if the server replies with HTTP 304).
I can't think of any other performance aspects. In fact, flooding the network with unnecessary packets may have a negative effect (and probably drains the device battery faster).
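The caching point above can be sketched locally: a conditional GET revalidates a cached copy on a brand-new connection, so no long-lived socket is required for HTTP 304 to work. The server and ETag value here are made up for the sketch.

```python
# Conditional GET sketch: first fetch returns 200 with an ETag; a later
# revisit on a *fresh* connection gets 304 Not Modified.
import http.client
import http.server
import threading

ETAG = '"v1"'  # hypothetical validator for the sketch

class CacheHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            self.send_response(304)  # client's cached copy is still good
            self.end_headers()
            return
        body = b"cached resource\n"
        self.send_response(200)
        self.send_header("ETag", ETAG)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), CacheHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# First visit: full response, remember the validator.
c1 = http.client.HTTPConnection("127.0.0.1", port)
c1.request("GET", "/page")
r1 = c1.getresponse()
first_status, etag = r1.status, r1.getheader("ETag")
r1.read()
c1.close()  # socket gone; nothing kept alive between visits

# Later revisit on a fresh connection: server confirms the cache with 304.
c2 = http.client.HTTPConnection("127.0.0.1", port)
c2.request("GET", "/page", headers={"If-None-Match": etag})
second_status = c2.getresponse().status
c2.close()
server.shutdown()
```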
Maybe pipelining Feature?
Can you explain?
There is no need for 3 sockets to download a text file of only a few bytes
But the browser cannot know this beforehand, and may not be intelligent enough to infer from the extension (.txt) how many parallel connections a given page will likely need. If a page contains multiple frames, images, etc. from the same host, they can be loaded in parallel. I am just speculating.
Also, the browser may be oblivious to the number of connections. It may well be delegating all the plumbing to the web engine. And the web engine, being a bit too diligent, may open multiple connections.
As I am not into browser design whatsoever, I can't assess how the work is shared between the browser and the web engine. Possibly the browser occupies itself only with the user side, delegating all the network work to the web engine. In that case, it is the web engine development team that needs to be addressed. Or maybe it is all mixed, i.e. both the browser and the web engine do their share of the chattering; e.g. advert sites may be accessed by the browser, and others by the engine. I don't know how to tell which is responsible for which.
I am not so sure. It is not the extension but the HTTP headers which determine what the browser should load:
Content-Length: 185
Content-Type: text/html
Additionally, this particular page also sends Connection: keep-alive, which is generally a way to reduce the number of needed TCP connections, not to increase it:
https://en.wikipedia.org/wiki/HTTP_persistent_connection#Advantages
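That advantage can be shown in a short sketch (local stand-in server again): two requests go over one keep-alive connection, which you can verify by checking that the local TCP port is the same for both.

```python
# Persistent-connection sketch: two requests reuse a single TCP socket
# when the server speaks HTTP/1.1 keep-alive.
import http.client
import http.server
import threading

class KeepAliveHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive on the server side
    def do_GET(self):
        body = self.path.encode() + b"\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])

conn.request("GET", "/robots.txt")
conn.getresponse().read()
port_first = conn.sock.getsockname()[1]   # local port of the TCP socket

conn.request("GET", "/favicon.ico")
conn.getresponse().read()
port_second = conn.sock.getsockname()[1]

# Same local port both times: one TCP connection served both requests,
# which is exactly what keep-alive is meant to buy.
reused = port_first == port_second
conn.close()
server.shutdown()
```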
As for the page content: the browser can surely see what the page contains as references and exercise program logic to open connections only when necessary (e.g. to load an image). In this particular case I assume a second connection may be needed for a favicon, and the third one may be just the redirect from HTTP to HTTPS (speculation). But I don't see why connections should be kept open for long after the resource has been downloaded. During my tests with other pages I have noticed connections being made, and packets being sent on browser close, to tracking domains, to fbcdn.net, etc.
Also, if we assume that the browser and the web engine each work for themselves, that sounds to me like a serious design problem: if the engine sends packets on its own without being asked to, it can practically do whatever it wants. I really don't know for sure. Perhaps someone with more expertise could explain.
Reading further... it seems all this may be related to HTTP connection persistence, i.e. continuing communication to keep the connection alive in order to avoid opening new connections. Perhaps this is beneficial for the server. I should probably test this:
http://kb.mozillazine.org/Network.http.keep-alive.timeout
I think this may be it:
user_pref("network.http.keep-alive.timeout", 0);
Now FF behaves like lynx :)
Perhaps a good value should be 10-15.
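For reference, the corresponding user.js fragment (the value is the one under discussion in this thread, not a vetted default):

```javascript
// user.js fragment -- value under discussion here, not a recommendation.
// 0 makes Firefox close persistent connections immediately (lynx-like);
// 10-15 keeps them open just long enough for quick follow-up requests.
user_pref("network.http.keep-alive.timeout", 15);
```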
The page's code contains images hosted on the static.fsf.org subdomain, then there are iframes; these are all needed connections unless you harden your browser (text mode, local CSS).
The main culprit of the behaviour you're looking at is the Plone CMS: if you look at the bottom icons, they are loaded from this .css: static.fsf.org/nosvn/plone4/css/fsf-2017-11-13.css
Sockets aren't needed on static pages at all.
The page's code contains images hosted on the static.fsf.org subdomain, then there are iframes; these are all needed connections unless you harden your browser (text mode, local CSS).
No, robots.txt does not contain that. That's why I am testing with it explicitly.
BTW how do you harden your browser to use local css? And which browser allows such hardening?
In Firefox it's View > Page Style > No Style
Sorry, I forgot you just go to robots.txt
You can use Fiddler proxy by Telerik to look into this.
In Firefox it's View > Page Style > No Style
I didn't know that. Thanks. So far I have been blocking it using uMatrix.
You can use Fiddler proxy by Telerik to look into this.
I will check that too (also new to me). Thanks.
Interesting.
Perhaps a good value should be 10-15.
Would require some reference and/or research regarding what would be the optimal value for this setting.
HTTP persistent connection allows:
- Reduced latency in subsequent requests (no handshaking).
- HTTP pipelining of requests and responses.
Setting this to more than 115 probably won't help and will make things worse. See here.
Mozilla networking preferences page lowered it to:
network.http.keep-alive.timeout 30
Would require some reference and/or research regarding what would be the optimal value for this setting.
According to Apache's docs the default value is 5 seconds.
HTTP persistent connection allows...
It's a balance, not just a benefit. The above link explains that too.
According to Apache's docs the default value is 5 seconds.
From a quick testing, I would go with 15. That accounts for one TCP Keep-Alive ACK response from the server:
1 0.000000 XXXXXXXXXXXX → 208.118.235.174 TCP 66 41424 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 WS=128
...
19 10.559493 XXXXXXXXXXXX → 208.118.235.174 TCP 54 [TCP Keep-Alive] 41424 → 443 [ACK] Seq=1312 Ack=4687 Win=43904 Len=0
20 10.690260 208.118.235.174 → XXXXXXXXXXXX TCP 60 [TCP Keep-Alive ACK] 443 → 41424 [ACK] Seq=4687 Ack=1313 Win=17856 Len=0
...
28 15.691863 XXXXXXXXXXXX → 208.118.235.174 TCP 54 41424 → 443 [RST] Seq=1367 Win=0 Len=0
Have you used this setting with your regular browsing? Any undesirable side effects?
Have you used this setting with your regular browsing?
Just a little in Firefox. But I have set it to 15 in TBB which I use more often.
Any undesirable side effects?
No. But I browse the web with JS turned off. Generally I would expect "side effects" in the sense of an increased number of connections in a more active browsing scenario (lots of XHRs). I also suppose the more negative effect (memory-wise) may be on the server side. But the server can terminate the connection regardless of the client's timeout setting.
Some benchmarks (with this very page):
network.http.keep-alive.timeout == 0
network.http.keep-alive.timeout == 15
network.http.keep-alive.timeout == 115
Testing just any page cannot be a universal measure for anything. There are many other factors influencing page load time.
Testing just any page cannot be a universal measure for anything. There are many other factors influencing page load time.
True. Just wanted to do some quick tests.
The test confirms that the default values (Chrome's are even higher than Firefox's) aren't optimized.
Chrome has even higher values than Firefox
What are the values for Chrome? Where do you read/set them? (I couldn't find a setting)
Correction: Chrome had a value of 300 seconds; by looking at https://src.chromium.org/ I found:
Wait 45s until sending first TCP keep-alive packet.
Thanks. Do you think you could provide a link to the actual source code? Maybe we can file a request to Chromium for providing a setting.
Can't find the src.chromium page quoted above, but...
setKeepAlive is set to 45 seconds here, which means:
For Chrome, TCP keep-alive packets are sent every 45 seconds to ensure that the connection stays active.
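At the socket level, a call like setKeepAlive(true, 45) roughly corresponds to enabling SO_KEEPALIVE and (on Linux) setting TCP_KEEPIDLE to 45 seconds. A sketch only; Chromium's actual plumbing differs per platform, and macOS/Windows spell the idle option differently.

```python
# TCP-level keep-alive knobs: SO_KEEPALIVE turns the probes on, and on
# Linux TCP_KEEPIDLE sets how long the connection must be idle before the
# first keep-alive probe (the zero-length packets seen in the captures).
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

if hasattr(socket, "TCP_KEEPIDLE"):  # Linux-specific option name
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 45)
    idle = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE)
else:
    idle = 45  # other platforms use TCP_KEEPALIVE or ioctls instead

keepalive_on = bool(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
sock.close()
```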