Understanding the impact of latency on Libtorrent throughput
Problem: Tor-like tunnels introduce significant latency.
Measure how 25 ms to 1000 ms of latency affects Libtorrent throughput. Aim to understand how LEDBAT parameters can be set to maximize throughput. Create an experimental environment using containers. Use netem to create latency between two containers. Start a seeder in one container and download across the connecting bridge with added latency.
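For illustration, a minimal sketch of how such a delay can be imposed. This is a hedged sketch, not the operational scripts linked below: the bridge name `lxcbr0` and the Python wrapper are assumptions.

```python
# Sketch: impose delay on the container bridge with netem.
# Assumptions: tc/netem available, run as root, bridge named "lxcbr0".
import subprocess

def set_latency(interface, delay_ms):
    # "replace" installs the netem qdisc, or updates it if already present
    subprocess.check_call([
        "tc", "qdisc", "replace", "dev", interface,
        "root", "netem", "delay", "%dms" % delay_ms,
    ])

set_latency("lxcbr0", 50)
```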
Initial results: performance is severely affected by a latency of just 50 ms.
Current operational measurement scripts: https://github.com/vandenheuvel/libtorrent-latency-benchmark
Measurements of the libtorrent download speed under different latencies with 6 seeders.
This is relevant for achieving onion routing with defence against traffic confirmation attacks.
ToDo for the next meeting, from the Libtorrent API docs + tuning:
- Understand performance tuning and experiment with various settings
- Compile from source
- `session_settings high_performance_seed();`
- `outstanding_request_limit_reached`
- `send_buffer_watermark_too_low`
- Directly add a test for latency sensitivity to Libtorrent's existing software emulation suite?
- Investigate `max_out_request_queue`, and here.
Background: http://blog.libtorrent.org/2015/07/slow-start/ and http://blog.libtorrent.org/2011/11/requesting-pieces/. Easy fix: high_performance_seed returns settings optimized for a seed box that serves many peers and doesn't do any downloading. It has a 128 MB disk cache and a limit of 400 files in its file pool. It supports fast upload rates by allowing large send buffers.
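For illustration, a minimal sketch of applying that preset from the Python bindings (assuming the 1.1.x bindings, where settings objects behave like dicts):

```python
import libtorrent as lt

# high_performance_seed() returns the seed-box preset described above;
# passing it to the session constructor applies it at startup
settings = lt.high_performance_seed()
ses = lt.session(settings)
```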
Additional boost: asynchronous disk I/O
Detailed solution: "I'm unable to get more than 20 Mbps with a single peer on a 140 ms RTT link (simulated delay with no packet loss)." Original post.
Things you could adjust, according to Arvid Norberg, lead engineer of the libtorrent project:
“Did you increase the socket buffer sizes on both ends?”

- `int recv_socket_buffer_size;`
- `int send_socket_buffer_size;`

“There’s also buffer sizes at the bittorrent level:”

- `int send_buffer_low_watermark;`
- `int send_buffer_watermark;`
- `int send_buffer_watermark_factor;`

“And there are buffers at the disk layer:”

- `int max_queued_disk_bytes;`
- `int max_queued_disk_bytes_low_watermark;`
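A hedged sketch of setting these from the Python bindings (assuming the 1.1.x dict-style settings API; the values are illustrative, not tuned recommendations):

```python
# Socket-, bittorrent- and disk-level buffers from the list above.
ses.apply_settings({
    "recv_socket_buffer_size": 1024 * 1024,    # kernel socket buffers
    "send_socket_buffer_size": 1024 * 1024,
    "send_buffer_low_watermark": 1024 * 1024,  # bittorrent-level buffers
    "send_buffer_watermark": 4 * 1024 * 1024,
    "max_queued_disk_bytes": 8 * 1024 * 1024,  # disk-layer queue
})
```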
New test using LXCs: ten seeding LXCs, one downloading LXC, single measurement. While high-latency runs seem to start slower, they do substantially better once the transfer speed has stabilized. Latencies up to 150 ms perform, at maximum speed, similarly to the base test without any latency. The measurement without any latency is very similar to the earlier test using VMs.
A single seeding LXC. Single measurement. Higher latencies impact throughput heavily.
ToDo, try to obtain more resilience against latency in Libtorrent with single seeder, single leecher. Plus read current research on traffic correlation attacks. The basics are covered here. Quote: 'recent stuff is downright scary, like Steven Murdoch's PET 2007 paper about achieving high confidence in a correlation attack despite seeing only 1 in 2000 packets on each side'.
Strange observation: it takes 60 to 100 seconds for speed to pick up. Is the seeder side the bottleneck, due to anti-freeriding stuff? Please repeat multiple times and create boxplots.
1 iteration to find the magic bottleneck...
Good news: It appears that the magic bottleneck is identified. Plot of single seeder, single leecher, 200ms latency. No reordering.
Throughput is mostly 15 MB/s. Now, after doubling the default and max values of the `net.ipv4.tcp_rmem` and `net.ipv4.tcp_wmem` parameters (measured in bytes), we notice that throughput also doubles, to roughly 30 MB/s. The bad news, however, is that further increasing these parameters has little effect: most throughput measurements never pass 35 MB/s.
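For reference, a sketch of the kind of change meant here (the values are placeholders, not the ones used in this measurement):

```python
# tcp_rmem/tcp_wmem hold three values: min, default, max (bytes).
# Doubling "default" and "max" is what produced the speedup above.
import subprocess

for key in ("net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem"):
    subprocess.check_call(
        ["sysctl", "-w", "%s=4096 262144 8388608" % key])
```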
Also, the inconsistency in these measurements is still unexplained.
good! next bottleneck...
The magic parameter settings have been discovered, resulting in 35 MByte/s Libtorrent throughput. Next steps are to achieve this throughput in Tribler and in Tribler with Tor-like circuits.
Please document your LXC containers. ToDo: Tribler 1 seeder, 1 leecher; see influence of blocking SQLite writes on performance...
@qstokkink understands the details of tunnel community...
You will probably want a repeatable experiment using the Gumby framework. You are in luck, I created a fix for the tunnel tests just a few weeks ago: https://github.com/qstokkink/gumby/tree/fix_hiddenseeding . You can use that branch to create your own experiment, extending the `HiddenServicesClient` in hiddenservices_client.py with your own experiment and your own scenario file (1 seeder, 1 leecher -> see this example for inspiration).
Once you have the experiment running (which is a good first step before you start modifying things - you will probably run into missing packages/libraries etc.), you can edit the TunnelCommunity class in the Tribler project.
If you want to add delay to:
- Intermittent relaying nodes, add delay here
- Exit nodes, add delay here
- The sending node, add delay here
You can combine relaying nodes and the sending node into one by adding delay here (which does not include the exit node)
1 Tribler seed + 1 Tribler leecher with normal Bittorrent and 200 ms - 750 ms latency is limited by congestion control. Read the details here. To push Tribler towards 35 MB/s with added Tor-like relays, we will probably need, at some point this year, to see the internal state of the congestion control loop.
ToDo for 2017: measure congestion window (cwnd) statistics during hidden seeding.
We managed to do some new tests. We ran Tribler with only the HTTP API and libtorrent enabled. We found that the performance of libtorrent within Tribler is significantly worse than plain libtorrent. Below is a summary of the results so far. Note that these values are for a single-seeder, single-leecher test.
| | libtorrent | Tribler |
|---|---|---|
| no latency | ~160 MB/s | 30 - 100 MB/s |
| 200 ms | ~15 MB/s | ~2.5 MB/s |
| 200 ms + magic | ~35 MB/s | ~2.5 MB/s |
Note that "magic" is the increasing of net.ipv4.tcp_rmem
and net.ipv4.tcp_wmem
parameters. It appears that Tribler suffers from a different bottleneck. Note that when testing without latency, speed varies heavily between ~30 MB/s and ~100 MB/s. During the all tests, cpu load was ~25% on 3 cores.
@synctext @devos50 @qstokkink does anyone have any ideas what might cause this?
@vandenheuvel Assuming all libtorrent versions are the same etc.: the only way Tribler interfaces with a libtorrent download is by retrieving its stats every second (`handle.stats()`) and by handling alerts.
After writing that I found this in the libtorrent manual:

> Note: these calls are potentially expensive and won't scale well with lots of torrents. If you're concerned about performance, consider using post_torrent_updates() instead.
Even though this shouldn't be that bad, you could try writing a loop which gets the torrent handle status every second in your naked experiment and see how that affects things.
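A sketch of that suggested check (hedged: `handle` is assumed to come from the existing benchmark script):

```python
import time

# Poll the torrent status once a second, mimicking what Tribler does,
# and watch whether the naked script's download speed degrades.
while True:
    status = handle.status()   # the potentially expensive call
    print("%.1f kB/s" % (status.download_rate / 1024.0))
    if status.is_seeding:      # stop once the download completes
        break
    time.sleep(1)
```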
@vandenheuvel in addition, we are also processing all libtorrent alerts every second but I don't think this leads to much overhead actually. Could you try to disable the alert processing (by commenting out this line: https://github.com/Tribler/tribler/blob/devel/Tribler/Core/Libtorrent/LibtorrentMgr.py#L72)?
Very impressive work guys:
| | libtorrent | Tribler |
|---|---|---|
| no latency | ~160 MB/s | 30 - 100 MB/s |
| 200 ms | ~15 MB/s | ~2.5 MB/s |
| 200 ms + magic | ~35 MB/s | ~2.5 MB/s |
Tribler shamefully collapses. Clearly something to dive into! Did the tip of fully disabling the stats, or perhaps sampling stats every 5 seconds, lead to any results? Btw, can you also expand this Tribler pain table with 10, 25, and 75 ms latency data points?
Our test results show a download speed of ~2.5 MB/s at 200 ms with our plain script as well, when we introduce the EPoll reactor into the code. This is similar to the results found in the previous tests with Tribler. However, tests with our plain script and the Select reactor show the original results that we had retrieved before introducing the reactor, or even higher: a top speed of 30 MB/s. The next thing on our list is testing Tribler through Twisted with the Select reactor.
Interesting. Could you also try the normal Poll reactor (it is supposed to be faster than Select for large socket counts)?
Strangely enough, results are quite different between our script and Tribler. Summary:

| | EPollReactor | SelectReactor | PollReactor |
|---|---|---|---|
| script | 2.5 MB/s | 32 MB/s | 16 MB/s |
| Tribler | 2.5 MB/s | 2.5 MB/s | 2.5 MB/s |
It may take a little while for the download to come up to speed (~ 60 seconds), but after that the throughput is quite steady.
Our next step will be profiling.
32 MByte/sec. So.. Python 3 and 200 ms latency results. This fascinating mystery deepens. Please make a profile print with human-readable thread name printouts.
We just ran our script under both `python2.7` and `python3.5`; this made no difference for the `SelectReactor`.
Due to an update for `lxc` our test results have changed drastically. The newest test results, using a latency of 200 ms except where otherwise mentioned, are:

| | No Reactor, without delay | No Reactor |
|---|---|---|
| Script | ~400 MB/s | ~32 MB/s |
All the below results are created by a modified script as well (200 ms):
| | EPollReactor | SelectReactor | PollReactor |
|---|---|---|---|
| Inactive | ~32 MB/s | ~16 MB/s | ~16 MB/s |
| Semi-Active | ~32 MB/s | ~16 MB/s | ~16 MB/s |

Notes:
- The `EPollReactor` and `PollReactor` only reach their speed after a certain period of time.
hmmm. so the lower non-script table is all Tribler?
In the above post, all results are from our own script. We retested everything non-Tribler. We're not sure what this change of results (especially the "No Reactor, without delay" peak performance and the `EPollReactor` results) tells us about the quality of the current testing method... These changes are enormous.

| | EPollReactor | SelectReactor | PollReactor |
|---|---|---|---|
| Tribler 0 ms | ~100 MB/s | ~130 MB/s | ~80 MB/s |
| Tribler 200 ms | 2.5 MB/s | 2.5 MB/s | 2.5 MB/s |
`PollReactor` without latency was varying wildly; the other measurements were steady. Sadly these results agree with previous results, before the LXC update. We will now try to bring our script and Tribler closer together by using the reactor thread to start libtorrent in our script.
impressive exploration. Good steps to the final goal: Tribler becomes latency tolerant, add big buffer, and remix anon tunnel packets.
Please read Two Cents for Strong Anonymity: The Anonymous Post-office Protocol: "AnonPoP offers strong anonymity against strong, globally eavesdropping adversaries, that may also control multiple AnonPoP's servers, even all-but-one servers in a mix-cascade."
Today's conclusion: Tribler drops a factor of 40 in performance when latency is added between a single seeder and a single downloader (tested with a 4 GByte file and 200 ms latency). Pure Libtorrent plus Twisted (no Tribler) drops a factor of 25 when latency is added there.
Tribler magically can start libtorrent from Twisted; however, this fails to be reproducible. Feedback from a Twisted developer and contact with Arvid: http://twistedmatrix.com/trac/ticket/9044 http://stackoverflow.com/questions/42351473/using-libtorrent-session-with-twisted-stuck-on-checking-resume-data
ToDo: refactor the core.libtorrent package. Next possible step: a performance regression suite (#1287, but not now please). The factor 25-40 issue is essential.
@captain-coder remarked that touching Python at all from a non-Twisted thread will cause trouble. Possibly the thing you see.
@synctext @Captain-Coder What exactly do you mean by that?
@synctext butchered what I said to him.
He described the problem as starting an OS thread in-process, having that thread talk to libtorrent, and hosting Python in the same process to do "stuff". I remarked that as soon as the Python bindings for libtorrent are loaded into this process (which happens pretty quickly if you import/use/touch Tribler in the hosted Python instance), it might induce a situation where Python structures are modified through callbacks/pointers from libtorrent, initiated from the native OS thread. This violates the properties that the Python GIL is trying to impose and has a good chance of tripping up the Python interpreter (or, even worse, silently corrupting it).
But it's also possible synctext described the problem wrong.
@vandenheuvel @MaxVanDeursen I just fixed the problem in your twisted branch. Now the leecher/seeder is no longer stuck in CHECKING_RESUME_DATA. The fix is:
```python
def printSpeed(h):
    for alert in ses.pop_alerts():
        status = h.status()
        print 'seeder', status.state, status.progress, alert
```
You can set the alerts using `session.set_alert_mask`. I guess you should be using alerts until the problem is fixed in libtorrent/twisted.
@egbertbouman Wow... Thanks a lot for that! How did you come to this solution? And do you happen to know why this works?
@MaxVanDeursen You're welcome! Since libtorrent worked in Tribler, I just looked for the differences between the Tribler implementation and yours. I have no idea why this works.
Thanks to @egbertbouman we have some new results.
| | EPollReactor | SelectReactor | PollReactor |
|---|---|---|---|
| Script, starting session with Twisted, 0 ms | ~400 MB/s | ~400 MB/s | ~400 MB/s |
| Script, starting session with Twisted, 200 ms | 32 MB/s | 16 MB/s | 16 MB/s |

We conclude that it doesn't matter which thread starts the session. We will start rewriting the `Core.Libtorrent` package now: this way we can learn more about how Tribler interacts with libtorrent and hopefully discover why Tribler is so slow.
We did some profiling and found that, under both the `EPollReactor` and the `SelectReactor`, close to 100% of the time was spent blocking in the `poll` of the reactors. This seemed normal, as Tribler was basically idle during these tests (except for downloading using Libtorrent at ~2.5 MB/s).
Yesterday, I spent some time testing Tribler vs libtorrent. I used Ubuntu 16.10 in virtualbox. For libtorrent only I got:
Then, I changed the code in the master branch to work with Tribler and got this:
Finally, I removed the flags keyword parameter from `ltsession = lt.session(lt.fingerprint(*fingerprint), flags=1)`, and like magic:
The difference seems to be that Tribler is not loading some of the default libtorrent features (like UPnP, NAT-PMP, LSD). Very strange...
@egbertbouman Thanks for these hopeful results! Unfortunately, we have not been able to reproduce your results on our own system. However, we have noticed that using this flag in our basic script also negatively impacts throughput, down to 2.5 MB/s. We will look further into why removing this flag does not result in a change of throughput for us, although you have shown that it can be done.
So uTP is enabled, but what does `lt.fingerprint(*fingerprint), flags=1` do?
@synctext From the Libtorrent Manual:
> If the fingerprint in the first overload is omitted, the client will get a default fingerprint stating the version of libtorrent. The fingerprint is a short string that will be used in the peer-id to identify the client and the client's version ... The flags parameter can be used to start default features (upnp & nat-pmp) and default plugins (ut_metadata, ut_pex and smart_ban). The default is to start those things. If you do not want them to start, pass 0 as the flags parameter.
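Putting this together, a sketch of the two constructions being compared (the fingerprint values are illustrative):

```python
import libtorrent as lt

fingerprint = ("TL", 0, 0, 0, 0)  # illustrative client id and version

# default: start both the default plugins and the default features
ses_full = lt.session(lt.fingerprint(*fingerprint))

# flags=1: add the default plugins only, skip upnp/nat-pmp etc.
ses_min = lt.session(lt.fingerprint(*fingerprint), flags=1)
```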
After the results of last week we decided to do exhaustive tests on all combinations of our script and Tribler, both with and without flags.
Seeder has different script than downloader in table below. Diagonal is equal code. All have 200ms latency.
Leecher \ Seeder | Script NoFlag | Tribler NoFlag | Tribler Flag | Script Flag |
---|---|---|---|---|
Script NoFlag | H | L | H | L |
Tribler NoFlag | L | L | L | L |
Tribler Flag | L | L | L | L |
Script Flag | L | L | L | L |
L(ow): convergence of download speed at ~2.5 MB/s. H(igh): convergence of download speed well above 2.5 MB/s (i.e. >10 MB/s).
From these results there is nothing in plain sight that we can conclude as far as the usage of this flag goes.
Please consider at some point to open up the magic box and dive into these detailed networking statistics. Something is off...
Key problem: discover why Tribler gets 2.5 MByte/sec with 200 ms seeder-leecher latency, while clean Twisted-wrapped Libtorrent gets 16 MByte/sec.

The methodology is to strip the Python wrapping code and stumble upon success: be realistic and systematically do a brute-force exploration of why it doesn't work, since finding the magic fix directly is not realistic at this point. The stripped Libtorrent manager is now 40 lines of code, still 2.5 MByte/sec. Dispersy is off in Tribler.
Is it the database that blocks the whole Twisted thread for 25ms regularly?
A possible experiment to decide whether the bottleneck is in the code, in the parameter settings, or both:
- Tribler's stripped Libtorrent manager, which can only resume downloads
- Libtorrent manager in clean Python: fast
- Paused torrent, parameters created by Tribler: slow
- Paused torrent, parameters created by clean Libtorrent code: fast

Mix and match both ways to see where the fault lies...
I just read through (most of) this thread. Here are some observations:

- The libtorrent python bindings never call back into python from within libtorrent. Early versions attempted to do this, by supporting python extensions. This turned out to cause subtle memory corruption or GIL deadlocks. So, libtorrent is not supposed to interact with python other than through the direct API calls. (If you find anything like that, please report a bug!)
- libtorrent has built-in instrumentation added specifically to troubleshoot performance issues. In versions before 1.1 it's a build switch (`TORRENT_STATS`); in 1.1 and later the stats are reported as an alert (which is disabled by default). If these alerts are printed to a log, it can be analyzed by `tools/parse_session_stats.py` in the libtorrent repo (that script exists pre-1.1 too, but loads the files produced by libtorrent; the gotcha is that the stats files are written to CWD, which needs to be writable by the process). The script requires `gnuplot` and will render a number of graphs and hook them up to an html page. Looking at them may reveal something (see the sketch after this list).
- The meaning of the flags passed to the `session` constructor is defined here. By default both flags are set (`add_default_plugins` and `start_default_features`). By passing in 1, you just add the plugins, without starting DHT, UPnP, NAT-PMP and local peer discovery.
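A sketch of the stats-alert logging from the second observation (assuming the 1.1.x Python bindings):

```python
import time
import libtorrent as lt

# Enable the (off-by-default) stats alerts and append their messages to
# a log that tools/parse_session_stats.py can turn into graphs.
ses = lt.session({"alert_mask": lt.alert.category_t.stats_notification})
with open("session_stats.log", "w") as log:
    for _ in range(60):              # sample for about a minute
        ses.post_session_stats()     # request a session_stats alert
        for alert in ses.pop_alerts():
            log.write(alert.message() + "\n")
        time.sleep(1)
```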
I also have some questions:

- Are you testing with TCP or uTP? The uTP implementation in libtorrent could be made to perform better on linux by using `sendmmsg()` and `recvmmsg()`, but currently it requires a system call per packet sent or received.
- Which version of libtorrent are you using? If you're on 1.1.x, one way to have more control over how the session is set up is to use the constructor that takes a `settings_pack`. This lets you set up the configuration before starting the session.
- Is the main issue you're looking at the performance difference between your twisted wrapper and Tribler? Or are you looking at understanding the bottlenecks making the 200 ms case have so much lower throughput than the 0 ms case? If the latter, would it be useful to you if I whipped up a one-to-one test transfer simulation with various latencies?
@arvidn Thanks a lot for these points! You can have a look at our code here.

We're using the latest version in the Ubuntu repositories. Note that we're only trying to explain the difference in performance between Tribler and our script. Thus, calls like `sendmmsg()` and `recvmmsg()` don't seem that relevant, as we're not doing these calls (explicitly) in our fast script.

Using `settings_pack`s might be helpful, but right now we're trying to replicate Tribler's settings in our script to achieve the same performance. @MaxVanDeursen maybe we should just refactor Tribler's code to use `settings_pack`s as a first libtorrent refactor step and hope that performance improves?

Yes, we're looking to explain this difference under the condition of 200 ms, as the difference becomes larger (from 4x to 8x using default settings) with respect to the 0 ms case. This latency (and above) is relevant for tunnel community performance, as we suspect this is the current bottleneck.

For our script (at 200 ms), we already know what the bottleneck is: buffer sizes (see above). Right now, we believe it is most probable that this has to do with settings passed to libtorrent, either in creating the libtorrent session or in adding a torrent.
Report can be found at overleaf.
do you have any (verbose) libtorrent logs from those test runs where the download did not start for a long time, or did not reach the max throughput for a long time? I would be interested in looking at them in that case.
also, if you're not using a sufficiently recent version of libtorrent, you may be experiencing this.
@arvidn Sadly, we do not have any libtorrent logs whatsoever of this occurrence. However, if you would like to reproduce the error, you could look into our libtorrent latency benchmark before the pop_alert fix was implemented in order to get the logs.
AFAIK we used the latest libtorrent version.
Golden experiment:
- Select 1 torrent for this experiment
- Create a paused torrent with parameters created by Tribler: slow
- Create a paused torrent with parameters created by clean Libtorrent code: fast
- Now take a paused Tribler torrent and resume it within the potentially fast clean Libtorrent codebase
- Magic happens: is it a settings problem or not?
- The other way around: will Tribler slow down paused fast torrents?
Assign me!
@ichorid Good luck! If you have any questions, please reach out to either @MaxVanDeursen or me.
Mental note.. one month spent on making unit tests faster would most likely be well spent.. potential next optimization task.
@synctext I assume you refer to our 'heavy' tests to test the correct working of anonymous downloading/hidden seeding (since most of our other unit tests are very fast)? I don't see any simple optimisations for these tests since the tunnel community relies heavily on Dispersy and the testing framework of Dispersy is a pain to work with and very slow. Instead, I would wait for IPV8 to be integrated in Tribler and use the IPV8 functionality to test the tunnel community. And create a stable integration test of anonymous downloading/hidden seeding. IIRC, @qstokkink already has a few tests for the tunnel community in IPV8 that run within milliseconds but don't pin me on that.
Thank you for that vital info. Post-IPv8 it is, then.. So we can work towards a 60-second full unit test runtime...
Correct: all of the tunnel unit tests (including the full end-2-end hidden seeding test) are completed within 2 seconds.
This is not only a problem of performance, but of bad code quality. The download settings and their defaults are not defined in a single place. The statement is that this bug could have been found in a day, with good code. Instead, we have technical debt.
Start of the rewrite by Bram: it no longer contains these silly LaunchManyCore things.
Vadim status:
- gladly takes the job of making beautiful code out of the 12-year-old code which wraps Libtorrent
- latest stage not on GitHub yet
- patches done to the original container.sh etc.
- we already have a faster way to run these LXC container tests now
- LXC container template, one for leechers and one for seeders
- feature to avoid the long LXC setup time: use a snapshot instead of setting everything up from the sources (buggy, but fixed)
- 200 ms latency with clean Libtorrent reproduced
Unstable download speeds with libtorrent: when you run the LXC-based latency tests for a single latency repeatedly, i.e. by setting `latencies = [0 for x in range(numIntervals)]` or by completely commenting out the latency-setting lines (`tc ...`), you don't get repeatable results. The first test run is always much faster than the subsequent ones, but sometimes you can randomly get high results in one of the others too.
This means several things:
- Our previous measurements were wrong, as they were affected by this.
- There is a bug/unstable behaviour in one or several of: Ubuntu 17.10 networking; LXC 2.1.0; python3-libtorrent 1.0.7-1build1 (libtorrent-rasterbar8); our setup/handling of the libtorrent session.
- There could be potential to greatly speed up downloads in tribler and other libtorrent-based clients.
As this could have serious consequences, I invite any interested parties to recheck my results, just in case ;-)
(Experiment conditions: host Ubuntu 17.10; LXC containers Ubuntu 16.04 (default))
The same plot (libtorrent) for 200ms:
Does this behavior persist with the latest version of LibTorrent (1.1.7 I believe)?
Do these tests pin down whether TCP or uTP is used? One aspect I can imagine might affect this is that, if the startup of the swarm is somewhat random, the first connection attempt (over uTP, which is preferred) sometimes fails and sometimes succeeds. With a failed uTP connection attempt, libtorrent will try TCP. Hence, unless the test explicitly disables one or the other, the protocol that ends up being used is probably somewhat random.
If I wanted to explore whether this variance is primarily caused by libtorrent, how hard would it be for me to reproduce these tests? Is there a guide somewhere?
it's possible libtorrent-1.0.x is too old, but at least 1.1 and later has session stats counters that can be reported via an alert, enabled by the alert mask. Logging this for the peers could give a hint of what libtorrent is experiencing.
are these tests one-to-one transfers, or is it a single swarm of 10 peers? (if it's one-to-one, the logging should be manageable).
Thnx Arvid for this once again insightful comment. It's really a morale boost for the team that you're helping us out.
V: Please link your LXC scripts here for reproduction.
@arvidn, the tests use the default configuration of libtorrent, and the version is the one available in the Ubuntu LXC container. Thanks to @vandenheuvel and the guys it is very easy to reproduce the experiment. You could use an Ubuntu VM, or try to run it in your environment. The whole test should be run as root. In Ubuntu you could just do something like:
```
sudo su
apt-get install lxc ctorrent python3-numpy python3-matplotlib
cd /tmp
git clone https://github.com/vandenheuvel/libtorrent-latency-benchmark
cd libtorrent-latency-benchmark
./main.sh
open ./result.png
```
And if it works, you could change the latency setting in leecher.py to `latencies = [0 for x in range(numIntervals)]` (the plot legend would still show 50 ms steps; don't take that into account).
My next step would be to test the thing with the newest libtorrent version. To do that, I would first need to complete a small refactoring of these tests.
As I am not very familiar with libtorrent, could you please provide a "UDP-KISS" set of settings, that would turn off all adaptive features of libtorrent, like congestion control, auto-switching between UDP and TCP, etc.?
I haven't gotten around to setting up an ubuntu VM to run this in. But I have some comments anyway:
- If `netem` (the `tc` command) works on the loopback device, your setup could be simplified to not require any containers. You could just set the latency on `lo0` and make each instance bind to IPs 127.0.0.1 - 127.x.x.x, however many nodes you're running. The main benefit would be to shave off overhead and moving parts, to make sure you're not just measuring the performance (or variance) of the virtualization layer. If this approach fails, you could still make them all bind to 127.0.0.1, and make sure to enable the `allow_multiple_connections_per_ip` setting.
- In the loop that collects data during a transfer, you log `download_rate` from `torrent_status`. You would get better accuracy by measuring `total_download` [1] instead. The download rate is fuzzy; it's basically the average download rate from the last few seconds. If you take a rate sample every second, they won't necessarily sum to the total downloaded bytes. Instead capture the downloaded bytes together with a timestamp (see the sketch after this list).
- To disable uTP, set `enable_outgoing_utp` and `enable_incoming_utp` [2] to false.
- To disable DHT, local peer discovery, UPnP, NAT-PMP, peer exchange, ut_metadata (magnet links) and smart_ban, you can set the flags argument of the session constructor to 0: `session(flags=0)` (at least I think that works in python).
- Some settings that are relevant for a high bandwidth-delay-product test are the `send_buffer_*` [3] settings.
- If you suspect the cost of disk reads/writes starts to dominate the test (which I would expect to be the case, as long as everything else works as it should), you may want to disable disk I/O. You can do that by setting `disable_hash_checks` [4] to true, and... hm, there's no way to set the `add_torrent_params::storage` to `disabled_storage_constructor` from python. Anyway, with disk I/O disabled, you don't really need the underlying data for the torrent either; you can just pretend to have it by setting one torrent to `seed_mode` (in `add_torrent_params`).
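A sketch combining several of these suggestions into one leecher loop (assuming the 1.1.x Python bindings; the torrent file name is a placeholder):

```python
import time
import libtorrent as lt

ses = lt.session(flags=0)                      # no DHT/UPnP/NAT-PMP/LSD
ses.apply_settings({
    "enable_outgoing_utp": False,              # TCP only
    "enable_incoming_utp": False,
    "allow_multiple_connections_per_ip": True, # needed on loopback
})

handle = ses.add_torrent({"ti": lt.torrent_info("test.torrent"),
                          "save_path": "."})

samples = []                  # (timestamp, total bytes) pairs
while True:
    s = handle.status()
    samples.append((time.time(), s.total_download))
    if s.is_seeding:          # stop once the download completes
        break
    time.sleep(1)
```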
No need for containers even.. These all sound like solid suggestions to work on. @egbertbouman can you also help out today if needed?
@arvidn, thanks for the suggestions! I will now try to play with the settings and report back as soon as something interesting turns up.
Starting a libtorrent session with

```
session(flags=0)
enable_incoming_tcp = 0
enable_outgoing_tcp = 0
```

at 0 ms latency produces relatively stable results that max out at 80 Mb/s:
When running without `flags=0`, like

```
session()
enable_incoming_tcp = 0
enable_outgoing_tcp = 0
```

the first download is 2 times faster than the remaining ones, maxing out at 150 Mb/s, and there is a pattern of "fast" downloads with slowly capping top speeds:
For 100 ms with `flags=0` this effect produces semi-stable 5 Mb/s:

But without `flags=0` it nets a first download with an immediate 30 Mb/s, and then each remaining download first drops to semi-stable 5 Mb/s and, after around 60 seconds of this, magically ramps up to 30 Mb/s:
(The settings were set equal on both leecher and seeder.)
Even with `flags=0` enabled, after the first download the sessions sometimes become "stuck" at random for a brief moment. Apparently, this is the effect that produces the "waves" in the plot.
To me, it seems like something related to congestion control or memory management. Previous experiments with Tribler-based downloads showed a similar pattern, but without ramping up to full speed. I suspect that when Tribler meets latency, some congestion-handling feature in libtorrent just doesn't fire up.

@arvidn, what kind of feature in libtorrent could cause such performance spikes/waves? (I am using python3-libtorrent 1.0.7-1build1 (libtorrent-rasterbar8) from Ubuntu 17.10)
The stall in the last run looks like they got stuck checking files or something. And if the first connection attempt fails, there may be a delay before trying again.
I would like to stress again that `download_rate` is a lie. The shape of your graphs suggests that that's what you're plotting. I would highly recommend measuring the time and `total_download` instead: instead of sleeping for one second and then assuming it's been a second, check the time again and print it to your csv file for the plot. When you run benchmarks where the computer is very busy, your delays are increasingly likely to be longer than you ask for.
The best way of knowing for sure is to enable and log the `session_stats` counters (there's a tool, `parse_session_stats.py`, to generate plots from it too).
Here are the plots of download progress produced with the method suggested by @arvidn (`total_download` with timestamps). The charts are for the same experiments as in the previous series:

- 0 ms `session(flags=0)`
- 0 ms `session()`
- 100 ms `session(flags=0)`
- 100 ms `session()`

The slope of the lines corresponds to download speed. We can still see the basic problem pattern from the previous plots. In addition, the small freezes become apparent. Please mind the automatic scaling done by pyplot (i.e. the horizontal and vertical scales are different for each plot).
Fascinating. To efficiently zoom in on a solution for the Tribler issues, please do experiments with 200 ms latency and the latest released binary .deb package of Libtorrent. Having 80 MByte/sec would be awesome; we have merely 2.5 MByte/sec. Clean Twisted-wrapped Libtorrent has 16 MByte/sec. You are very likely seeing another performance bottleneck than Tribler's.
@arvidn, now I'm working on adding proper Tribler-based tests to this experiment. If the Tribler-based results at 200 ms match the libtorrent results with `flags=0`, that would mean some magic extension in libtorrent that allows it to show good performance under big latencies does not fire up within Tribler. This fact would let me narrow it down to a single flag or combination of flags in the libtorrent-based test alone, to trace it in Tribler later.

In addition, that would mean something in core libtorrent and/or the python bindings produces an unexpected and unnecessarily long delay before the lib can again reach top performance.

(I'm going to do the tests with the latest stable libtorrent built from source (1.1.6).)
great, 1.1.6 is good. A more direct approach to understanding a throughput bottleneck would be to post session stats counters regularly and log them to a file. That's basically what they're for. experimenting with various flags is also a good idea, especially to discover unexpected side-effects. But I would primarily imagine you're seeing second or third order effects of the flags.
For example, tracker announces may be held off until UPnP or NAT-PMP has succeeded, in order to get an accurate external port to announce, if neither of those work, there may be some unnecessary delays on startup. I don't think this specific issue affect you, since you don't rely on a tracker, but there may be other effects of that nature you're seeing the effects of.
@arvidn, basically, how do the python bindings for libtorrent work? What is and what is not kept between libtorrent `session`s? And how can I be sure that the underlying objects are really deleted, sockets closed, etc.?

For example, when I create libtorrent session objects in Python:

```python
import libtorrent as lt
from time import sleep  # import added; the original snippet omitted it

def mkses():
    ses = lt.session()
    sleep(5)
    print ses

for i in range(3):
    mkses()
```

what state would these three session objects share/inherit from predecessors?
Each session object you construct is independent. There's no invisible state they share (other than the usual: filesystem, network interfaces, etc.).

They will all open listen sockets independently; two will fail to bind because the third got the port first, and they may bump the port and try again, and so on (that's configurable and enabled by default).

As for when they are destructed, I believe that's up to the python GC. Aren't objects reference counted in python though? So you should get consistent and reliable destruction; otherwise you may be able to use `del`.

Did I understand your question?
I also found the strange pauses in throughput today when analyzing the tunnel integration tests with @devos50. These are the outputs of three separate runs sending a 2.06 MB file, plotting all the libtorrent data going through the tunnels:
In the last run there was no throughput pause at all. Hope this helps @ichorid 👍
@arvidn, yes, your answer was exactly about the things I was interested in. Thank you 👍 .
Now, the preliminary results from testing the Tribler-based download. The leecher used a single session of Tribler, and the seeder was libtorrent-based. Distro: Ubuntu 17.10 (lxc); libtorrent: 1.1.6 (stable source tarball from the official site); Tribler: 7.0 (stable source tarball from the official site).
Tribler config lines:

```python
config = SessionStartupConfig()
config.set_state_dir(os.path.join(os.getcwd(), '.tribler'))
config.set_torrent_checking(False)
config.set_multicast_local_peer_discovery(False)
config.set_megacache(False)
config.set_dispersy(False)
config.set_mainline_dht(False)
config.set_torrent_store(False)
config.set_enable_torrent_search(False)
config.set_enable_channel_search(False)
config.set_torrent_collecting(False)
config.set_libtorrent(True)
config.set_dht_torrent_collecting(False)
config.set_videoserver_enabled(False)
config.set_enable_metadata(False)
config.set_http_api_enabled(False)
config.set_tunnel_community_enabled(False)
config.set_creditmining_enable(False)
config.set_enable_multichain(False)
```

Seeder config (libtorrent-based, `flags=0`):

```python
ses = lt.session(flags=0)
ses.enable_incoming_tcp = 0
ses.enable_outgoing_tcp = 0
```
0 ms
100 ms
We see roughly the same performance as from pure libtorrent with `flags=0`. Therefore, we can assume the flags to be the main suspect. Another important thing to note is that a Tribler session sometimes introduces an additional delay of 60 seconds. This looks similar to the pattern of libtorrent getting up to speed after 60 seconds that we've seen in previous tests at 100 ms with default flags. This gives us a hint that some subsystem in libtorrent just does not start properly in Tribler and is sub-optimally initialized in the pure libtorrent-based tests.

Next, I will check the "seeder vs leecher" problem. After that I will commit the test framework to GitHub and continue the investigation by scoring the effects of individual flags.
@ichorid LibtorrentDownloadImpl will wait with downloading a torrent until the session has found more than 25 DHT nodes (see https://github.com/Tribler/tribler/blob/devel/Tribler/Core/Libtorrent/LibtorrentDownloadImpl.py#L260 and https://github.com/Tribler/tribler/blob/devel/Tribler/Core/Libtorrent/LibtorrentMgr.py#L567). This was introduced as a fix for a previous libtorrent version, but perhaps it's not needed anymore. Anyway, this could be the cause of additional delays when dealing with larger latencies.
When downloading with Tribler at 100 ms, changing `flags=0` on the seeder side does not change the results of the experiment. The delay is still there, and speed is still at the `flags=0` libtorrent baseline.
Comparing the logs from delayed test iterations with undelayed ones showed that the delayed ones always produce the following lines:

```
DEBUG:LibtorrentMgr:LibtorrentMgr: could not find torrent 25283068c8e1470dd82db03994e9b4e7f59087c3
DEBUG:LibtorrentMgr:LibtorrentMgr: could not find torrent 25283068c8e1470dd82db03994e9b4e7f59087c3
```

Adding `sleep(10)` between iterations to avoid a race condition in libtorrent on `session.remove_download(..)` resulted in logs without `... could not find torrent ...` and plots void of the random 60 s delay:

Apparently, there is a race condition in tribler and/or libtorrent on `session.remove_download()`. Further checks with pure libtorrent will follow soon.
remove_torrent() is an async call; it's not obvious though how add_torrent() could jump ahead of it. Oh, maybe it's the other peer trying to connect before the torrent has been added. Sync with add_torrent_alert.
Forked libtorrent-latency-benchmark and added Tribler support, the ability to switch between Tribler/libtorrent modes, the remove_torrent workaround, compiling libtorrent from source, etc.: libtorrent-latency-benchmark:tribler
@arvidn, indeed, the libtorrent version actually matters for this problem.
The following plot is for libtorrent 1.0.7 as found in Ubuntu 17.10:
And this one is for libtorrent 1.1.6 compiled from source (the same Ubuntu 17.10):
(The flags option was left to default, the session was restarted for each iteration, and sleep(10) was put between iterations, just in case.)
This means that the latest libtorrent does not get up to high-speed mode except for the first run.
Setting `flags=0` on either side of the connection results in "slow mode" download.

Now I'm going to investigate the effects of the flags on the vanilla libtorrent benchmark.
It's the LSD extension. If it is enabled, libtorrent gets to high speeds on high-latency connections. I have no idea how it relates to latencies, but the 1-minute interval before it kicks in is consistent with the LSD specification. I'll do more realistic experiments to confirm this.
OK, I think I got it. The problem is with the uTP protocol. When LSD announces, libtorrent switches to TCP (@arvidn, is it really like it should be?) instead of uTP, and becomes much faster. This is because TCP does not care about latencies, while the synthetic latencies introduced by netem do not play well with the uTP congestion algorithms. Basically, uTP underutilizes the link.

So, in short: in a synthetic test environment uTP=slow, TCP=fast. And TCP-based transport kicks in as a result of an LSD announcement (bug?). The `enable_incoming/outcoming_udp/tcp` flags do not seem to have any effect on this behaviour. Whether TCP would provide better speeds in a real-life high-latency environment remains to be investigated.
@arvidn, how can I control uTP vs TCP usage for individual connections and/or sessions? Is it possible to force the use of either protocol?
@ichorid nice find! Could you do something with these settings? https://github.com/Tribler/tribler/blob/next/Tribler/Core/Libtorrent/LibtorrentMgr.py#L139
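For instance, a hedged sketch of forcing one transport per session via those settings (1.1.x dict style; swap the booleans to force uTP instead):

```python
# Force TCP-only transport for this session.
ses.apply_settings({
    "enable_outgoing_utp": False,
    "enable_incoming_utp": False,
    "enable_outgoing_tcp": True,
    "enable_incoming_tcp": True,
})
```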
@arvidn, I suspect this strange behaviour we observe in our tests is the result of three distinct bugs in libtorrent (and/or the Python bindings):

- uTP has much worse performance than TCP on high-latency, high-bandwidth links.
- If the leecher gets an LSD announcement from the seeder, it starts the download using TCP. And even if it is already downloading this torrent via uTP, it switches to TCP.
- The `enable_incoming/outcoming_udp/tcp` flags don't work.

I will try to re-check the results, to be sure.
> If netem (the tc command) works on the loopback device, your setup could be simplified to not require any containers. You could just set the latency on lo0 and make each instance bind to IPs 127.0.0.1 - 127.x.x.x, however many nodes you're running.

Please replace the containers with the loopback idea and boost the send_buffer settings. Try a Tribler-to-Tribler test to see if there is low-hanging fruit to boost our sad 4 MByte/sec performance. Try a minimal example where uTP currently fails to perform OK. With 50 MByte/sec and 100 ms one-way latency we always have 10 MByte of un-acknowledged data in flight, so buffers need to be huge. Look at the various buffer settings like `send_socket_buffer_size`. Any tips from Sweden?
My bad, I was not setting the settings the right way (( I'll redo the `enable_incoming/outcoming_udp/tcp` tests in a pinch.

So yeah, the `enable_incoming/outcoming_udp/tcp` flags do work. I was wrong on that one. Please accept my apologies, @arvidn.
I reproduced the uTP vs TCP performance problem using libtorrent's `examples/client_test`.

| | TCP | uTP |
|---|---|---|
| 0 ms | 310 Mb/s | 104 Mb/s |
| 100 ms | 30 Mb/s | 5.2 Mb/s |
The tests were run on the loopback device; 100 ms round-trip latency was added with netem:

```
tc qdisc add dev lo root netem delay 50ms
```

To force TCP/uTP on both ends I used the `-J|-y` options:

```
# in the first terminal
./client_test -b 127.0.0.1 ./test.torrent [-J|-y]
# in the second terminal
./client_test -r 127.0.0.1:6881 [-J|-y]
```
I have a suspicion that the first result is CPU bound. UDP is significantly less efficient than TCP at high rates, since (traditionally [1]) every single packet needs a system call to be sent and a system call to be received. Since a packet is ~1400 bytes, that ends up being a lot of syscalls, as you can imagine. This can be mitigated somewhat by detecting that the MTU over loopback is actually 64 kiB and send large packets. I had some code to do that for a while, but as I recall, it was a bit flaky. I removed it thinking transferring over loopback isn't an important use case anyway.
I think the second result, with 100 ms delay, is much more interesting from the point of view of looking at the effect of real-world transfers over high-latency networks. I suspect that in that case, the rate is limited by the send- and receive-buffers configured for libtorrent. There's a setting called `send_buffer_watermark` which defaults to 500 kiB. Try setting that higher, to 1 or 2 MiB.
You may also want to increase `send_socket_buffer_size` and `recv_socket_buffer_size`. They cap the congestion window and advertised receive window in uTP (iirc). They probably need to be at least 1 MiB too.
[1] there is sendmmsg() and recvmmsg() on linux nowadays, and I've been meaning to add support for that to the uTP implementation, but I haven't gotten to it yet (patches are welcome! :) )
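A sketch of that tuning as it would look in the Python bindings used earlier in this thread (dict-style settings; the 2 MiB values come straight from the advice above):

```python
# Raise the libtorrent-level send watermark and the uTP window caps.
ses.apply_settings({
    "send_buffer_watermark": 2 * 1024 * 1024,
    "send_socket_buffer_size": 2 * 1024 * 1024,
    "recv_socket_buffer_size": 2 * 1024 * 1024,
})
```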
Increasing `send_buffer_watermark`, `send_socket_buffer_size`, and `recv_socket_buffer_size` to 2 MiB does not seem to have any effect (or I am doing it wrong). However, I added `set_utp_stream_logging(1)` to `client_test.cpp`, and `#define TORRENT_UTP_LOG_ENABLE` to `utp_stream.cpp/hpp`.

Next I ran the loopback benchmark again, with 100 ms round-trip latency. For both seeder and leecher, I parsed the resulting `utp.log` with `parse_utp_log.py`, as @arvidn suggested earlier. I hope the produced plots can shed some light on this mystery:
Seeder:
Leecher: