
Understanding the impact of latency on Libtorrent throughput

Open synctext opened this issue 8 years ago • 190 comments

Problem: Tor-like tunnels introduce significant latency.

Measure how 25ms to 1000ms of latency affects the Libtorrent throughput. Aim to understand how LEDBAT parameters can be set to maximize throughput. Create an experimental environment using containers. Use netem to create latency between two containers. Start a seeder in one container and download across the connecting bridge with added latency.
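A minimal sketch of what the netem step could look like from a Python test harness (the bridge interface name and the delay value are placeholders, not taken from the benchmark scripts linked below; requires root):

    import subprocess

    # Hypothetical helpers: add/remove an artificial delay on the bridge that
    # connects the two containers. 'lxcbr0' and 100 ms are assumed values.
    def add_latency(interface='lxcbr0', delay_ms=100):
        subprocess.check_call(['tc', 'qdisc', 'add', 'dev', interface,
                               'root', 'netem', 'delay', '%dms' % delay_ms])

    def remove_latency(interface='lxcbr0'):
        subprocess.check_call(['tc', 'qdisc', 'del', 'dev', interface, 'root'])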

Initial results: performance is severely affected by a latency of just 50 ms.

synctext avatar Nov 15 '16 16:11 synctext

Current operational measurement scripts: https://github.com/vandenheuvel/libtorrent-latency-benchmark

synctext avatar Nov 15 '16 16:11 synctext


Measurements of the libtorrent download speed under different latencies with 6 seeders.

MaxVanDeursen avatar Nov 15 '16 16:11 MaxVanDeursen

This is relevant for achieving onion routing with defence against traffic confirmation attacks.

vandenheuvel avatar Nov 15 '16 16:11 vandenheuvel

ToDo for next meet, from Libtorrent API docs + tuning:

  • Understand performance tuning and experiment with various settings
  • Compile from source
  • session_settings high_performance_seed();
  • outstanding_request_limit_reached
  • send_buffer_watermark_too_low
  • Directly add a test for latency sensitivity to Libtorrent's existing software emulation suite?

synctext avatar Nov 15 '16 16:11 synctext

Investigate max_out_request_queue, and here.

vandenheuvel avatar Nov 15 '16 16:11 vandenheuvel

Background: http://blog.libtorrent.org/2015/07/slow-start/ and http://blog.libtorrent.org/2011/11/requesting-pieces/ Easy fix: high_performance_seed returns settings optimized for a seed box, serving many peers and not doing any downloading. It has a 128 MB disk cache and a limit of 400 files in its file pool. It supports fast upload rates by allowing large send buffers.

Additional boost: asynchronous disk I/O

Detailed solution: "I'm unable to get more than 20Mbps with a single peer on a 140ms RTT link (simulated delay with no packet loss)." Original post

Things you could adjust, according to Arvid Norberg, lead engineer of the libtorrent project:

“Did you increase the socket buffer sizes on both ends?”

int recv_socket_buffer_size;
int send_socket_buffer_size;
“There’s also buffer sizes at the bittorrent level:”

int send_buffer_low_watermark;
int send_buffer_watermark;
int send_buffer_watermark_factor;
“And there are buffers at the disk layer:”

int max_queued_disk_bytes;
int max_queued_disk_bytes_low_watermark;
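A minimal sketch of how these could be applied through the (pre-1.1) session_settings python bindings; the preset and field names come from the quotes above, but the concrete values are placeholders, not tuned recommendations:

    import libtorrent as lt

    ses = lt.session()

    # Start from the seed-box preset, then bump the buffers Arvid mentions.
    # Assumes the bindings expose high_performance_seed() and these fields.
    st = lt.high_performance_seed()
    st.send_socket_buffer_size = 2 * 1024 * 1024
    st.recv_socket_buffer_size = 2 * 1024 * 1024
    st.send_buffer_watermark = 2 * 1024 * 1024
    st.max_queued_disk_bytes = 16 * 1024 * 1024
    ses.set_settings(st)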

synctext avatar Nov 16 '16 14:11 synctext

New test using LXCs: ten seeding LXCs, one downloading LXC. Single measurement. While high-latency runs start slower, they do substantially better once the transfer speed has stabilized. Latencies up to 150 ms perform, at maximum speed, similarly to the base test without any latency. The measurement without any latency is very similar to the earlier test using VMs.

vandenheuvel avatar Nov 24 '16 00:11 vandenheuvel

A single seeding LXC. Single measurement. Higher latencies impact throughput heavily.

vandenheuvel avatar Dec 01 '16 16:12 vandenheuvel

ToDo, try to obtain more resilience against latency in Libtorrent with single seeder, single leecher. Plus read current research on traffic correlation attacks. The basics are covered here. Quote: 'recent stuff is downright scary, like Steven Murdoch's PET 2007 paper about achieving high confidence in a correlation attack despite seeing only 1 in 2000 packets on each side'.

synctext avatar Dec 01 '16 16:12 synctext

Strange observation that it takes 60 to 100 seconds for the speed to pick up. Is the seeder side the bottleneck, due to anti-freeriding mechanisms? Please repeat multiple times and create boxplots.

1 iteration to find the magic bottleneck...

synctext avatar Dec 15 '16 16:12 synctext

Good news: it appears that the magic bottleneck is identified. Plot of single seeder, single leecher, 200ms latency, no reordering: throughput is mostly 15MB/s. Now with doubling the default and max parameters of net.ipv4.tcp_rmem and net.ipv4.tcp_wmem (measured in bytes), throughput doubles as well, to roughly 30MB/s. The bad news, however, is that further increasing these parameters has little effect: throughput rarely passes 35MB/s. Also, the inconsistency in these measurements is still unexplained.
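For reference, a sketch of this "magic" sysctl change as it could be done from a Python harness (the values below are placeholders, not the ones used in the measurement; requires root):

    # Equivalent to `sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"`
    # (and the same for tcp_wmem).
    def set_tcp_buffers(minimum=4096, default=262144, maximum=32 * 1024 * 1024):
        for name in ('tcp_rmem', 'tcp_wmem'):
            with open('/proc/sys/net/ipv4/%s' % name, 'w') as f:
                f.write('%d %d %d' % (minimum, default, maximum))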

vandenheuvel avatar Dec 15 '16 21:12 vandenheuvel

good! next bottleneck...

synctext avatar Dec 15 '16 21:12 synctext

The magic parameter settings have been discovered now, resulting in 35MByte/s Libtorrent throughput. Next steps are to reach this throughput in Tribler and in Tribler with Tor-like circuits.


Please document your LXC containers. ToDo: Tribler 1 seeder, 1 leecher; see influence of blocking SQLite writes on performance...

@qstokkink understands the details of tunnel community...

synctext avatar Jan 10 '17 13:01 synctext

You will probably want a repeatable experiment using the Gumby framework. You are in luck, I created a fix for the tunnel tests just a few weeks ago: https://github.com/qstokkink/gumby/tree/fix_hiddenseeding . You can use that branch to create your own experiment, extending the hiddenservices_client.py HiddenServicesClient with your own experiment and your own scenario file (1 seeder, 1 leecher -> see this example for inspiration).

Once you have the experiment running (which is a good first step before you start modifying things - you will probably run into missing packages/libraries etc.), you can edit the TunnelCommunity class in the Tribler project.

If you want to add delay to:

  • Intermittent relaying nodes, add delay here
  • Exit nodes, add delay here
  • The sending node, add delay here

You can combine relaying nodes and the sending node into one by adding delay here (which does not include the exit node)

qstokkink avatar Jan 10 '17 16:01 qstokkink

1 Tribler seeder + 1 Tribler leecher with normal Bittorrent and 200ms - 750ms latency is limited by congestion control. Read the details here. To push Tribler towards 35MB/s with added Tor-like relays, we will probably need, at some point this year, to inspect the internal state of the congestion control loop.

ToDo for 2017: measure congestion window (cwnd) statistics during hidden seeding.

synctext avatar Jan 13 '17 10:01 synctext

We managed to do some new tests. We ran Tribler with only the HTTP API and libtorrent enabled. We found that the performance of libtorrent within Tribler is significantly worse than plain libtorrent. Below is a summary of the results so far. Note that these values are for a single-seeder, single-leecher test.

                 libtorrent   Tribler
no latency       ~160 MB/s    30 - 100 MB/s
200 ms           ~15 MB/s     ~2.5 MB/s
200 ms + magic   ~35 MB/s     ~2.5 MB/s

Note that "magic" is the increasing of net.ipv4.tcp_rmem and net.ipv4.tcp_wmem parameters. It appears that Tribler suffers from a different bottleneck. Note that when testing without latency, speed varies heavily between ~30 MB/s and ~100 MB/s. During the all tests, cpu load was ~25% on 3 cores. zerolatency4gb

@synctext @devos50 @qstokkink does anyone have any ideas what might cause this?

vandenheuvel avatar Jan 25 '17 20:01 vandenheuvel

@vandenheuvel Assuming all libtorrent versions are the same etc.: the only way Tribler interfaces with a libtorrent download is by retrieving its stats every second (handle.stats()) and by handling alerts.

After writing that I found this in the libtorrent manual:

Note

these calls are potentially expensive and won't scale well with lots of torrents. If you're concerned about performance, consider using post_torrent_updates() instead.

Even though this shouldn't be that bad, you could try and write a loop which gets the torrent handle status every second on your naked experiment and see how that affects things
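A minimal sketch of such a loop, assuming the usual python bindings (the torrent file name is a placeholder); the manual's batched alternative would be post_torrent_updates() plus the resulting state_update_alert:

    import time
    import libtorrent as lt

    ses = lt.session()
    h = ses.add_torrent({'ti': lt.torrent_info('test.torrent'), 'save_path': '.'})

    # Poll the handle status once a second, mimicking what Tribler does,
    # to see whether the polling itself costs throughput.
    while not h.is_seed():
        s = h.status()
        print 'rate %.1f MB/s, progress %.0f%%' % (s.download_rate / 1e6, s.progress * 100)
        time.sleep(1)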

qstokkink avatar Jan 26 '17 08:01 qstokkink

@vandenheuvel in addition, we are also processing all libtorrent alerts every second but I don't think this leads to much overhead actually. Could you try to disable the alert processing (by commenting out this line: https://github.com/Tribler/tribler/blob/devel/Tribler/Core/Libtorrent/LibtorrentMgr.py#L72)?

devos50 avatar Jan 26 '17 09:01 devos50

Very impressive work guys:

                 libtorrent   Tribler
no latency       ~160 MB/s    30 - 100 MB/s
200 ms           ~15 MB/s     ~2.5 MB/s
200 ms + magic   ~35 MB/s     ~2.5 MB/s

Tribler shamefully collapses. Clearly something to dive into! Did the tip of fully disabling the stats, or sampling stats every 5 seconds instead, lead to any results? Btw, can you also expand this Tribler pain table with 10, 25, and 75ms latency data points?

synctext avatar Feb 01 '17 13:02 synctext

Our test results show a download speed of ~2.5 MB/s at 200 ms with our plain script as well, when we introduce the EPoll reactor into the code. This is similar to the results found in the previous tests with Tribler. However, tests with our plain script and the Select reactor show the original results that we obtained before introducing the reactor, or even higher: a top speed of 30 MB/s. The next thing on our list is testing Tribler through Twisted with the Select reactor.

MaxVanDeursen avatar Feb 13 '17 14:02 MaxVanDeursen

Interesting. Could you also try the normal Poll reactor (it is supposed to be faster than Select for large socket counts)?

qstokkink avatar Feb 13 '17 18:02 qstokkink

Strange enough, results are quite different between our script and Tribler. Summary:

          EPollReactor   SelectReactor   PollReactor
script    2.5 MB/s       32 MB/s         16 MB/s
Tribler   2.5 MB/s       2.5 MB/s        2.5 MB/s

It may take a little while for the download to come up to speed (~ 60 seconds), but after that the throughput is quite steady.

Our next step will be profiling.

vandenheuvel avatar Feb 13 '17 20:02 vandenheuvel

32MByte/sec. So... Python3 and 200ms latency results. This fascinating mystery deepens. Please make a profile print with human-readable thread-name printouts.

synctext avatar Feb 14 '17 09:02 synctext

We just ran our script under both python2.7 and python3.5; this made no difference for the SelectReactor.

vandenheuvel avatar Feb 14 '17 10:02 vandenheuvel

Due to an update of lxc our test results have changed drastically. The newest test results, using a latency of 200 ms unless otherwise mentioned, are:

          No Reactor (no delay)   No Reactor (200 ms)
Script    ~400 MB/s               ~32 MB/s

All the below results are created by a modified script as well (200 ms):

              EPollReactor   SelectReactor   PollReactor
Inactive      ~32 MB/s       ~16 MB/s        ~16 MB/s
Semi-Active   ~32 MB/s       ~16 MB/s        ~16 MB/s

Notes:

  • The EPollReactor and PollReactor only reach their speed after a certain period of time.

MaxVanDeursen avatar Feb 15 '17 20:02 MaxVanDeursen

hmmm. so the lower non-script table is all Tribler?

synctext avatar Feb 15 '17 22:02 synctext

In the above post, all results are from our own script. We retested everything non-Tribler. We're not sure what this change of results (especially the no-reactor-without-delay peak performance and the EPollReactor results) tells us about the quality of the current testing method... These changes are enormous.

vandenheuvel avatar Feb 16 '17 21:02 vandenheuvel

                 EPollReactor   SelectReactor   PollReactor
Tribler 0 ms     ~100 MB/s      ~130 MB/s       ~80 MB/s
Tribler 200 ms   2.5 MB/s       2.5 MB/s        2.5 MB/s

PollReactor without latency was varying wildly, the other measurements were steady. Sadly these results agree with previous results, before the LXC update. We will now try to bring our script and Tribler closer together by using the reactor thread to start libtorrent in our script.

vandenheuvel avatar Feb 18 '17 12:02 vandenheuvel

impressive exploration. Good steps to the final goal: Tribler becomes latency tolerant, add big buffer, and remix anon tunnel packets.

synctext avatar Feb 18 '17 13:02 synctext

please read

Two Cents for Strong Anonymity: The Anonymous Post-office Protocol. "AnonPoP offers strong anonymity against strong, globally eavesdropping adversaries, that may also control multiple AnonPoP's servers, even all-but-one servers in a mix-cascade."

synctext avatar Feb 28 '17 15:02 synctext

Today's conclusion: Tribler drops a factor of 40 in performance when latency is added between a single seeder and a single downloader (tested with a 4GByte file and 200ms latency). Pure Libtorrent plus Twisted (no Tribler) drops a factor of 25 when latency is added there.

Tribler can magically start libtorrent from Twisted; however, this fails to be reproducible. Feedback from a Twisted developer and contact with Arvid: http://twistedmatrix.com/trac/ticket/9044 http://stackoverflow.com/questions/42351473/using-libtorrent-session-with-twisted-stuck-on-checking-resume-data

ToDo: refactor the core.libtorrent package. Next possible step: performance regression suite (#1287, but not now please). The factor 25-40 issue is the essential one.

synctext avatar Mar 02 '17 12:03 synctext

@captain-coder remarked that touching Python at all from a non-Twisted thread will cause trouble. Possibly the thing you see.

synctext avatar Mar 02 '17 18:03 synctext

@synctext @Captain-Coder What exactly do you mean by that?

vandenheuvel avatar Mar 02 '17 18:03 vandenheuvel

@synctext butchered what I said to him.

He described the problem as: starting an OS thread in the process, having that thread talk to libtorrent, and hosting python in the same process to do "stuff". I remarked that as soon as the python bindings for libtorrent are loaded into this process (which happens pretty quickly if you import/use/touch Tribler in the hosted python instance), it might induce a situation where python structures are modified, through callbacks/pointers from libtorrent, initiated from the native OS thread. This violates the properties that the python GIL is trying to impose and has a good chance of tripping up the python interpreter (or even worse, silently corrupting it).

But it's also possible synctext described the problem wrong.

Captain-Coder avatar Mar 03 '17 15:03 Captain-Coder

@vandenheuvel @MaxVanDeursen I just fixed the problem in your twisted branch. Now the leecher/seeder is no longer stuck in CHECKING_RESUME_DATA. The fix is:

    def printSpeed(h):
        for alert in ses.pop_alerts():
            status = h.status()
            print 'seeder', status.state, status.progress, alert

You can set the alerts using session.set_alert_mask. I guess you should be using alerts until the problem is fixed in libtorrent/twisted.
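A hedged usage example of that alert mask (the category choice here is an assumption; all_categories is the blunt debugging option):

    # Enable status/error alerts so that pop_alerts() in the loop above
    # actually has something to return.
    ses.set_alert_mask(lt.alert.category_t.status_notification
                       | lt.alert.category_t.error_notification)
    # or, while debugging:
    # ses.set_alert_mask(lt.alert.category_t.all_categories)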

egbertbouman avatar Mar 03 '17 15:03 egbertbouman

@egbertbouman Wow... Thanks a lot for that! How did you come to this solution? And do you happen to know why this works?

MaxVanDeursen avatar Mar 03 '17 16:03 MaxVanDeursen

@MaxVanDeursen You're welcome! Since libtorrent worked in Tribler, I just looked for the differences between the Tribler implementation and yours. I have no idea why this works.

egbertbouman avatar Mar 06 '17 09:03 egbertbouman

Thanks to @egbertbouman we have some new results.

EPollReactor SelectReactor PollReactor
Script, starting session with Twisted 0 ms ~400 MB/s ~400 MB/s ~400 MB/s
Script, starting session with Twisted 200 ms 32 MB/s 16 MB/s 16 MB/s

We conclude that it doesn't matter which thread starts the session. We will start rewriting the Core.Libtorrent package now: this way we can learn more about how Tribler interacts with libtorrent and hopefully discover why Tribler is so slow.

vandenheuvel avatar Mar 06 '17 19:03 vandenheuvel

We did some profiling and found that under both the EPollReactor and the SelectReactor, close to 100% of the time was spent blocking in the poll of the reactors. This seemed normal as Tribler was basically idle during these tests (except for downloading using Libtorrent at ~2.5 MB/s).

vandenheuvel avatar Mar 06 '17 19:03 vandenheuvel

Yesterday, I spent some time testing Tribler vs libtorrent. I used Ubuntu 16.10 in VirtualBox. For libtorrent only I got the following plot:

Then, I changed the code in the master branch to work with Tribler and got this plot:

Finally, I removed the flags keyword parameter from ltsession = lt.session(lt.fingerprint(*fingerprint), flags=1), and like magic:

The difference seems to be that Tribler is not loading some of the default libtorrent features (like UPnP, NAT-PMP, LSD). Very strange...

egbertbouman avatar Mar 08 '17 17:03 egbertbouman

@egbertbouman Thanks for these hopeful results! Unfortunately, we have not been able to reproduce your results on our own system. However, we have noticed that when using this flag in our basic script, the throughput drops to 2.5MB/s as well. We will look further into why the removal of this flag does not result in a change of throughput for us, although you have shown that it can be done.

MaxVanDeursen avatar Mar 08 '17 21:03 MaxVanDeursen

So uTP is enabled, but what does lt.fingerprint(*fingerprint), flags=1 do?

synctext avatar Mar 08 '17 21:03 synctext

@synctext From the Libtorrent Manual:

If the fingerprint in the first overload is omitted, the client will get a default fingerprint stating the version of libtorrent. The fingerprint is a short string that will be used in the peer-id to identify the client and the client's version ... The flags parameter can be used to start default features (upnp & nat-pmp) and default plugins (ut_metadata, ut_pex and smart_ban). The default is to start those things. If you do not want them to start, pass 0 as the flags parameter.

MaxVanDeursen avatar Mar 08 '17 21:03 MaxVanDeursen

After the results of last week we decided to do exhaustive tests on all combinations of our script and Tribler, both with and without flags.

In the table below, the seeder runs a different script than the downloader; the diagonal is equal code. All tests have 200ms latency.

Leecher \ Seeder   Script NoFlag   Tribler NoFlag   Tribler Flag   Script Flag
Script NoFlag      H               L                H              L
Tribler NoFlag     L               L                L              L
Tribler Flag       L               L                L              L
Script Flag        L               L                L              L

L(ow): convergence of the download speed at a rate of ~2.5MB/s
H(igh): convergence of the download speed at a rate way above 2.5MB/s (i.e. >10MB/s)

From these results there is nothing obvious we can conclude about the effect of this flag.

MaxVanDeursen avatar Mar 13 '17 19:03 MaxVanDeursen

Please consider, at some point, opening up the magic box and diving into these detailed networking statistics. Something is off...

synctext avatar Mar 14 '17 09:03 synctext

Key problem: discover why Tribler has 2.5 MByte/sec performance with 200ms seeder-leecher latency, and clean Twisted-wrapped Libtorrent has 16MByte/sec.

The methodology is to strip the Python wrapping code and stumble upon success. We need to be realistic and systematically do a brute-force exploration of why it doesn't work. Finding the magic fix directly is not realistic at this point. The stripped Libtorrent manager is now 40 lines of code, still 2.5 MByte/sec. Dispersy is off in Tribler.

Is it the database that blocks the whole Twisted thread for 25ms regularly?

synctext avatar Mar 22 '17 09:03 synctext

possible experiment to decide if the bottleneck is within the code or parameter settings or both:

  • Tribler stripped Libtorrent manager, which can only resume download
  • Libtorrent manager in clean Python, fast
  • Paused torrent, parameters created by Tribler: slow
  • Paused torrent, parameters created by clean Libtorrent code: fast

Mix and match both ways to see where the fault lies...

synctext avatar Apr 26 '17 12:04 synctext

I just read through (most) of this thread. Here are some observations:

  1. The libtorrent python bindings never call back into python from within libtorrent. Early versions attempted to do this, by supporting python extensions. This turned out to cause subtle memory corruption or GIL deadlocks. So, libtorrent is not supposed to interact with python other than through the direct API calls. (So, if you find anything like that, please report a bug!)

  2. libtorrent has built-in instrumentation added specifically to troubleshoot performance issues. In versions before 1.1 it's a build switch (TORRENT_STATS); in 1.1 and later the stats are reported as an alert (which is disabled by default). If these alerts are printed to a log, the log can be analyzed by tools/parse_session_stats.py in the libtorrent repo (the script exists pre-1.1 too, but loads the files produced by libtorrent; the gotcha is that the stats files are written to the CWD, which needs to be writable by the process). The script requires gnuplot and will render a number of graphs and hook them up to an html page. Looking at them may reveal something (a sketch of logging these alerts follows this list).

  3. The meaning of the flags passed to the session constructor is defined here. By default both flags are set (add_default_plugins and start_default_features). By passing in 1, you just add the plugins, without starting DHT, UPnP, NAT-PMP and local peer discovery.
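A minimal sketch of point 2 against the 1.1.x python bindings (the flag value follows point 3; the exact text format that tools/parse_session_stats.py expects may differ, so treat this only as a starting point):

    import time
    import libtorrent as lt

    ses = lt.session(flags=1)  # default plugins only; no UPnP/NAT-PMP/LSD/DHT
    ses.set_alert_mask(lt.alert.category_t.all_categories)

    with open('session_stats.log', 'w') as log:
        for _ in range(300):                 # five minutes of one-second samples
            ses.post_session_stats()         # ask for a session_stats_alert (1.1+)
            ses.wait_for_alert(1000)
            for a in ses.pop_alerts():
                log.write('%f %s\n' % (time.time(), a.message()))
            time.sleep(1)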

I also have some questions:

  1. Are you testing with TCP or uTP? The uTP implementation in libtorrent could be made to perform better on linux by using sendmmsg() and recvmmsg(), but currently it requires a system call per packet sent or received.

  2. which version of libtorrent are you using? If you're on 1.1.x, one way to have more control over how the session is set up is to use the constructor that takes a settings_pack. This lets you setup the configuration before starting the session.

  3. Is the main issue you're looking at the performance difference between your twisted wrapper and Tribler? Or are you looking at understanding the bottlenecks making the 200ms case have so much lower throughput than the 0 ms case? If the latter, would it be useful to you if I would whip up a one-to-one test transfer simulation with various latencies?

arvidn avatar May 02 '17 23:05 arvidn

@arvidn Thanks a lot for these points! You can have a look at our code here.

We're using the latest version in the Ubuntu repositories. Note that we're only trying to explain the difference in performance between Tribler and our script. Thus, calls like sendmmsg() and recvmmsg() don't seem that relevant as we're not doing these calls (explicitly) in our fast script.

Using settings_pack might be helpful, but right now we're trying to replicate Tribler's settings in our script to achieve the same performance. @MaxVanDeursen maybe we should just refactor Tribler's code to use settings_pack as a first libtorrent refactoring step and hope that performance improves (a sketch of what that could look like is below)?
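A minimal sketch of that first refactoring step on 1.1.x (setting names from settings_pack; the values, and whether to set them at all, are assumptions):

    import libtorrent as lt

    ses = lt.session()
    ses.apply_settings({
        'user_agent': 'latency-test/0.1',          # placeholder
        'enable_upnp': False,
        'enable_natpmp': False,
        'enable_lsd': False,
        'enable_dht': False,
        'send_buffer_watermark': 2 * 1024 * 1024,  # placeholder value
    })

If the bindings expose it, the 1.1.x constructor overload that takes the settings_pack directly would let the settings be in place before the session starts its threads.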

Yes, we're looking to explain this difference under the condition of 200 ms, as the difference becomes larger (from 4x to 8x using default settings) with respect to the 0 ms case. This latency (and above) is relevant for tunnel community performance, as we suspect this is the current bottleneck.

For our script (at 200 ms), we already know what the bottleneck is: buffer sizes (see above). Right now, we believe it is most probable that this has to do with settings passed to libtorrent either in creating the libtorrent session or adding a torrent.

vandenheuvel avatar May 03 '17 16:05 vandenheuvel

Report can be found at overleaf.

vandenheuvel avatar May 22 '17 16:05 vandenheuvel

do you have any (verbose) libtorrent logs from those test runs where the download did not start for a long time, or did not reach the max throughput for a long time? I would be interested in looking at them in that case.

also, if you're not using a sufficiently recent version of libtorrent, you may be experiencing this.

arvidn avatar May 22 '17 21:05 arvidn

@arvidn Sadly, we do not have any libtorrent logs whatsoever of this occurrence. However, if you would like to reproduce the error, you could look into our libtorrent latency benchmark before the pop_alert fix was implemented in order to get the logs.

Afaik we used the latest libtorrent version.

MaxVanDeursen avatar May 24 '17 14:05 MaxVanDeursen

Golden experiment:

  • Select 1 torrent for this experiment
  • Create a paused torrent, parameters created by Tribler: slow
  • Create a paused torrent, parameters created by clean Libtorrent code: fast
  • Now take a paused Tribler torrent and resume it within the potentially fast clean Libtorrent codebase.
  • Magic happens: is it a settings problem or not?
  • The other way around: will Tribler slow down paused fast torrents?

synctext avatar Jun 08 '17 14:06 synctext

Assign me!

ichorid avatar Jan 17 '18 16:01 ichorid

@ichorid Good luck! If you have any questions, please reach out to either @MaxVanDeursen or me.

vandenheuvel avatar Jan 17 '18 22:01 vandenheuvel

Mental note: one month spent on making unit tests faster would most likely be well spent; a potential next optimization task.

synctext avatar Jan 24 '18 07:01 synctext

@synctext I assume you refer to our 'heavy' tests to test the correct working of anonymous downloading/hidden seeding (since most of our other unit tests are very fast)? I don't see any simple optimisations for these tests since the tunnel community relies heavily on Dispersy and the testing framework of Dispersy is a pain to work with and very slow. Instead, I would wait for IPV8 to be integrated in Tribler and use the IPV8 functionality to test the tunnel community. And create a stable integration test of anonymous downloading/hidden seeding. IIRC, @qstokkink already has a few tests for the tunnel community in IPV8 that run within milliseconds but don't pin me on that.

devos50 avatar Jan 24 '18 07:01 devos50

Thank you for that vital info. Post-IPv8 then... So we can work towards a 60-second full unit test runtime...

synctext avatar Jan 24 '18 08:01 synctext

Correct: all of the tunnel unit tests (including the full end-2-end hidden seeding test) are completed within 2 seconds.

qstokkink avatar Jan 24 '18 10:01 qstokkink

This is not only a problem of performance, but of bad code quality. The download settings and their defaults are not defined in a single place. The statement is that this bug could have been found in a day, with good code. Instead, we have technical debt.

Start of the rewrite by Bram: no longer contains the silly LaunchManyCore things

Vadim status:

  • gladly takes the job of making beautiful code from the 12-year-old code which wraps Libtorrent
  • latest stage not in GitHub yet
  • patches done to the original container.sh and related matters
  • we already have a faster way to run these LXC container tests now
  • LXC container template, one for leechers and one for seeders
  • feature to avoid the long LXC setup time by using a snapshot, instead of setting everything up from the sources (buggy, but fixed)
  • 200ms latency with clean Libtorrent reproduced

synctext avatar Jan 30 '18 16:01 synctext

Unstable download speeds with libtorrent:

When you run LXC-based latency tests for a single latency repeatedly, i.e. by setting latencies = [0 for x in range(numIntervals)] or by completely commenting out the latency setting lines (tc ...), you don't get repeatable results. The first test run is always much faster than the subsequent ones, but sometimes you randomly get high results in one of the others too. This means several things:

  1. Our previous measurements were wrong, as they were affected by this.
  2. There is a bug/unstable behaviour in either one or several of: 2.1 Ubuntu 17.10 networking; 2.2 LXC 2.1.0; 2.3 python3-libtorrent 1.0.7-1build1 (libtorrent-rasterbar8); 2.4 Our setup/handling of libtorrent session;
  3. There could be potential to greatly speed up downloads in tribler and other libtorrent-based clients.

As this could have serious consequences, I invite any interested parties to recheck my results, just in case ;-)

(Experiment conditions: host Ubuntu 17.10; LXC containers Ubuntu 16.04 (default))

ichorid avatar Feb 02 '18 11:02 ichorid

The same plot (libtorrent) for 200ms:

ichorid avatar Feb 02 '18 11:02 ichorid

Does this behavior persist with the latest version of LibTorrent (1.1.7 I believe)?

qstokkink avatar Feb 02 '18 12:02 qstokkink

Do these tests pin down whether TCP or uTP is used? One aspect I can imagine might affect this is if the startup of the swarm is somewhat random: the first connection attempt (over uTP, which is preferred) sometimes fails and sometimes succeeds. With a failed uTP connection attempt, libtorrent will try TCP. Hence, unless the test explicitly disables one or the other, the protocol that ends up being used is probably somewhat random.

If I would be interested in exploring whether this variance is primarily caused by libtorrent, how hard would it be for me to reproduce these tests? is there a guide somewhere?

it's possible libtorrent-1.0.x is too old, but at least 1.1 and later has session stats counters that can be reported via an alert, enabled by the alert mask. Logging this for the peers could give a hint of what libtorrent is experiencing.

are these tests one-to-one transfers, or is it a single swarm of 10 peers? (if it's one-to-one, the logging should be manageable).

arvidn avatar Feb 02 '18 15:02 arvidn

Thanks Arvid for this again insightful comment. It's really a morale boost for the team that you're helping us out.

V: Please link your LXC scripts here for reproduction.

synctext avatar Feb 02 '18 15:02 synctext

@arvidn, the tests use the default configuration of libtorrent, and the version is the one available in the Ubuntu LXC container. Thanks to @vandenheuvel and the guys it is very easy to reproduce the experiment. You could use an Ubuntu VM, or try to run it in your environment. The whole test should be run as root. In Ubuntu you could do something like:

    sudo su
    apt-get install lxc ctorrent python3-numpy python3-matplotlib
    cd /tmp
    git clone https://github.com/vandenheuvel/libtorrent-latency-benchmark
    cd libtorrent-latency-benchmark
    ./main.sh
    open ./result.png

And if it works, you could change the latency setting in leecher.py to latencies = [0 for x in range(numIntervals)] (the plot legend would still show 50ms steps, don't take that into account).

My next step would be to test the thing with the newest libtorrent version. To do that, I would first need to complete a small refactoring of these tests.

As I am not very familiar with libtorrent, could you please provide a "UDP-KISS" set of settings, that would turn off all adaptive features of libtorrent, like congestion control, auto-switching between UDP and TCP, etc.?

ichorid avatar Feb 02 '18 17:02 ichorid

I haven't gotten around to setting up an ubuntu VM to run this in. But I have some comments anyway:

  1. If netem (the tc command) works on the loopback device, your setup could be simplified to not require any containers. You could just set the latency on lo0 and make each instance bind to IPs 127.0.0.1 - 127.x.x.x, however many nodes you're running. The main benefit would be to shave off overhead and moving parts, to make sure you're not just measuring the performance (or variance) of the virtualization layer. If this approach fails, you could still make them all bind to 127.0.0.1, and make sure to enable the allow_multiple_connections_per_ip setting.

  2. In the loop that collects data during a transfer, you log download_rate from torrent_status. You would get better accuracy by measuring the total_download [1] instead. The download rate is fuzzy, it's basically the average download rate from the last few seconds. If you take a rate sample every second, they won't necessarily sum to the total downloaded bytes. Instead capture the downloaded bytes together with a timestamp.

  3. To disable uTP, set enable_outgoing_utp and enable_incoming_utp [2] to false.

  4. To disable DHT, Local peer discovery, UPnP, NAT-PMP, peer exchange, ut_metadata (magnet links) and smart_ban, you can set the flags argument to the session constructor to 0: session(flags=0) (at least I think that works in python).

  5. some settings that are relevant for a high bandwidth-delay-product test are the send_buffer_* [3] settings.

  6. If you suspect the cost of disk reads/writes start to dominate the test (which I would expect to be the case, as long as everything else works as it should). You may want to disable disk I/O. You can do that by setting disable_hash_checks [4] to true, and... hm. there's no way to set the add_torrent_params::storage to disabled_storage_constructor from python.. Anyway, with disk I/O disabled, you don't really need the underlying data for the torrent either, you can just pretend to have it by setting one torrent to seed_mode (in add_torrent_params)
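A sketch pulling together points 3-6 of this list (1.1.x dict-style settings assumed; the torrent name, the values, and the seed_mode flag usage are illustrative, not tested):

    import libtorrent as lt

    ses = lt.session(flags=0)              # point 4: no default plugins or features
    ses.apply_settings({
        'enable_outgoing_utp': False,      # point 3: force TCP both ways
        'enable_incoming_utp': False,
        'send_buffer_watermark': 2 * 1024 * 1024,   # point 5 (placeholder value)
        'disable_hash_checks': True,       # point 6: keep hashing out of the measurement
    })

    # point 6 continued: let the seeding side pretend its data is already verified
    params = {
        'ti': lt.torrent_info('test.torrent'),      # placeholder torrent
        'save_path': '.',
        'flags': lt.add_torrent_params_flags_t.flag_seed_mode,
    }
    h = ses.add_torrent(params)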

arvidn avatar Feb 04 '18 22:02 arvidn

No need for containers even.. All sounds like solid suggestions to work on. @egbertbouman Can you also help out if needed today.

synctext avatar Feb 05 '18 07:02 synctext

@arvidn, thanks for the suggestions! I will now try to play with the settings and report back as soon as something interesting turns up.

ichorid avatar Feb 05 '18 08:02 ichorid

Starting the libtorrent session with

    session(flags = 0)
    enable_incoming_tcp = 0
    enable_outgoing_tcp = 0

at 0ms latency produces relatively stable results that max out at 80 Mb/s.

When running without flags = 0, like

    session()
    enable_incoming_tcp = 0
    enable_outgoing_tcp = 0

the first download is 2 times faster than the remaining ones, maxing out at 150 Mb/s, and there is a pattern of "fast" downloads with slowly capping top speeds.

For 100ms with flags=0 this effect produces semi-stable 5Mb/s.

But without flags=0 it nets a first download with immediate 30 Mb/s, and then for all remaining downloads it first drops to semi-stable 5 Mb/s, and after around 60 seconds of this magically ramps up to 30 Mb/s. (The settings were set equal on both leecher and seeder.)

Even with flags=0 enabled, after the first download, the sessions sometimes become "stuck" at random for a brief moment. Apparently, this is the effect that produces "waves" in the plot.

To me, it seems like something related to congestion control, or memory management. Previous experiments with Tribler-based downloads showed a similar pattern, but without ramping up to full speed. I suspect that when Tribler meets latency, some congestion-handling feature in libtorrent just doesn't fire up.

@arvidn, what kind of feature in libtorrent could cause such performance spike/waves? (I am using python3-libtorrent 1.0.7-1build1 (libtorrent-rasterbar8) from Ubuntu 17.10)

ichorid avatar Feb 05 '18 18:02 ichorid

The stall in the last run looks like they got stuck checking files or something. And if the first connection attempt fails, there may be a delay before trying again.

I would like to stress again that download_rate is a lie. The shapes of your graphs suggest that that's what you're plotting. I would highly recommend measuring the time and total_download instead. Instead of sleeping for one second and then assuming it's been a second, check the time again and print it to your csv file for the plot. When you run benchmarks where the computer is very busy, your delays are increasingly likely to be longer than you ask for.
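A minimal sketch of that sampling approach (torrent and file names are placeholders):

    import csv
    import time
    import libtorrent as lt

    ses = lt.session()
    h = ses.add_torrent({'ti': lt.torrent_info('test.torrent'), 'save_path': '.'})

    # Record wall-clock time together with total_download and let the plotting
    # script compute rates from the deltas, instead of trusting download_rate.
    with open('download_progress.csv', 'w') as f:
        w = csv.writer(f)
        w.writerow(['timestamp', 'total_download_bytes'])
        while not h.is_seed():
            s = h.status()
            w.writerow([time.time(), s.total_download])
            time.sleep(1)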

The best way of knowing for sure is to enable and log the session_stats counters (there's a tool parse_session_stats.py to generate plots from it too)

arvidn avatar Feb 05 '18 18:02 arvidn

Here are the plots of download progress produced with the method suggested by @arvidn (total_download with timestamps). The charts are for the same experiments as in the previous series:

0 ms, session(flags = 0)

0 ms, session()

100 ms, session(flags = 0)

100 ms, session()

The slope of the lines corresponds to the download speed. We can still see the basic problem pattern from the previous plots. In addition, the small freezes become apparent. Please mind the automatic scaling done by pyplot (i.e. the horizontal and vertical scales are different for each plot).

ichorid avatar Feb 06 '18 14:02 ichorid

Fascinating. To efficiently zoom into a solution for the Tribler issues, please do experiments with 200ms latency and the latest binary release (.deb package) of Libtorrent. Having 80MByte/sec would be awesome. We have merely 2.5 MByte/sec.

synctext avatar Feb 06 '18 17:02 synctext

Clean Twisted-wrapped Libtorrent has 16 MByte/sec. You are very likely seeing another performance bottleneck than Tribler.

synctext avatar Feb 06 '18 17:02 synctext

@arvidn, now I'm working on adding proper Tribler-based tests to this experiment. If the Tribler-based results at 200ms match libtorrent with flags=0, that would mean some magic extension in libtorrent that allows it to show good performance under big latencies does not fire up within Tribler. This fact would let me narrow it down to a single flag or combination of flags in the libtorrent-based test alone, and trace it in Tribler later. In addition, that would mean something in core libtorrent and/or the python bindings produces an unexpected and unnecessarily long delay before the lib can again reach top performance. (I'm going to do the tests with the latest stable libtorrent (1.1.6) built from source.)

ichorid avatar Feb 08 '18 09:02 ichorid

great, 1.1.6 is good. A more direct approach to understanding a throughput bottleneck would be to post session stats counters regularly and log them to a file. That's basically what they're for. experimenting with various flags is also a good idea, especially to discover unexpected side-effects. But I would primarily imagine you're seeing second or third order effects of the flags.

For example, tracker announces may be held off until UPnP or NAT-PMP has succeeded, in order to get an accurate external port to announce; if neither of those work, there may be some unnecessary delays on startup. I don't think this specific issue affects you, since you don't rely on a tracker, but there may be other effects of that nature you're seeing the effects of.

arvidn avatar Feb 08 '18 10:02 arvidn

@arvidn, basically, how do the python bindings for libtorrent work? What is and what is not kept between libtorrent sessions? And how can I be sure that the underlying objects are really deleted, sockets closed, etc.? For example, when I create a libtorrent session object in Python:

import libtorrent as lt
from time import sleep

def mkses():
    ses = lt.session()
    sleep(5)
    print ses

for i in range(3):
    mkses()

what state would these three session objects share/inherit from their predecessors?

ichorid avatar Feb 08 '18 18:02 ichorid

each session object you construct is independent. There's no invisible state they share (other than the usual, filesystem, network interfaces etc.)

They will all open listen sockets independently, two will fail to bind because the third got the port first, they may bump the port and try again and so on (that's configurable and enabled by default).

About when they are destructed, I believe that's up to the python GC. Aren't objects reference counted in python though? So you should get consistent and reliable destruction; otherwise you may be able to use del.

Did I understand your question?

arvidn avatar Feb 09 '18 20:02 arvidn

I also found the strange pauses in throughput today when analyzing the tunnel integration tests with @devos50. These are the outputs of three separate runs sending a 2.06 MB file, plotting all the libtorrent data going through the tunnels:

(three throughput plots)

In the last run there was no throughput pause at all. Hope this helps @ichorid 👍

qstokkink avatar Feb 11 '18 17:02 qstokkink

@arvidn, yes, your answer was exactly about the things I was interested in. Thank you 👍 .

Now, the preliminary results from testing a Tribler-based download. The leecher used a single session of Tribler, and the seeder was libtorrent-based.

Distro: Ubuntu 17.10 (lxc)
libtorrent: 1.1.6 (stable source tarball from the official site)
Tribler: 7.0 (stable source tarball from the official site)

Tribler config lines:

config = SessionStartupConfig()
config.set_state_dir(os.path.join(os.getcwd(), '.tribler'))
config.set_torrent_checking(False)
config.set_multicast_local_peer_discovery(False)
config.set_megacache(False)
config.set_dispersy(False)
config.set_mainline_dht(False)
config.set_torrent_store(False)
config.set_enable_torrent_search(False)
config.set_enable_channel_search(False)
config.set_torrent_collecting(False)
config.set_libtorrent(True)
config.set_dht_torrent_collecting(False)
config.set_videoserver_enabled(False)
config.set_enable_metadata(False)
config.set_http_api_enabled(False)
config.set_tunnel_community_enabled(False)
config.set_creditmining_enable(False)
config.set_enable_multichain(False)

seeder config (libtorrent-based, flags=0):

ses = lt.session(flags=0)
ses.enable_incoming_tcp=0
ses.enable_outgoing_tcp=0

0 ms

100 ms

We see roughly the same performance as with pure libtorrent with flags=0. Therefore, we can assume the flags to be the main suspect. Another important thing to note is that a Tribler session sometimes introduces an additional delay of 60 seconds. This looks similar to the pattern of libtorrent getting up to speed after 60 seconds that we've seen in previous tests for 100ms with default flags. This gives us a hint that some subsystem in libtorrent just does not start properly in Tribler and is sub-optimally initialized in the pure libtorrent-based tests. Next, I will check the "seeder vs leecher" problem. After that I will commit the test framework to Github and continue the investigation by scoring the effects of individual flags.

ichorid avatar Feb 12 '18 17:02 ichorid

@ichorid LibtorrentDownloadImpl will wait with downloading a torrent until the session has found more than 25 DHT nodes (see https://github.com/Tribler/tribler/blob/devel/Tribler/Core/Libtorrent/LibtorrentDownloadImpl.py#L260 and https://github.com/Tribler/tribler/blob/devel/Tribler/Core/Libtorrent/LibtorrentMgr.py#L567). This was introduced as a fix for a previous libtorrent version, but perhaps it's not needed anymore. Anyway, this could be the cause of additional delays when dealing with larger latencies.

egbertbouman avatar Feb 12 '18 20:02 egbertbouman

When downloading with Tribler at 100ms, changing flags=0 at seeder side does not change the results of the experiment. The delay is still there, speed is still at flags=0 libtorrent baseline.

ichorid avatar Feb 13 '18 10:02 ichorid

Comparison of logs from delayed iterations of tests with undelayed ones showed that delayed ones always produce the following lines:

DEBUG:LibtorrentMgr:LibtorrentMgr: could not find torrent 25283068c8e1470dd82db03994e9b4e7f59087c3
DEBUG:LibtorrentMgr:LibtorrentMgr: could not find torrent 25283068c8e1470dd82db03994e9b4e7f59087c3

Adding sleep(10) between iterations, to avoid a race condition in libtorrent on session.remove_download(..), resulted in logs without ... could not find torrent ... and plots void of the random 60s delay.

Apparently, there is a race condition in tribler and/or libtorrent on session.remove_download(). Further checks with pure libtorrent will follow soon.

ichorid avatar Feb 13 '18 12:02 ichorid

remove_torrent() is an async call. It's not obvious though how add_torrent() could jump ahead of it.

arvidn avatar Feb 13 '18 15:02 arvidn

Oh, maybe it's the other peer trying to connect before the torrent has been added. Sync with add_torrent_alert.
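A hedged sketch of that synchronisation (alert type names from the 1.1 bindings; the torrent and peer address are placeholders):

    import time
    import libtorrent as lt

    def wait_for_alert_type(ses, alert_type, timeout=10.0):
        # Drain alerts until one of the requested type shows up, or we time out.
        deadline = time.time() + timeout
        while time.time() < deadline:
            ses.wait_for_alert(1000)                 # milliseconds
            for a in ses.pop_alerts():
                if isinstance(a, alert_type):
                    return a
        return None

    ses = lt.session()
    ses.set_alert_mask(lt.alert.category_t.status_notification)
    h = ses.add_torrent({'ti': lt.torrent_info('test.torrent'), 'save_path': '.'})

    # Only let the other peer connect once the torrent is really in the session;
    # the same idea applies to waiting for torrent_removed_alert between runs.
    wait_for_alert_type(ses, lt.add_torrent_alert)
    h.connect_peer(('10.0.3.10', 6881), 0)           # placeholder seeder address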

arvidn avatar Feb 13 '18 15:02 arvidn

Forked libtorrent-latency-benchmark and added Tribler support, ability to switch between Tribler/libtorrent modes, remove_torrent workaround, compiling libtorrent from source, etc. libtorrent-latency-benchmark:tribler

ichorid avatar Feb 14 '18 13:02 ichorid

@arvidn, indeed, the libtorrent version actually matters for this problem. The following plot is for libtorrent 1.0.7 as found in Ubuntu 17.10:

And this one is for libtorrent 1.1.6 compiled from source (on the same Ubuntu 17.10). (The flags option was left at its default, the session was restarted for each iteration, and sleep(10) was put between iterations, just in case.)

This means that the latest libtorrent does not get up to high-speed mode except for the first run.

ichorid avatar Feb 14 '18 14:02 ichorid

Setting flags=0 on either side of the connection results in a "slow mode" download. Now I'm going to investigate the effects of the flags on the vanilla libtorrent benchmark.

ichorid avatar Feb 14 '18 14:02 ichorid

It's the LSD extension. If it is enabled, libtorrent gets to high speeds on high-latency connections. I have no idea how it could relate to latencies, but the 1-minute interval before it kicks in is consistent with the LSD specification. I'll do more realistic experiments to confirm this.

ichorid avatar Feb 14 '18 16:02 ichorid

OK, I think I got it. The problem is with the uTP protocol. When LSD announces, libtorrent switches to TCP (@arvidn, is it really supposed to work like that?) instead of uTP, and becomes much faster. This is because TCP does not care about latencies, and the synthetic latencies introduced by netem do not play well with the uTP congestion algorithms. Basically, uTP underutilizes the link. So, in short: in a synthetic test environment uTP=slow, TCP=fast. And the TCP-based transport kicks in as a result of an LSD announcement (bug?). The enable_incoming/outgoing_udp/tcp settings do not seem to have any effect on this behaviour. Whether TCP would provide better speeds in a real-life high-latency environment remains to be investigated.

ichorid avatar Feb 14 '18 17:02 ichorid

@arvidn, how can I control uTP vs TCP usage for individual connections and/or sessions? Is it possible to force the usage of either protocol?

ichorid avatar Feb 14 '18 17:02 ichorid

@ichorid nice find! Could you do something with these settings? https://github.com/Tribler/tribler/blob/next/Tribler/Core/Libtorrent/LibtorrentMgr.py#L139

devos50 avatar Feb 14 '18 17:02 devos50

@arvidn, I suspect this strange behaviour we observe in our tests is the result of three distinct bugs in libtorrent (and/or Python bindings):

  1. uTP has much worse performance than TCP on high-latency, high-bandwidth links.
  2. If the leecher gets an LSD announcement from the seeder, it will start the download using TCP. And even if it is already downloading this torrent via uTP, it will switch to TCP.
  3. The enable_incoming/outgoing_udp/tcp settings don't work.

I would try to re-check the results, to be sure.

ichorid avatar Feb 15 '18 11:02 ichorid

If netem (the tc command) works on the loopback device, your setup could be simplified to not require any containers. You could just set the latency on lo0 and make each instance bind to IPs 127.0.0.1 - 127.x.x.x, however many nodes you're running.

Please replace the containers with the loopback idea and boost the send_buffer settings. Try a Tribler-to-Tribler test to see if there is low-hanging fruit to boost our sad 4 MByte/sec performance. Try a minimal example where uTP currently fails to perform OK.

With 50MByte/sec and 100ms one-way latency we always have 10 MByte of un-acknowledged data in flight. Buffers need to be huge. Look at the various buffer settings, such as send_socket_buffer_size. Any tips from Sweden?

synctext avatar Feb 15 '18 13:02 synctext

My bad, I was not setting the settings the right way (( I'll redo the enable_incoming/outgoing_udp/tcp tests in a pinch.

ichorid avatar Feb 15 '18 15:02 ichorid

So yeah, the enable_incoming/outgoing_udp/tcp settings work. I was wrong on that one. Please accept my apologies, @arvidn.

ichorid avatar Feb 15 '18 15:02 ichorid

I reproduced the uTP vs TCP performance problem using libtorrent's examples/client_test.

         TCP        uTP
0 ms     310 Mb/s   104 Mb/s
100 ms   30 Mb/s    5.2 Mb/s

The tests were run on loopback device, 100ms round-trip latency was added with netem:

tc qdisc add dev lo root netem delay 50ms 

To force TCP/uTP on both ends I used -J|-y options.

# in the first terminal
./client_test -b 127.0.0.1 ./test.torrent [-J|-y] 

# in the second terminal
./client_test -r 127.0.0.1:6881 [-J|-y]

ichorid avatar Feb 16 '18 14:02 ichorid

I have a suspicion that the first result is CPU bound. UDP is significantly less efficient than TCP at high rates, since (traditionally [1]) every single packet needs a system call to be sent and a system call to be received. Since a packet is ~1400 bytes, that ends up being a lot of syscalls, as you can imagine. This can be mitigated somewhat by detecting that the MTU over loopback is actually 64 kiB and sending large packets. I had some code to do that for a while but, as I recall, it was a bit flaky. I removed it, thinking transferring over loopback isn't an important use case anyway.

I think the second result, with the 100ms delay, is much more interesting from the point of view of looking at the effect of real-world transfers over high-latency networks. I suspect that in that case, the rate is limited by the send and receive buffers configured for libtorrent. There's a setting called send_buffer_watermark which defaults to 500 kiB. Try setting that higher, to 1 or 2 MiB. You may also want to increase send_socket_buffer_size and recv_socket_buffer_size. They cap the congestion window and advertised receive window in uTP (iirc). They probably need to be at least 1 MiB too.

[1] there is sendmmsg() and recvmmsg() on linux nowadays, and I've been meaning to add support for that to the uTP implementation, but I haven't gotten to it yet (patches are welcome! :) )

arvidn avatar Feb 19 '18 12:02 arvidn

Increasing send_buffer_watermark, send_socket_buffer_size, and recv_socket_buffer_size to 2MiB does not seem to have any effect (or I am doing it wrong). However, I added set_utp_stream_logging(1) to client_test.cpp, and #define TORRENT_UTP_LOG_ENABLE to utp_stream.cpp/hpp. Next I ran the loopback benchmark again, with 100ms round-trip latency. For both seeder and leecher, I parsed the resulting utp.log with parse_utp_log.py, as @arvidn suggested earlier. I hope the produced plots can shed some light on this mystery:

Seeder plots: uploading, uploading_packets, packet_sizes, slow-start.

ichorid avatar Feb 19 '18 17:02 ichorid

Leecher plots: cwnd, their_delay_base.

ichorid avatar Feb 19 '18 17:02 ichorid