libtorrent icon indicating copy to clipboard operation
libtorrent copied to clipboard

Inefficient synchronization way to obtain some data from libtorrent

Open glassez opened this issue 1 year ago • 15 comments

libtorrent version (or branch): both v1.2.x and v2.0.x

platform/architecture: any

compiler and compiler version: any

When receiving some data from libtorrent using synchronous calls (for example, using some kind of torrent_handle getter), too long delays are noticed from time to time, regardless of either the current system load or the parameters of the affected torrent. It looks like either the synchronization code has some drawbacks, or it is implemented in a way that is unfriendly to the caller. Recently, we have tried to avoid synchronous calls as much as possible. Unfortunately, some of the mentioned data cannot be obtained asynchronously (for example, via torrent_status), so we still have to resort to synchronous way, which leads to GUI hangups.

glassez avatar Aug 08 '22 11:08 glassez

I tried to figure it out myself, but I'm not that familiar with the libtorrent codebase. Besides, I've never used Boost.Asio to understand it enough. I have done some educational work to understand its basics, but my judgments below may still be erroneous. Here's what I realized:

When some better of torrent_handle is called, it obtains a pointer to handled torrent instance and calls the corresponding method of it using async_call_ret: https://github.com/arvidn/libtorrent/blob/40fc5ce16a522dbc67d383c239a7e960d6521c39/src/torrent_handle.cpp#L172 But, IIRC, it isn't called directly and just posted in some queue and then It'll invoked in libtorrent session thread: https://github.com/arvidn/libtorrent/blob/40fc5ce16a522dbc67d383c239a7e960d6521c39/src/torrent_handle.cpp#L187 So, if I'm not mistaken, it will be executed only after all the previous callbacks from the queue have been executed. At the same time, the calling thread is blocked until the callback is completed, and if the libtorrent is currently busy with some heavy work, we get the delay mentioned above. If it works as I described above, then this is clearly an inefficient and unfriendly way from the caller perspective. After all, it turns out that caller is forced to wait until all the current work is done, even if it doesn't affect the resources requested by caller. Wouldn't it be more efficient to do this in a classic way, executing all the code in the caller thread and locking only the requested data using some kind of mutexes?

glassez avatar Aug 09 '22 17:08 glassez

Wouldn't it be more efficient to do this in a classic way, executing all the code in the caller thread and locking only the requested data using some kind of mutexes?

I would guess, the work queue would be easier to implement/work correctly than to put mutex on the related data and debug when it dead locks...

Chocobo1 avatar Aug 10 '22 04:08 Chocobo1

I would guess, the work queue would be easier to implement/work correctly than to put mutex on the related data and debug when it dead locks...

Sure. I understand perfectly well that such an architecture makes it much easier to maintain the libtorrent itself. And unfortunately that is why it is unlikely to be considered for change. But maybe by drawing attention to this problem, we can find some alternative way to solve it.

glassez avatar Aug 10 '22 12:08 glassez

@glassez what happens if you always make libtorrent's synchronous calls from some worker threads, rather than the UI thread?

@glassez what happens if you always make libtorrent's synchronous calls from some worker threads, rather than the UI thread?

I was thinking of this option as a last resort, unless a more convenient way is found. It still looks like a workaround.

glassez avatar Aug 11 '22 10:08 glassez

which information or data can only be retrieved via a synchronous call?

fine-grained locking isn't necessarily more efficient. It means the main thread would need to spend a lot of time locking and unlocking mutexes protecting the shared memory. Not sharing memory between threads is normally more efficient. Although in this case it's unlikely that cache lines would ping-pong between threads.

arvidn avatar Aug 21 '22 15:08 arvidn

It means the main thread would need to spend a lot of time locking and unlocking mutexes

But you do still lock the mutex: https://github.com/arvidn/libtorrent/blob/40fc5ce16a522dbc67d383c239a7e960d6521c39/src/torrent_handle.cpp#L198

glassez avatar Aug 21 '22 17:08 glassez

the shared state for that mutex is exactly one bool. That bool is accessed very rarely (just when the call completes, to signal back to the calling thread).

That bool is part of the thread communication that transfers ownership of objects between threads. In this case, from the main thread to the calling thread. It's different from having shared state between threads, and have everyone lock a mutex every time it's accessed.

If it's not done with fine-grained locking (where the state is split up into smaller chunks with individual mutexes), you basically have the same situation as you do today. When the calling thread locks the mutex (for the whole state) it has to wait for all other work to complete before it gets the lock. Typically mutexes have a queue of threads waiting for it as well, to prevent starvation.

arvidn avatar Aug 21 '22 19:08 arvidn

When the calling thread locks the mutex (for the whole state) it has to wait for all other work to complete before it gets the lock. Typically mutexes have a queue of threads waiting for it as well, to prevent starvation.

It has to wait for only the work that touch the needed data, isn't it? Correct algorithm should keep some resource locked as little time as possible.

you basically have the same situation as you do today

Currently it has to wait for all other work to complete even if that work has nothing to do with requested data. E.g. if some trifling getter like session::is_listening() is called when libtorrent parses resume data for some big v2 torrent (or several torrents) it has to wait until parsing is done. Or am I wrong?

glassez avatar Aug 24 '22 15:08 glassez

In fact, this is a real problem for interactive applications. Recently we had to add in qBittorrent a lot of caching to avoid such blocking calls. We do this for almost all data coming from the user/application side (which have no entry in torrent_status). But there are still some data dynamically produced on the libtorrent side that are not available through torrent_status, but only through a blocking call (e.g. partial_piece_info, file_progress, piece_availability). To avoid blocking the UI, we can either request to add this data to torrent_status, or fetch it in a separate thread. But there is another snag. Most of this data is only used to present to the user for the currently selected in UI torrent, and it is more or less expensive to produce, so keeping it constantly updated looks wasteful, doesn't it?

Any ideas?

glassez avatar Sep 10 '22 12:09 glassez

I believe the simplest approach would be something like:

struct torrent_handle {
   void post_file_progress();
   void post_piece_availability();
   void post_download_queue();
   // ...
};

struct file_progress_alert : torrent_alert {
   span<std::int64_t> files() const;
};

struct piece_availability_alert : torrent_alert {
   span<std::int64_t> availability() const;
};

struct download_queue_alert : torrent_alert {
   span<partial_piece_info> partial_pieces() const;
};

It could be made even more efficient by providing subscriptions with delta updates. The first alert when subscribing would be the complete state, then followed by updates to individual pieces/files. This would be a bit more complicated to implement and use, but it would minimize the bandwidth between the threads.

constexpr subscribe_flags_t file_progress = 0_bit;
constexpr subscribe_flags_t piece_availability = 1_bit;
constexpr subscribe_flags_t partial_pieces = 2_bit;

struct torrent_handle {
   void subscribe(subscribe_flags_t flags);
   void cancel_subscriptions();
   // ...
};

struct file_progress_delta_alert : torrent_alert {
   file_index_t file;
   std::int64_t downloaded;
};

struct piece_availability_delta_alert : torrent_alert {
   piece_index_t piece;
   std::int64_t availability;
};

struct partial_piece_delta_alert : torrent_alert {
   partial_piece_info info;
};

arvidn avatar Sep 10 '22 15:09 arvidn

it is more or less expensive to produce, so keeping it constantly updated looks wasteful

Or maybe not all of them so expensive?.. E.g. file progress (using piece granularity) could be provided via torrent_status.

glassez avatar Sep 11 '22 08:09 glassez

Yet another problem is "web seeds". At a first glance it is something that is set from the user side (so we could just cache it and update once it is changed by user). But unfortunately it is being modified by libtorrent for some reasons. Is it really intended? Is there some "web seeds exchange" feature or something similar so web seeds are updated without user involvement?

glassez avatar Sep 11 '22 08:09 glassez

But unfortunately it is being modified by libtorrent for some reasons. Is it really intended? Is there some "web seeds exchange" feature or something similar so web seeds are updated without user involvement?

what happens is that if the web server responds with a redirection, permanently moved, the web seed URL is updated in the torrent state. Redirects happen at the file-level, and libtorrent supports web seeds redirecting every file to different URLs. When a file is redirected, the new URL is added as a web seed and marked as having just that one file. If another redirect happens to the same server, I believe its "have" mask is updated to include the new file as well.

Redirects that are not permanent flags the new web seeds as ephemeral, which means they won't be saved in the resume data. So the redirects will be followed again in the next session.

arvidn avatar Sep 11 '22 09:09 arvidn

I believe the simplest approach would be something like:

struct torrent_handle {
   void post_file_progress();
   void post_piece_availability();
   void post_download_queue();
   // ...
};

struct file_progress_alert : torrent_alert {
   span<std::int64_t> files() const;
};

struct piece_availability_alert : torrent_alert {
   span<std::int64_t> availability() const;
};

struct download_queue_alert : torrent_alert {
   span<partial_piece_info> partial_pieces() const;
};

Agree. It is a simplest approach. Moreover it is rather universal approach. It allows the application to either fetch data on demand or keep it constantly updated by simply requesting it from libtorrent on a regular basis. But I don't like the scalability of the proposed variant. It generates too many alerts (note that in general, it can be called regularly for many torrents). I would modify it as follows:

  1. Provide a structure with optional fields for all such "dynamic" data
  2. Provide single method session::post_dynamic_data()
  3. Provide method to subscribe particular torrent to different kind of dynamic data (something like in your second proposal)
  4. Provide single alert as reaction to session::post_dynamic_data() that will deliver the data for all subscribed torrent at once (like state_update_alert does).

glassez avatar Sep 11 '22 15:09 glassez

@arvidn Well, do you have any plans to provide an asynchronous way to get such kind of data? Seems it could be added even in RC_1_2/RC_2_0.

glassez avatar Sep 18 '22 09:09 glassez

@arvidn Well, do you have any plans to provide an asynchronous way to get such kind of data? Seems it could be added even in RC_1_2/RC_2_0.

Still, realizing that it is unlikely to be added quickly, I try to solve the problem in a workaround as usual. The other day I tried to play around with creating lt::session using the io_context provided by the application. But unfortunately I came across a strange problem that I couldn't figure out. I will describe it here, assuming that I am stuck simply because of lack of experience using Boost.Asio, so someone more experienced will tell me its solution. The essence of the problem is that when I use an external io_context, asio::dispatch() (used inside torrent_handle methods) behaves incorrectly, namely, it always puts a task in a queue, even if it is called inside io_context::run(), which leads to dead locking when calling some methods of torrent_handle within the libtorrent thread (e.g. inside plugin methods).

glassez avatar Oct 27 '22 13:10 glassez

yeah, I thought there was a warning in the documentation about that. The intention of the overload that takes an io_context is to create the session without an internal thread

arvidn avatar Oct 27 '22 15:10 arvidn

After a series of experiments, I managed to narrow down the search for the problem (however, the reason is still unclear to me). If you slightly change the session::start() code so that it calls the thread creation code (with io_context::run() in it), then there are no problems. If this code is outside the session constructor, but in the application code after calling the constructor, then I have the above problem. I don't know what the reason is. Maybe it's still some kind of bug in Boost.Asio?

glassez avatar Oct 27 '22 15:10 glassez

It seems that the easiest way for libtorrent to provide the ability for applications to obtain the above data asynchronously would be to have a public analogue of session_handle::async_call().

glassez avatar Oct 27 '22 16:10 glassez

It seems that the easiest way for libtorrent to provide the ability for applications to obtain the above data asynchronously would be to have a public analogue of session_handle::async_call().

But it is practically no better than any other generic way, and besides, it requires changes to libtorrent, unlike other ways that can be implemented purely at application side. So forget it.

glassez avatar Oct 28 '22 16:10 glassez