transmission icon indicating copy to clipboard operation
transmission copied to clipboard

fix: allow unreachable peers to be retried

Open tearfur opened this issue 1 year ago • 4 comments

Fixes #6971.

tr_peer_info::reconnect_interval_has_passed() contains logic to increase the reconnect interval for unreachable peers, but it has no effect at all currently because they wouldn't be tried in the first place.

tearfur avatar Jul 05 '24 02:07 tearfur

Another thought, related to my comment in code that the 15min first-retry-interval-after-failure is too long: should instead the reconnect logic also weight number of peers? for private trackers, this is fairly important. the standard pattern on private trackers is to limit download slots by queueing. A lot of seeders will have a small active queue and we should try more aggressively to get that peer's attention if they are one of the only seeds, and especially if singular.

so, eg, in get_reconnect_interval_secs, do not apply the (unreachable) penalty if #peers in swarm <= 3 ? it will still retry with a backoff cascade, but less conservatively than it is now.

reardonia avatar Jul 10 '24 00:07 reardonia

Resolved merge conflicts.

tearfur avatar Sep 09 '24 07:09 tearfur

Having played with this more, still concerned with how it works in a typical scenario: start/pause/restart.

Simple repro:

  1. add new torrent with peers < per-torrent limit (eg 20).
  2. start torrent, let all available peers connect and start transferring data. We now know these peers are good/connectable.
  3. pause torrent
  4. restart torrent 10 seconds later.

Now the torrent is "stuck" ; all the peers that were connected previously are in a reconnect timeout period. It doesn't reset quickly. Mostly those peers will reconnect inbound, but Transmission will not reinitiate outbound because the peers were marked as bad when we paused torrent?

reardonia avatar Oct 25 '24 14:10 reardonia

I haven't found a suitable torrent to test this with. For now I'm guessing it is an unintended side effect of 9ff95d162e51d0dbacff242e7f6c7483847d61aa. Doesn't look related to this PR.

https://github.com/transmission/transmission/blob/361b5c6152bf8d90d7dbc6937df133d458dda6b4/libtransmission/peer-mgr.h#L247-L258

tearfur avatar Nov 01 '24 03:11 tearfur

I haven't found a suitable torrent to test this with. For now I'm guessing it is an unintended side effect of [9ff95d1]

Is there much downside to doing connectable_pool.clear() during tr_swarm::on_torrent_stopped ? Would it void saving peers to resume?

reardonia avatar Nov 03 '24 22:11 reardonia

Would it void saving peers to resume?

Yes it will. That sounds terrible especially for torrents with little seed.

tearfur avatar Nov 06 '24 09:11 tearfur

Would it void saving peers to resume?

Yes it will. That sounds terrible especially for torrents with little seed.

I don't understand. It's actually a chance at an improvement; at least we will then have a change to reacquire those peers from tracker. The current situation is actually worse: good peers are disabled on stop/start. We retain the peers in the pool but they are in a long reconnect timeout. The real problem is that we are putting those good, connected peers into a timeout, that shouldn't happen.

reardonia avatar Nov 06 '24 12:11 reardonia

The real problem is that we are putting those good, connected peers into a timeout, that shouldn't happen.

Yes, agreed that this needs to be fixed. I just don't like introducing a new problem in place of another problem.

The solution I have in mind is something like this:

if (is_connected_)
{ 
    piece_data_at_ = {};
}
// Reset the failure count if the peer disconnected because the torrent stopped
else if (has_transferred_piece_data() || torrent_stopped)
{
    num_consecutive_fruitless_ = {};
}
else
{
    on_fruitless_connection();
}

It's ideal in the sense that there wouldn't be any side effects, but there are a lot of hoops to jump through just to pass the torrent_stopped value into the method, and I haven't got an idea for how to do that elegantly. I'm still brainstorming for alternatives.

tearfur avatar Nov 07 '24 03:11 tearfur

I saw those months of discussion on this topic, and I'd like to clarify that I'm maintaining my approval for this change. Other improvements can be done subsequently instead of trying to pack them all in the same pull request.

Coeur avatar Nov 08 '24 05:11 Coeur