
Reconnect breaks connectivity since 1.2.4, 1.2.5

Open xnoreq opened this issue 4 years ago • 104 comments

Since #4325 was closed but this issue was not resolved I'm opening a new issue for this.

Setup:

  • I'm using qbittorrent
  • bound to any interface/IP address
  • my "eth0" interface has very limited connectivity to specific networks, no Internet access
  • vpn connected through tun0 with Internet access

How to reproduce:

  • reconnect vpn while qbittorrent is running

Outcome:

  • trackers stop working
  • DHT nodes drop to 0

Expected outcome:

  • trackers reconnect after vpn reconnection
  • DHT nodes don't drop

It looks like on VPN disconnect libtorrent disables and removes everything (such as DHT nodes) associated with the socket bound to the tun0 IP address, instead of leaving these things in place to time out or get picked up again if a reconnection happens in time.

xnoreq avatar Mar 10 '20 20:03 xnoreq

Please provide more information on exactly how trackers stop working. What’s the error message in the tracker error alert? Or do trackers stop attempting to announce?

A verbose log with torrent log notifications enabled would also probably be very helpful.

After you stop and start your VPN, how long do you wait for peers, trackers and DHT nodes to reconnect?

The thing with the DHT is that the routing table depends on your node id, which depends on your external IP. If your IP changes, the routing table needs to be changed. The nodes aren’t dropped, but most of them may no longer have a place in your new routing table. (And if they can’t be contacted, they will likely be dropped).
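
To illustrate why a new external IP reshuffles the routing table, here is a minimal Python sketch. It assumes a Kademlia-style table in which a node's bucket is determined by the XOR distance between its ID and ours. `toy_node_id` is a hypothetical stand-in (SHA-1 of the IP string) and is not how libtorrent derives its node ID; `bucket_index` and `rebucket` are illustrative names as well.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

ID_BITS = 160  # DHT node IDs are 160-bit numbers


def toy_node_id(external_ip: str) -> int:
    """Hypothetical IP-derived node ID (an assumption, not libtorrent's rule)."""
    return int.from_bytes(hashlib.sha1(external_ip.encode()).digest(), "big")


def bucket_index(own_id: int, other_id: int) -> int:
    """Number of leading ID bits that other_id shares with own_id."""
    distance = own_id ^ other_id  # XOR metric
    if distance == 0:
        raise ValueError("a node is not stored in its own routing table")
    return ID_BITS - distance.bit_length()


def rebucket(own_id: int, known_ids: List[int]) -> Dict[int, List[int]]:
    """Group known node IDs by the bucket they fall into for own_id."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for node_id in known_ids:
        if node_id != own_id:
            buckets[bucket_index(own_id, node_id)].append(node_id)
    return dict(buckets)


if __name__ == "__main__":
    known = [toy_node_id(f"198.51.100.{i}") for i in range(1, 41)]
    # Example VPN addresses before and after a reconnect (made up for the demo).
    for ip in ("10.8.0.2", "10.8.0.6"):
        layout = rebucket(toy_node_id(ip), known)
        print(ip, {b: len(ids) for b, ids in sorted(layout.items())})
```

The same 40 known nodes are grouped once per address; the two printed layouts will generally differ, which is the effect described above: the nodes are still known, but many no longer fit where they used to.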

arvidn avatar Mar 11 '20 08:03 arvidn

I cannot reproduce this. I use private internet access on linux and I disconnect it and reconnect it. Using client_test and a bunch of torrents.

I can confirm that the DHT routing table is cleared out immediately when changing out the network interfaces, but it starts to build up again within a few seconds (say, 5-10 seconds).

What kind of VPN do you have? OpenVPN or wireguard?

arvidn avatar Mar 11 '20 11:03 arvidn

I've tested with both private internet access (OpenVPN) and nordnet (OpenVPN and wireguard) on ubuntu 19.10, without encountering this problem.

Which operating system are you on?

arvidn avatar Mar 11 '20 12:03 arvidn

simple_client binds to 0.0.0.0:6881, just like qbittorrent did with libtorrent before 1.2.4, so it probably doesn't care if the interface goes down or the IP address is removed from it.

What's curious is that after reconnecting the VPN, keeping qbittorrent running, and removing and re-adding the torrent, it seems to have rediscovered some peers and started downloading again: [image]

DHT nodes also stay at 0.

[image]

Restarting qbittorrent results in: [image]

[image]

xnoreq avatar Mar 11 '20 18:03 xnoreq

@xnoreq

Which operating system are you on?

you may have missed this...

xavier2k6 avatar Mar 11 '20 18:03 xavier2k6

The vpn uses openvpn with a script that executes on up:

ip link set dev "tun0" up mtu "$tun_mtu"
# setting up network schedulers
# setting up firewall rules
ip addr add dev "tun0" "${ip_local}/${netmask}"

And executes this on route-up:

ip route add 0.0.0.0/1 via "$gateway" dev "tun0"
ip route add 128.0.0.0/1 via "$gateway" dev "tun0"

I'm on Arch Linux, kernel 5.5.8, OpenVPN 2.4.8.

xnoreq avatar Mar 11 '20 18:03 xnoreq

I'm potentially experiencing something like this. Opensuse Tumbleweed, libtorrent 1.2.5 and deluge, openvpn tun0 in a linux namespace for isolation. No connectivity, lots of failures to talk to trackers showing in the logs, no activity other than DNS. Downgrading to 1.2.3 fixed it for a time, but even with locking the package, when tumbleweed moved to python 3.8, it broke again for other reasons. Hoping the package maintainers for opensuse get 1.2.5 soonish :)

mgaulton avatar Mar 16 '20 18:03 mgaulton

@mgaulton would you mind posting the errors you see in the log?

arvidn avatar Mar 19 '20 09:03 arvidn

I can confirm this issue exists in libtorrent 1.2.5 and it doesn’t exist in 1.2.3. Even without a VPN, when you lose connection, after reconnection the trackers stop working. Unfortunately qBt doesn’t log anything.

ghost avatar Mar 25 '20 09:03 ghost

Even without a VPN, when you lose connection

By "losing connection", you mean unplugging and plugging back in the ethernet cable? I imagine you have to lose connection for at least one announce cycle for the trackers to be affected, is that right?

I suspect this change is responsible for this. I should revert that.

Any chance someone could give that a try?

arvidn avatar Mar 25 '20 09:03 arvidn

Yes, if you lose connection and reconnect, the trackers do not work even if I keep re-announcing by stop/start.

ghost avatar Mar 25 '20 09:03 ghost

I think it’s unnecessary to back off tracker announces just because dns lookup failed. It can be a temporary failure due to loss of connection. It solves nothing but creates new issues.

ghost avatar Mar 25 '20 09:03 ghost

well, actually, a failure because the DNS server cannot be reached is a different error than the one I'm checking for in that patch. There are two kinds of "host not found" errors, authoritative and non-authoritative. The former (the error I check for) means the DNS server responded with NXDOMAIN error, i.e. the host name does not exist.

You shouldn't be getting an NXDOMAIN error by just losing connection. But, perhaps there are routers or ISPs in widespread use that get this wrong, and it's unreliable.
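
The distinction between the two kinds of lookup failure can be sketched in Python. This is an illustration only: it assumes that `socket.EAI_NONAME` approximates the authoritative "host not found" answer and `socket.EAI_AGAIN` the temporary failure, which is not an exact match for the error codes libtorrent checks. The function names and the policy strings are hypothetical.

```python
import socket


def should_back_off(err: OSError) -> bool:
    """Back off only when the resolver says the name does not exist.

    Assumption: EAI_NONAME stands in for the authoritative "host not found"
    case. A temporary failure (EAI_AGAIN), such as an unreachable DNS server,
    does not trigger a back-off.
    """
    return err.errno == socket.EAI_NONAME


def resolve_or_classify(host: str) -> str:
    """Try to resolve host and report what an announcer might do next."""
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror as err:
        return "back off" if should_back_off(err) else "retry soon"
    return "resolved"
```

With this policy a lost connection, which typically surfaces as a temporary failure, leaves the announce schedule alone, while a name that genuinely does not exist slows it down.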

arvidn avatar Mar 25 '20 09:03 arvidn

Idk. Maybe sometimes the ISP's or router's DNS server can lose connection and issue an NXDOMAIN. It’s an unreliable method to rely on.

ghost avatar Mar 25 '20 09:03 ghost

@an0n666 if you experience this, could you test dig www.google.com for instance, when you have lost connectivity?

arvidn avatar Mar 25 '20 10:03 arvidn

I get a connection timed out in nslookup response. And now I can't reproduce the issue as well -_- Maybe it was a one time NXDOMAIN or something. Sorry for posting this here but now I think my issue is unrelated to the OP.

ghost avatar Mar 25 '20 13:03 ghost

I did a test with freshly built libtorrent-git and qbittorrent-git on Manjaro. Qbittorrent 4.2.2 (official build with 1.2.5) on Windows resumes as normal. On Manjaro with the self-built versions via AUR it does not resume and DHT drops to 0. I can see a temporary fragment of an upload but that is about it.

I can probably catch some logs (just let me know what and how I should catch them for you) and rebuild with extra options if necessary.

swejuggalo avatar Mar 26 '20 10:03 swejuggalo

Anyway, I'm stuck on af12f5d6b8dc47add564d7ce544bbaf26cd1de1e for the time being until these issues are fully resolved.

xnoreq avatar Mar 31 '20 14:03 xnoreq

trying to reproduce this issue, I'm on ubuntu with private internet access VPN. I'm using head of RC_1_2, which does have a few fixes since 1.2.5.

./client_test -f client-test.log -s . --alert_mask=error,connect,tracker --outgoing_interfaces=tun0 --listen_interfaces=tun0:12345 ubuntu-19.10-desktop-amd64.iso.torrent

When disabling the VPN, trackers fail with "No route to host" (as expected). When I reconnect the VPN and re-announce, trackers work and peers reconnect. The DHT routing table starts to build up again.

I could use some help understanding what other settings may affect this behavior.

arvidn avatar Mar 31 '20 14:03 arvidn

@xnoreq that commit is very recent on RC_1_2. Are you saying the issue is fixed in RC_1_2 then? and the "fully resolved" refers to a new release being made, is that right?

arvidn avatar Mar 31 '20 14:03 arvidn

trying to reproduce this issue, I'm on ubuntu with private internet access VPN. I'm using head of RC_1_2, which does have a few fixes since 1.2.5.

./client_test -f client-test.log -s . --alert_mask=error,connect,tracker --outgoing_interfaces=tun0 --listen_interfaces=tun0:12345 ubuntu-19.10-desktop-amd64.iso.torrent

When disabling the VPN, trackers fail with "No route to host" (as expected). When I reconnect the VPN and re-announce, trackers work and peers reconnect. The DHT routing table starts to build up again.

I could use some help understanding what other settings may affect this behavior.

Losing internet, as in my case when rebooting the router, triggers the same loss of DHT, and it never recovers after reconnecting. I assume trackers fail too. And in my case no VPN is involved. But 1.2.5 on Windows with qbittorrent 4.2.2 does not have that behavior. I find that odd.

I have never used that command. Running it from a libtorrent-rasterbar installation directory I assume?

swejuggalo avatar Mar 31 '20 18:03 swejuggalo

@arvidn No, I initially pasted the wrong hash. I guess you're reading the mails and corrections through edits are not sent as mails again.

The commit is from 11th of January.

xnoreq avatar Apr 01 '20 17:04 xnoreq

I think the issue is with qbittorrent(-nox). Not sure what needs to be implemented to get the reconnection working on the side that uses libtorrent.

xnoreq avatar Apr 01 '20 17:04 xnoreq

Update: latest libtorrent and qbittorrent master doesn't even connect to peers anymore.

xnoreq avatar Apr 10 '20 11:04 xnoreq

@xnoreq under what conditions?

arvidn avatar Apr 10 '20 12:04 arvidn

@xnoreq under what conditions?

Replicated the same thing with libtorrent and qbittorrent (latest git) built earlier today. Simply adding a torrent and the traffic does not start at all. I don't remember the details of the commits in both projects... I'll take a look and see if I can guess what change might have broken something :)

Qbittorrent did some recent peer-related changes, but to the webui... I'm not 100% sure from what date it broke... but I think it worked before today's commits.

Rolling back to stable git version of Qbittorrent fixes the problem. Download starts. Going back to latest git breaks it again

swejuggalo avatar Apr 10 '20 12:04 swejuggalo

"latest git" means head of RC_1_2 in libtorrent?

arvidn avatar Apr 10 '20 13:04 arvidn

@arvidn Yes, RC_1_2 branch of libtorrent, qbittorrent master.

Going back to af12f5d6b8dc47add564d7ce544bbaf26cd1de1e works also with qbittorrent master.

xnoreq avatar Apr 10 '20 13:04 xnoreq

"latest git" means head of RC_1_2 in libtorrent?

Yeah. But this issue seems to be in Qbittorrent since reverting to stable Qbittorrent fixes it.

swejuggalo avatar Apr 10 '20 13:04 swejuggalo

@arvidn Yes, RC_1_2 branch of libtorrent, qbittorrent master.

Going back to af12f5d6b8dc47add564d7ce544bbaf26cd1de1e works also with qbittorrent master.

Ah...master means built with the latest commits, right? (never mind... Think I got what master is. Guess I'll check the scripts in more detail to know if it's master or RC_1_2 I'm syncing against before building)... Then it is not that easy to point out the issue. I can probably modify the Manjaro AUR scripts to build specific versions (unmodified I can build either stable or with the most recent commits)

Edit. Changed AUR script to the RC_1_2 maintained by Chocobo1, so now I'm sure I have RC_1_2.

swejuggalo avatar Apr 10 '20 13:04 swejuggalo

The only recent relevant commit I can spot is https://github.com/qbittorrent/qBittorrent/commit/bf1c9e34d7a7361fec844a4412a0bdd4b8b12d4c ("Fix outgoing interface is not getting assigned"). They also changed something about resume data, not holding torrent (?) files in memory, but I doubt it is relevant.

xnoreq avatar Apr 10 '20 14:04 xnoreq

@xnoreq can you reproduce the problem with the latest qBittorrent master without that specific commit + libtorrent RC_1_2 @ 11b19ac813161a4bb939b5324dd295fdbc9861bd ?

FranciscoPombal avatar Apr 10 '20 17:04 FranciscoPombal

Seems like a fix is on the way in Qbittorrent. See the topic I previously linked

swejuggalo avatar Apr 12 '20 13:04 swejuggalo

Rebuilt with the latest RC_1_2 against the most recent qbittorrent master as of now: working. Unverified whether the "reconnect breaks..." issue is solved, though.

swejuggalo avatar Apr 13 '20 08:04 swejuggalo

Just tested the latest Libtorrent RC_1_2 and latest Qbittorrent master without VPN, and the "Reconnect breaks connectivity" issue seems to be solved but I have these warnings:

4/13/20 4:36 PM - UPnP/NAT-PMP: Port mapping failure, message: could not map port using UPnP: no router found
4/13/20 4:36 PM - UPnP/NAT-PMP: Port mapping failure, message: could not map port using UPnP: no router found
4/13/20 4:36 PM - UPnP/NAT-PMP: Port mapping failure, message: could not map port using UPnP: no router found
4/13/20 4:36 PM - UPnP/NAT-PMP: Port mapping failure, message: could not map port using UPnP: no router found

zywo avatar Apr 13 '20 16:04 zywo

Just tested the latest Libtorrent RC_1_2 and latest Qbittorrent master without VPN, and the "Reconnect breaks connectivity" issue seems to be solved but I have these warnings:

4/13/20 4:36 PM - UPnP/NAT-PMP: Port mapping failure, message: could not map port using UPnP: no router found
4/13/20 4:36 PM - UPnP/NAT-PMP: Port mapping failure, message: could not map port using UPnP: no router found
4/13/20 4:36 PM - UPnP/NAT-PMP: Port mapping failure, message: could not map port using UPnP: no router found
4/13/20 4:36 PM - UPnP/NAT-PMP: Port mapping failure, message: could not map port using UPnP: no router found

That will get fixed once libtorrent 1.2.6 gets released or if you manually edit the source to set the UPnP lease duration to permanent (0).

For more info read https://github.com/qbittorrent/qBittorrent/issues/12406

ghost avatar Apr 13 '20 16:04 ghost

@swejuggalo Nope, the original issue is not fixed. Still trackers "not working" and DHT nodes sticking to 0 after VPN reconnect.

xnoreq avatar Apr 13 '20 16:04 xnoreq

@swejuggalo Nope, the original issue is not fixed. Still trackers "not working" and DHT nodes sticking to 0 after VPN reconnect.

I just rebuilt the latest of both components and DHT recovers for me now and traffic resumes somewhat, but is possibly limited due to the other UPnP issue mentioned above (after a reboot of the router). More traffic started after restarting the app... Shouldn't connecting/disconnecting a VPN be very similar to a temporary loss of internet connection? With the other UPnP fix my issues will probably be gone.

swejuggalo avatar Apr 13 '20 17:04 swejuggalo

Check your trackers. I am also retaining limited connectivity to some peers (to which there already was a connection before the reconnection).

xnoreq avatar Apr 13 '20 17:04 xnoreq

any updates on this issue? is there any way to fix it currently until new release?

xal3xhx avatar May 24 '20 15:05 xal3xhx

I just looked through the comments in this ticket again. I believe there's still not enough information about the conditions where this issue happens to do anything about it. More info in this post.

Or are you suggesting this is fixed in RC_1_2 already?

arvidn avatar May 24 '20 15:05 arvidn

no no its currently not fixed, just hopeful it will be soon

xal3xhx avatar May 26 '20 09:05 xal3xhx

im able to recreate this with the current release of qbittorrent with openvpn running on tun0 and qbit set to use only tun0,

im running in a freenas jail, if you have freenas running i can send you over the exported jail for testing

xal3xhx avatar May 26 '20 09:05 xal3xhx

Can I run that on a regular Ubuntu machine?

arvidn avatar May 27 '20 07:05 arvidn

no it has to be either freebsd or freenas

xal3xhx avatar May 27 '20 09:05 xal3xhx

I see. Do you think the jail plays a role in making it not work? I can imagine BSD is sufficiently different from linux/MacOS that I may need some additional support for enumerating routes or interfaces.

arvidn avatar May 27 '20 10:05 arvidn

I don’t think it being in a jail plays a part in it not working. I’ll try to spin up an Ubuntu VM with similar settings and try to replicate it. And when it comes to BSD, while it may be quite different, a lot of the same Linux tools are still there.

xal3xhx avatar May 27 '20 10:05 xal3xhx

after some more testing i was not able to recreate this on an Ubuntu machine, going to spin up a freenas vm now and send it your way

xal3xhx avatar May 27 '20 12:05 xal3xhx

here is the freenas vm running qbittorrent in a jail

  • made and exported with vmware workstation pro https://www.dropbox.com/s/pl21vls9fn743gj/torrent%20testing.zip?dl=0

the username and password for both the freenas and the jail is user: root password: password

you can choose to SSH into either the freenas vm (recommended) or the torrent jail

qbittorrent ui login is default user: admin pass: adminadmin

info for the jail:

  • both qbittorrent-nox and openvpn start automatically
  • there is no openvpn config currently, you will need to add your own

important file locations (when ssh'ed in to the main freenas machine):

  • openvpn config: /mnt/Main/iocage/jails/torrent/root/usr/local/etc/openvpn
  • there are 3 qbittorrent config locations, not sure what one is being used :/ (i think it depends on what user starts qbittorrent-nox)
  • qbittorrent config (/.config): /mnt/Main/iocage/jails/torrent/root/.config/qBittorrent
  • qbittorrent config (/root): /mnt/Main/iocage/jails/torrent/root/root/.config/qBittorrent
  • qbittorrent config (/db): /mnt/Main/iocage/jails/torrent/root/var/db/qbittorrent

if you need more info just ask! and if you want a hand with it or a more direct line of communication, my discord is in my github bio

xal3xhx avatar May 27 '20 19:05 xal3xhx

It may be not about VPN. Just simple wired network reconnecting could reproduce this. qbittorrent/qBittorrent#12925

mr-cn avatar May 28 '20 06:05 mr-cn

For me it has improved (no longer stays at 0 - no VPN or anything - just cable). But it does not really fully recover after a reconnect (Manjaro). The number of connections is above 300 before a temporary disconnect. Afterwards it loses almost 100 connections and never seems to recover fully. After restarting the client it comes back to around the same amount as before. I have a debug build of libtorrent-rasterbar and qBittorrent but haven't really spent any time on making that very useful yet 😉

swejuggalo avatar May 28 '20 07:05 swejuggalo

It may be not about VPN. Just simple wired network reconnecting could reproduce this. qbittorrent/qBittorrent#12925

yes the issue is unrelated to the vpn, but it was an easy way to replicate it in my case

For me it has improved (no longer stays at 0 - no VPN or anything - just cable). But it does not really fully recover after a reconnect (Manjaro). The number of connections is above 300 before a temporary disconnect. Afterwards it loses almost 100 connections and never seems to recover fully. After restarting the client it comes back to around the same amount as before.

i had similar behavior occur when i manually restarted qbittorrent-nox after the vpn had connected.. i did not think much of it as i thought it to be unrelated

xal3xhx avatar May 28 '20 19:05 xal3xhx

I don't want to, but I searched a lot and found nothing... Could anyone tell me how to get a verbose debug log that gives details about the reason it fails? I can easily reproduce the problem. (I am using qbittorrent)

mr-cn avatar Jun 09 '20 06:06 mr-cn

@mr-cn this is how I've tried to reproduce it:

./client_test -f client-test.log -s . --alert_mask=port_mapping,error,connect,tracker,session_log,dht --outgoing_interfaces=tun0 --listen_interfaces=tun0:12345 ubuntu-19.10-desktop-amd64.iso.torrent

client_test is an example program that comes with libtorrent.

This is assuming your VPN interface is called tun0, change the command line according to what yours is called.

-s . means the current directory is the download directory.

The log is printed to client-test.log.

arvidn avatar Jun 09 '20 08:06 arvidn

Also have this issue

oldominion avatar Jun 09 '20 18:06 oldominion

@arvidn I tried client_test and it restored the download quickly. At the same time, qbittorrent running on the same machine, even in the same docker container, fails to reconnect to any tracker, including DHT. I will try to get a detailed log from qbittorrent.

mr-cn avatar Jun 11 '20 17:06 mr-cn

I have been experiencing this same issue, and this is how I've been able to reproduce it.

Active setup: VPN with qBit locked to a specific interface, all addresses on it.

Reproduced setup: Normal connect, same qBit arrangement though.

Both running Windows Server 2019 Datacenter

Reproduced steps:

  • Get qBit running (find some legal torrent to use, no need to risk getting into trouble over a test).
  • Change the IP address of the interface (using the GUI). Keep all other settings the same, just make sure that the new IP has internet access.
  • Go to qBit and force a reannounce. Trackers will fail to connect.
  • Restore IP address to original (previous) address.
  • Go to qBit and force a reannounce. Trackers work again.

Conclusion: The change of an IP address is causing the issue. Restarting the client is one way of resolving the issue, but not an acceptable solution, particularly when using a VPN and the connection can reset (resulting in a different IP) at any time.

Note: If the VPN disconnects and then reconnects with the same IP address, everything seems to continue to work without any issues. On some rare occasions, I've seen a different IP address, but without issues. However, this is the exception, not the rule. Out of a few dozen times of the IP address changing, there might be one instance of the issue not happening.

Wolfie713 avatar Jun 18 '20 06:06 Wolfie713

Interesting, I think this is a new aspect of this issue. I’ve only tested with starting with a specified address, not changing it run-time. It’s supposed to work, but probably not as well tested.

arvidn avatar Jun 18 '20 08:06 arvidn

I just had the VPN do a connection reset, using the same IP, and the issue arose. Not sure why that is, but I do know that changing the IP will cause it to break. Perhaps it's deeper than that, where any connection change can cause the issue (though might not if it's the same IP as before)? Not sure, either way, you should be able to replicate the issue with the steps I listed.

Wolfie713 avatar Jun 19 '20 05:06 Wolfie713

Seeing the same issue on Windows Server 2016; qBittorrent bound to VPN adaptor, if the VPN connection drops and re-establishes the client remains in a disconnected state until it's restarted. Logged error is:

30/06/2020 09:10 - Failed to listen on IP: 10.200.248.34, port: TCP/49059. Reason: The requested address is not valid in its context
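
The Windows message in that log line corresponds to the address-not-available bind error (`EADDRNOTAVAIL` on POSIX systems): the socket is asked to bind to an IP that is not currently assigned to any local interface. A minimal Python sketch of that failure mode follows. `try_bind` is a hypothetical helper, and 192.0.2.1 is a documentation-only address assumed not to be configured on the machine.

```python
import socket
from typing import Optional


def try_bind(address: str, port: int = 0) -> Optional[OSError]:
    """Try to bind a TCP socket; return None on success or the bind error."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))  # port 0 lets the OS pick a free port
        except OSError as err:
            return err
    return None


if __name__ == "__main__":
    # The wildcard address does not depend on any particular interface address.
    print("0.0.0.0   ->", try_bind("0.0.0.0"))
    # An address that no local interface holds is expected to be rejected.
    print("192.0.2.1 ->", try_bind("192.0.2.1"))
```

This is why a client bound to the old VPN address cannot simply reuse it after the tunnel comes back with a different one, whereas a wildcard bind is unaffected by the address change itself.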

I haven't been able to replicate the IP address issue mentioned by @Wolfie713 but only because I haven't been able to get the same IP address on reconnect in any of my tests.

thespad avatar Jun 30 '20 08:06 thespad

I haven't been able to replicate the IP address issue mentioned by @Wolfie713 but only because I haven't been able to get the same IP address on reconnect in any of my tests.

Pause all existing torrents, allow your regular IP address to work, and find a legal torrent to use. Then test it by changing the computer's LAN IP, so that it still has internet access but a different IP address. Test it, then restore it to its original LAN IP and test again. (Also be sure to restore your VPN before restarting your normal torrents.)

Wolfie713 avatar Jun 30 '20 08:06 Wolfie713

Since this only seems to be an issue with qbittorrent and not the test clients I'm still wondering - months later - what qbittorrent could be doing wrong as it just uses libtorrent.

xnoreq avatar Jul 07 '20 19:07 xnoreq

@xnoreq @Wolfie713

https://github.com/qbittorrent/qBittorrent/issues/12361#issuecomment-653607115

Here is a test build with the following patch applied on top of latest master at time of writing (ca8654d380d3703d89c4b98528c3985a23d96330), to log tracker_error_alerts:

diff --git a/src/base/bittorrent/torrenthandleimpl.cpp b/src/base/bittorrent/torrenthandleimpl.cpp
index 062ec56a7..09ddee2da 100644
--- a/src/base/bittorrent/torrenthandleimpl.cpp
+++ b/src/base/bittorrent/torrenthandleimpl.cpp
@@ -1417,6 +1417,9 @@ void TorrentHandleImpl::handleTrackerErrorAlert(const lt::tracker_error_alert *p

    m_trackerInfos[trackerUrl].lastMessage = message;

+    LogMsg(tr("<tracker_error_alert> error: %1 | failure reason: %2")
+        .arg(QString::fromStdString(p->error.message()), p->error_message()), Log::WARNING);
+
    // Starting with libtorrent 1.2.x each tracker has multiple local endpoints from which
    // an announce is attempted. Some endpoints might succeed while others might fail.
    // Emit the signal only if all endpoints have failed.

https://github.com/FranciscoPombal/qBittorrent/actions/runs/156756747 (any of the 4 build variants will do, if in doubt use the one with libtorrent 1.2.7 + Qt 5.14.2).

Hopefully this can shine some new light on this issue.

FranciscoPombal avatar Jul 07 '20 20:07 FranciscoPombal

@arvidn I was able to reproduce the issue using Windscribe VPN.

Upon close inspection, here's what I got..

qBt starts, passes the VPN interface GUID to libtorrent to listen on. Everything works out fine.

7/26/2020 8:07 AM - Successfully listening on IP: 10.115.178.14, port: UDP/48623
7/26/2020 8:07 AM - Successfully listening on IP: 10.115.178.14, port: TCP/48623
7/26/2020 8:07 AM - Successfully listening on IP: fe80::9d5d:5907:15bf:e785%43, port: UDP/48623
7/26/2020 8:07 AM - Successfully listening on IP: fe80::9d5d:5907:15bf:e785%43, port: TCP/48623
7/26/2020 8:07 AM - Trying to listen on: {7CEAEE17-3AF9-4650-A06D-D07EC605E406}:48623

Here's what happens when the VPN is intentionally disconnected:

7/26/2020 8:08 AM - Failed to listen on IP: 169.254.231.133, port: TCP/48623. Reason: The requested address is not valid in its context

As you can see this is a link-local address, and in reality it doesn't exist on my system (I couldn't find it bound to any other interface). Edit: Also it only seems to be trying to listen on TCP for that address, not UDP.
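
For reference, the address classes seen in these log lines can be checked with Python's standard `ipaddress` module. This is only a small sketch; `classify` is a hypothetical helper, and the IPv6 address is written without its `%43` scope suffix.

```python
from ipaddress import ip_address


def classify(address: str) -> str:
    """Label an address as link-local, private, or other."""
    ip = ip_address(address)
    if ip.is_link_local:  # 169.254.0.0/16 for IPv4, fe80::/10 for IPv6
        return "link-local"
    if ip.is_private:
        return "private"
    return "other"


if __name__ == "__main__":
    for addr in ("10.115.178.14", "169.254.231.133", "fe80::9d5d:5907:15bf:e785"):
        print(addr, "->", classify(addr))
```

The 169.254.x.x range is what Windows commonly self-assigns to an adapter that has no configured address, which may explain why it shows up only while the VPN is down.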

When I reconnect the interface:

7/26/2020 8:08 AM - Successfully listening on IP: 10.115.178.24, port: UDP/48623
7/26/2020 8:08 AM - Successfully listening on IP: 10.115.178.24, port: TCP/48623

Upon reconnection listening succeeds and therefore DHT should start working again right? But it doesn't.

Also, upon reconnection two tracker error alerts are generated spontaneously without any force announce. Thanks to @FranciscoPombal for providing a build with logging support.

7/26/2020 8:08 AM - <tracker_error_alert> error: A socket operation was attempted to an unreachable network | failure reason:
7/26/2020 8:08 AM - <tracker_error_alert> error: The system cannot open the device or file specified | failure reason:

However the tracker starts working again after some force re-announces. Just like DHT also starts working after disable + re-enable.

ghost avatar Jul 26 '20 02:07 ghost

@arvidn

DHT nodes stay at zero after reconnection. But if I start downloading a torrent it quickly jumps to 70+ nodes. Maybe libtorrent is failing to detect my external IP correctly unless I start downloading something.

ghost avatar Jul 30 '20 00:07 ghost

I think it's not a good idea to drop DHT nodes if external IP detection depends on things like UPnP, peers, trackers etc. supplying the IP address. I noticed that my external IP changed (in qbit logs) after I started downloading, which probably prompted a reboot of the DHT node.

ghost avatar Jul 30 '20 00:07 ghost

I think it's not a good idea to drop DHT nodes if external IP detection depends on things like UPnP, peers, trackers etc supplying the IP address.

Do you think dropping DHT nodes is a bad idea in general, or is there some other mechanism to detect one's IP where it would be a good idea? (you imply that it's a bad idea only because of the way IP detection works)

arvidn avatar Jul 30 '20 08:07 arvidn

I implied it’s not a good idea since until my external IP is detected the DHT nodes stay at zero. I think this issue existed before as well but nobody noticed because DHT nodes weren’t dropped before when external IP changed/network disconnected. Now people are noticing it because of the drop.

ghost avatar Jul 30 '20 08:07 ghost

Also if I’m multi-homed or using a bonded network where I may have multiple external IPs, then the scenario becomes more complex. DHT will probably just keep rebooting every time libtorrent sees a new external address. Or maybe I understood it wrong.

ghost avatar Jul 30 '20 08:07 ghost

To add this to the mix, which may make it more complicated or maybe easier to sort out, or just confirm what's already being noticed...

One thing I had noticed is that the user with the program running the connection is normally the only one to experience the issue. Another user on the same computer seems to have no issues unless there is a major disconnect. Based on that, I looked into using the VPN I have as a service rather than an app, and since then, even though I know that there have been connection resets, no issues. (When browsing won't work for a few seconds here and there, then works again, you know the connection lapsed for a moment.) The only time that there has been an issue with torrents was when there was a disconnection of the main internet connection, ie, no internet at all, not just the VPN. At that point, would have to restart the client.

Now, I don't get why this would be, since I would think that everything about the VPN connection would be available to all permitted users, but something about it being a service vs assigned to a user seems to provide a little bit of a shield from the issue.

Before trying it as a service, on rare occasions I could get up to 4 days before the connection would reset and thus need to restart my torrent client. Usually one or two days at best. Tried as a service and had it running for over 9 days before I had an issue, which was the result of losing connection with my ISP. Never had that good of luck before, and I have no doubt that if the main connection hadn't been interrupted, I would have reached two weeks or more without an issue.

I should point out that the service is always connecting to the same VPN server, and that particular server gives a static IP, so the IP isn't changing. Even with the same IP being provided, sometimes the connection loss would result in the connectivity issue rearing its ugly head. As a service, no issues. I don't know if I would get the same results if the server being used provided dynamic IP's, though I'm sure the issue would arise again. Previously I was using my router to handle the VPN connection, and I would experience the issue (not having an idea as to why at that point), but the servers weren't static IP, and I believe the IPs were changing most times. So I speculate that if I did a service that didn't use a static IP server, I'd still notice the issue.

Wolfie713 avatar Jul 30 '20 10:07 Wolfie713

@an0n666 each listen interface (listen_socket_t in libtorrent) has its own DHT node running, with their own routing table. There are two kinds of "changed IP" events that can happen.

  1. A peer, a tracker or a DHT node echoes back our external IP. In this case there's no explicit dropping of any DHT nodes. The relevant DHT node for the interface the connection was made over has its DHT node ID updated. This changes which bucket nodes fall into, and some nodes may be dropped if they move from a larger bucket to a smaller one and they don't all fit.

  2. The local operating system indicates that the IP address of one of the local interfaces changed. In this case, all listen sockets are closed and re-opened. In this case, full DHT nodes and their routing tables are dropped. When they are re-opened, a DHT bootstrap is triggered to populate the routing tables.

Are you sure you're not experiencing (2)? If you enable session logging it should be clear which one is happening.

I think there are probably some fairly low-hanging fruit to improve the handling of (2). For instance, if the IP change is not affecting any of the listen interfaces in use, it could be ignored.
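
The idea of ignoring irrelevant IP changes can be sketched as follows. This is an assumption-laden illustration in Python, not libtorrent code: the snapshot format (interface name mapped to its set of addresses) and the names `changed_interfaces` and `must_reopen` are hypothetical, and an empty `listen_on` set is taken to mean "listening on every interface".

```python
from typing import Dict, Set

Snapshot = Dict[str, Set[str]]  # interface name -> addresses assigned to it


def changed_interfaces(before: Snapshot, after: Snapshot) -> Set[str]:
    """Interfaces whose address set differs between the two snapshots."""
    names = before.keys() | after.keys()
    return {n for n in names if before.get(n, set()) != after.get(n, set())}


def must_reopen(before: Snapshot, after: Snapshot, listen_on: Set[str]) -> bool:
    """Reopen listen sockets only if an interface in use was affected."""
    changed = changed_interfaces(before, after)
    if not listen_on:  # listening everywhere: any change matters
        return bool(changed)
    return bool(changed & listen_on)


if __name__ == "__main__":
    before = {"eth0": {"192.168.1.10"}, "tun0": {"10.8.0.2"}}
    after = {"eth0": {"192.168.1.10"}, "tun0": {"10.8.0.6"}}
    print(must_reopen(before, after, {"tun0"}))  # tun0 changed
    print(must_reopen(before, after, {"eth0"}))  # eth0 did not change
```

A session listening only on eth0 would keep its sockets and DHT routing table across the VPN reconnect in this example, while one listening on tun0 would still go through the close, re-open and bootstrap path.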

arvidn avatar Jul 30 '20 10:07 arvidn

The local operating system indicates that an IP address changed of one of the local interfaces. In this case, all listen sockets are closed and re-opened.

I think that's expected.

In this case, full DHT nodes and their routing tables are dropped.

This definitely happens. No doubt about that.

When they are re-opened, a DHT bootstrap is triggered to populate the routing tables.

This certainly doesn't happen. I double checked with Wireshark. There's no UDP traffic for DHT when the interface is back online. However, if I disable DHT in qBt settings and then re-enable it, I start seeing lots of UDP traffic and the nodes keep populating.

Also, if I have an active download and the interface goes down, when it comes back online the nodes start populating immediately.

I also tried downloading a private torrent with a fake tracker. Having a torrent in downloading state with a fake tracker doesn't repopulate the DHT nodes.

However, as soon as I added a working tracker and got download activity, I started seeing UDP traffic of DHT and the nodes started repopulating.

The nodes do not repopulate if I only have seeding torrents.

I was able to reproduce the issue with Deluge 2.0.4.dev38 and latest libtorrent as well.

So this isn't a qBit specific issue.

I also enabled debug level logging in deluge but couldn't find anything relevant that might indicate what the issue is.

ghost avatar Aug 15 '20 01:08 ghost

Just tried latest RC_1_2 (f3c6658a3) and there's another thing I noticed: when booting my machine it starts qbittorrent and openvpn at the same time. This leads to the same problems, leaving qbittorrent in an "unconnected" state with 0 DHT nodes, etc.

I've configured it to bind to 0.0.0.0 (any IPv4), and netstat/ss shows it bound to the openvpn tun0 IP, so this has to be an application-level problem. It looks to me like it binds to the new interface but doesn't do anything besides that.

xnoreq avatar Aug 24 '20 00:08 xnoreq

Just tried latest RC_1_2 (f3c6658) and there's another thing I noticed: when booting my machine it starts qbittorrent and openvpn at the same time. This leads to the same problems, leaving qbittorrent in an "unconnected" state with 0 DHT nodes, etc.

I've configured it to bind to 0.0.0.0 (any IPv4), and netstat/ss shows it bound to the openvpn tun0 IP, so this has to be an application-level problem. It looks to me like it binds to the new interface but doesn't do anything besides that.

What happens if you have an active download?

ghost avatar Aug 24 '20 03:08 ghost

Just tried latest RC_1_2 (f3c6658) and there's another thing I noticed: when booting my machine it starts qbittorrent and openvpn at the same time. This leads to the same problems, leaving qbittorrent in an "unconnected" state with 0 DHT nodes, etc.

What happens if you have qbit delayed on boot? If the issue goes away, then it could still be a libtorrent issue, as it may be establishing connections before the VPN connection is fully established. My hunch is that this is what is happening, thus not a client issue.

Wolfie713 avatar Aug 24 '20 10:08 Wolfie713

I think I managed to reproduce it with libtorrent's client_test example:

What I did:

  1. disconnect openvpn
  2. start client_test: $ /usr/bin/client_test -f ./client_test.log -s . "magnet:?xt=urn:btih:49da7ae0de8874462471d0f5419b850e599b05ef"
  3. connect openvpn (timestamp 22:05:39)
  4. expect download starting - it doesn't
  5. quit the client after 2 minutes

The logfile:

[Aug 24 22:04:54] successfully listening on [TCP] 127.0.0.1:6881
[Aug 24 22:04:54] successfully listening on [UDP] 127.0.0.1:6881
[Aug 24 22:04:54] successfully listening on [TCP] 192.168.101.2:6881
[Aug 24 22:04:54] successfully listening on [UDP] 192.168.101.2:6881
[Aug 24 22:04:54] added torrent: 49da7ae0de8874462471d0f5419b850e599b05ef
[Aug 24 22:04:54] 49da7ae0de8874462471d0f5419b850e599b05ef added
[Aug 24 22:04:54] 49da7ae0de8874462471d0f5419b850e599b05ef: state changed to: dl metadata
[Aug 24 22:04:54] session stats header: peer.error_peers, peer.disconnected_peers, peer.eof_peers, peer.connreset_peers, peer.connrefused_peers, peer.connaborted_peers, peer.notconnected_peers, peer.perm_peers, peer.buffer_peers, peer.unreachable_peers, peer.broken_pipe_peers, peer.addrinuse_peers, peer.no_access_peers, peer.invalid_arg_peers, peer.aborted_peers, peer.piece_requests, peer.max_piece_requests, peer.invalid_piece_requests, peer.choked_piece_requests, peer.cancelled_piece_requests, peer.piece_rejects, peer.error_incoming_peers, peer.error_outgoing_peers, peer.error_rc4_peers, peer.error_encrypted_peers, peer.error_tcp_peers, peer.error_utp_peers, picker.reject_piece_picks, picker.unchoke_piece_picks, picker.incoming_redundant_piece_picks, picker.incoming_piece_picks, picker.end_game_piece_picks, picker.snubbed_piece_picks, picker.interesting_piece_picks, picker.hash_fail_piece_picks, picker.piece_picker_partial_loops, picker.piece_picker_suggest_loops, picker.piece_picker_sequential_loops, picker.piece_picker_reverse_rare_loops, picker.piece_picker_rare_loops, picker.piece_picker_rand_start_loops, picker.piece_picker_rand_loops, picker.piece_picker_busy_loops, peer.connect_timeouts, peer.uninteresting_peers, peer.timeout_peers, peer.no_memory_peers, peer.too_many_peers, peer.transport_timeout_peers, peer.num_banned_peers, peer.banned_for_hash_failure, peer.connection_attempts, peer.connection_attempt_loops, peer.boost_connection_attempts, peer.missed_connection_attempts, peer.no_peer_connection_attempts, peer.incoming_connections, net.on_read_counter, net.on_write_counter, net.on_tick_counter, net.on_lsd_counter, net.on_lsd_peer_counter, net.on_udp_counter, net.on_accept_counter, net.on_disk_queue_counter, net.on_disk_counter, ses.torrent_evicted_counter, ses.num_incoming_choke, ses.num_incoming_unchoke, ses.num_incoming_interested, ses.num_incoming_not_interested, ses.num_incoming_have, ses.num_incoming_bitfield, ses.num_incoming_request, 
ses.num_incoming_piece, ses.num_incoming_cancel, ses.num_incoming_dht_port, ses.num_incoming_suggest, ses.num_incoming_have_all, ses.num_incoming_have_none, ses.num_incoming_reject, ses.num_incoming_allowed_fast, ses.num_incoming_ext_handshake, ses.num_incoming_pex, ses.num_incoming_metadata, ses.num_incoming_extended, ses.num_outgoing_choke, ses.num_outgoing_unchoke, ses.num_outgoing_interested, ses.num_outgoing_not_interested, ses.num_outgoing_have, ses.num_outgoing_bitfield, ses.num_outgoing_request, ses.num_outgoing_piece, ses.num_outgoing_cancel, ses.num_outgoing_dht_port, ses.num_outgoing_suggest, ses.num_outgoing_have_all, ses.num_outgoing_have_none, ses.num_outgoing_reject, ses.num_outgoing_allowed_fast, ses.num_outgoing_ext_handshake, ses.num_outgoing_pex, ses.num_outgoing_metadata, ses.num_outgoing_extended, ses.num_piece_passed, ses.num_piece_failed, ses.num_have_pieces, ses.num_total_pieces_added, disk.num_blocks_written, disk.num_blocks_read, disk.num_blocks_hashed, disk.num_blocks_cache_hits, disk.num_write_ops, disk.num_read_ops, disk.num_read_back, disk.disk_read_time, disk.disk_write_time, disk.disk_hash_time, disk.disk_job_time, ses.waste_piece_timed_out, ses.waste_piece_cancelled, ses.waste_piece_unknown, ses.waste_piece_seed, ses.waste_piece_end_game, ses.waste_piece_closing, net.sent_payload_bytes, net.sent_bytes, net.sent_ip_overhead_bytes, net.sent_tracker_bytes, net.recv_payload_bytes, net.recv_bytes, net.recv_ip_overhead_bytes, net.recv_tracker_bytes, net.recv_failed_bytes, net.recv_redundant_bytes, dht.dht_messages_in, dht.dht_messages_in_dropped, dht.dht_messages_out, dht.dht_messages_out_dropped, dht.dht_bytes_in, dht.dht_bytes_out, dht.dht_ping_in, dht.dht_ping_out, dht.dht_find_node_in, dht.dht_find_node_out, dht.dht_get_peers_in, dht.dht_get_peers_out, dht.dht_announce_peer_in, dht.dht_announce_peer_out, dht.dht_get_in, dht.dht_get_out, dht.dht_put_in, dht.dht_put_out, dht.dht_sample_infohashes_in, dht.dht_sample_infohashes_out, 
dht.dht_invalid_announce, dht.dht_invalid_get_peers, dht.dht_invalid_find_node, dht.dht_invalid_put, dht.dht_invalid_get, dht.dht_invalid_sample_infohashes, utp.utp_packet_loss, utp.utp_timeout, utp.utp_packets_in, utp.utp_packets_out, utp.utp_fast_retransmit, utp.utp_packet_resend, utp.utp_samples_above_target, utp.utp_samples_below_target, utp.utp_payload_pkts_in, utp.utp_payload_pkts_out, utp.utp_invalid_pkts_in, utp.utp_redundant_pkts_in, sock_bufs.socket_send_size3, sock_bufs.socket_send_size4, sock_bufs.socket_send_size5, sock_bufs.socket_send_size6, sock_bufs.socket_send_size7, sock_bufs.socket_send_size8, sock_bufs.socket_send_size9, sock_bufs.socket_send_size10, sock_bufs.socket_send_size11, sock_bufs.socket_send_size12, sock_bufs.socket_send_size13, sock_bufs.socket_send_size14, sock_bufs.socket_send_size15, sock_bufs.socket_send_size16, sock_bufs.socket_send_size17, sock_bufs.socket_send_size18, sock_bufs.socket_send_size19, sock_bufs.socket_send_size20, sock_bufs.socket_recv_size3, sock_bufs.socket_recv_size4, sock_bufs.socket_recv_size5, sock_bufs.socket_recv_size6, sock_bufs.socket_recv_size7, sock_bufs.socket_recv_size8, sock_bufs.socket_recv_size9, sock_bufs.socket_recv_size10, sock_bufs.socket_recv_size11, sock_bufs.socket_recv_size12, sock_bufs.socket_recv_size13, sock_bufs.socket_recv_size14, sock_bufs.socket_recv_size15, sock_bufs.socket_recv_size16, sock_bufs.socket_recv_size17, sock_bufs.socket_recv_size18, sock_bufs.socket_recv_size19, sock_bufs.socket_recv_size20, ses.num_checking_torrents, ses.num_stopped_torrents, ses.num_upload_only_torrents, ses.num_downloading_torrents, ses.num_seeding_torrents, ses.num_queued_seeding_torrents, ses.num_queued_download_torrents, ses.num_error_torrents, ses.non_filter_torrents, peer.num_tcp_peers, peer.num_socks5_peers, peer.num_http_proxy_peers, peer.num_utp_peers, peer.num_i2p_peers, peer.num_ssl_peers, peer.num_ssl_socks5_peers, peer.num_ssl_http_proxy_peers, peer.num_ssl_utp_peers, 
peer.num_peers_half_open, peer.num_peers_connected, peer.num_peers_up_interested, peer.num_peers_down_interested, peer.num_peers_up_unchoked_all, peer.num_peers_up_unchoked_optimistic, peer.num_peers_up_unchoked, peer.num_peers_down_unchoked, peer.num_peers_up_requests, peer.num_peers_down_requests, peer.num_peers_up_disk, peer.num_peers_down_disk, peer.num_peers_end_game, disk.write_cache_blocks, disk.read_cache_blocks, disk.request_latency, disk.pinned_blocks, disk.disk_blocks_in_use, disk.queued_disk_jobs, disk.num_running_disk_jobs, disk.num_read_jobs, disk.num_write_jobs, disk.num_jobs, disk.num_writing_threads, disk.num_running_threads, disk.blocked_disk_jobs, disk.queued_write_bytes, ses.num_unchoke_slots, disk.num_fenced_read, disk.num_fenced_write, disk.num_fenced_hash, disk.num_fenced_move_storage, disk.num_fenced_release_files, disk.num_fenced_delete_files, disk.num_fenced_check_fastresume, disk.num_fenced_save_resume_data, disk.num_fenced_rename_file, disk.num_fenced_stop_torrent, disk.num_fenced_flush_piece, disk.num_fenced_flush_hashed, disk.num_fenced_flush_storage, disk.num_fenced_trim_cache, disk.num_fenced_file_priority, disk.num_fenced_load_torrent, disk.num_fenced_clear_piece, disk.num_fenced_tick_storage, disk.arc_mru_size, disk.arc_mru_ghost_size, disk.arc_mfu_size, disk.arc_mfu_ghost_size, disk.arc_write_size, disk.arc_volatile_size, dht.dht_nodes, dht.dht_node_cache, dht.dht_torrents, dht.dht_peers, dht.dht_immutable_data, dht.dht_mutable_data, dht.dht_allocated_observers, net.has_incoming_connections, net.limiter_up_queue, net.limiter_down_queue, net.limiter_up_bytes, net.limiter_down_bytes, utp.num_utp_idle, utp.num_utp_syn_sent, utp.num_utp_connected, utp.num_utp_fin_sent, utp.num_utp_close_wait, utp.num_utp_deleted, ses.num_outstanding_accept, tracker.num_queued_tracker_announces
[Aug 24 22:04:54] DHT error [hostname_lookup] (2) Host not found (non-authoritative), try again later
[Aug 24 22:04:54] DHT error [hostname_lookup] (2) Host not found (non-authoritative), try again later
[Aug 24 22:04:54] 49da7ae0de8874462471d0f5419b850e599b05ef resume data was not generated: torrent has no metadata
[Aug 24 22:04:54] 49da7ae0de8874462471d0f5419b850e599b05ef resumed
[Aug 24 22:05:39] successfully listening on [TCP] 10.8.1.2:6881
[Aug 24 22:05:39] successfully listening on [UDP] 10.8.1.2:6881
[Aug 24 22:05:39] NAT-PMP: failed to find default route for "tun0" 10.8.1.2: Success
[Aug 24 22:05:39] NAT-PMP: closing
[Aug 24 22:05:39] UPnP: failed to open multicast socket: "No such device"
[Aug 24 22:05:39] UPnP: adding port map: [ protocol: tcp ext_port: 6881 local_ep: 10.8.1.2:6881 ] DISABLED
[Aug 24 22:05:39] UPnP: adding port map: [ protocol: udp ext_port: 6881 local_ep: 10.8.1.2:6881 ] DISABLED

This is what the client looks like even 2 minutes after openvpn connected:

[all][downloading][non-paused][seeding][queued][stopped][checking]
 #   Name                                               Progress                            Pieces         Download          Upload            Peers (D:S) Down   Up     Flags
0                                                      dl metadata (0.0%)                       0/     0          (      )          (      )     0:0                   S


 fail:        down:          (      )   bw queue:   0 |   0 conns:   0  unchoked:  0 /  8 queued-trackers: 00                                                                                       waste:          up:          (      ) disk queue:   0 |   0 cache w:   0% r:   0% size: w:        r:        total:
uTP idle: 0 syn: 0 est: 0 fin: 0 wait: 0
[                                                                                                                                                                                                                                                                                                                                                                                                    ]

xnoreq avatar Aug 24 '20 20:08 xnoreq

So DHT fails since the hostname can't be resolved when the interface is down, and the hostname isn't looked up again after the interface is back online? This probably explains why the tracker also doesn't work: the lookups fail when the interface goes down and are probably not retried when the interface is up again.

ghost avatar Aug 25 '20 03:08 ghost

Perhaps it would make sense to have a periodic timer that checks if there are any DHT nodes (or maybe more than 2). If there isn't, it triggers a bootstrap.

The question is, how frequently should that be checked?

If the bootstrap node is truly not responding, checking too often could be problematic, as an increasing number of peers would come online and the request rate to it would increase. In fact, there should probably be some kind of exponential back-off, where if the DHT has been down for a very long time, the periodic checks happen at increasing intervals.
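A minimal sketch of that kind of exponential back-off follows. The base delay, growth factor and cap are illustrative assumptions, not values taken from libtorrent, and `try_bootstrap` and `sleep` are hypothetical callables supplied by the caller.

```python
def backoff_intervals(base=5.0, factor=2.0, cap=300.0):
    """Yield retry delays in seconds: base, base*factor, ... capped at cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def retry_bootstrap(try_bootstrap, sleep, max_attempts=10, **backoff):
    """Call try_bootstrap() until it returns True.

    Waits an increasing delay between attempts. Returns the number of
    attempts used, or None if every attempt failed.
    """
    delays = backoff_intervals(**backoff)
    for attempt in range(1, max_attempts + 1):
        if try_bootstrap():
            return attempt
        sleep(next(delays))
    return None
```

Capping the delay keeps a long outage from pushing the next check hours into the future, while the growth factor keeps the request rate to the bootstrap node low when it is really down. A real implementation would probably also add random jitter so that many clients do not retry in lockstep.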

arvidn avatar Aug 25 '20 11:08 arvidn

@arvidn

Up until 1.2.4 DHT nodes weren’t dropped and nobody had any issues.

Why is it necessary to drop the nodes? I believe connectivity is more important than anything else. If someone is dependent on DHT they’re kinda screwed when their connection goes offline.

ghost avatar Aug 25 '20 13:08 ghost

The question is, how frequently should that be checked?

Each time you do bootstrapping, until bootstrapping eventually succeeds?

If the bootstrap node is truly not responding, checking too often could be problematic, as an increasing number of peers would come online and the request rate to it would increase. In fact, there should probably be some kind of exponential back-off, where if the DHT has been down for a very long time, the periodic checks happen at increasing intervals.

Yeah, you could increase the timeout with each failed attempt up to some limit (a couple of minutes?).

The question I have is why bootstrapping doesn't happen using previously connected and persisted nodes?! Having a single address, a single bootstrap node means not only a single point of failure but also a central point (in a system that's supposed to be decentralized). This central point could also potentially be abused for logging.

Is there an option to disable this?

xnoreq avatar Aug 25 '20 19:08 xnoreq

Up until 1.2.4 DHT nodes weren’t dropped and nobody had any issues.

Why is it necessary to drop the nodes? I believe connectivity is more important than anything else. If someone is dependent on DHT they’re kinda screwed when their connection goes offline.

I don't know what changed in 1.2.4 to affect this. I don't think the way the DHT is restarted changed. Nodes aren't dropped, they are pulled out of the routing table, the DHT node is restarted (likely with a new external IP address) and they are then inserted into that new node's routing table.

The reason it's done this way is because the node ID is tied to the external IP address, so when that changes, the node ID changes, which in turn changes around the routing table. The routing table "radiates" out from one's own node ID.
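The effect of a node ID change on bucket placement can be illustrated with a small sketch. It assumes the usual Kademlia-style layout where a node's bucket is determined by how many leading bits its ID shares with our own; `node_id_for` is a hypothetical SHA-1 stand-in and not the derivation libtorrent actually uses.

```python
import hashlib


def node_id_for(external_ip: str) -> int:
    # Hypothetical stand-in: hash the external IP into a 160-bit ID.
    return int.from_bytes(hashlib.sha1(external_ip.encode()).digest(), "big")


def bucket_index(own_id: int, other_id: int) -> int:
    """Number of leading bits the two 160-bit IDs share (0..160)."""
    distance = own_id ^ other_id  # XOR metric
    return 160 - distance.bit_length()


def bucket_layout(own_id: int, node_ids):
    """Group node IDs by bucket index, as seen from own_id."""
    layout = {}
    for node in node_ids:
        layout.setdefault(bucket_index(own_id, node), []).append(node)
    return layout
```

Calling `bucket_layout` with the same set of nodes but two different `own_id` values generally gives two different groupings, which is why nodes that fit before an IP change may no longer have room afterwards.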

So, I suppose if you change IP from having one, to not having one (as I imagine might be the case if the VPN goes down), there's nowhere to return the nodes to, so that might actually cause them to drop.

To answer the question "why is it necessary to drop the nodes?", it's not necessary, but half may be dropped (worst case) when the node ID changes. However, if the listen interface disappears, there is no DHT node anymore either, which probably would cause everything to be dropped.

Perhaps the nodes could be kept around somewhere, ready to populate a DHT node's routing table if one is started. You probably wouldn't want those to sit around indefinitely though.
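A minimal sketch of such a holding area with an expiry is shown here. The class name, the 15-minute default and the `(ip, port)` tuple representation are all assumptions for illustration; nothing like this is claimed to exist in libtorrent.

```python
class ParkedNodes:
    """Holds DHT node endpoints while no DHT node is running."""

    def __init__(self, ttl_seconds=900.0):
        self.ttl = ttl_seconds
        self._nodes = {}  # endpoint -> time it was parked

    def park(self, endpoints, now):
        """Remember endpoints, stamping each with the current time."""
        for endpoint in endpoints:
            self._nodes[endpoint] = now

    def take(self, now):
        """Return the endpoints that are still fresh and empty the store."""
        fresh = [ep for ep, parked_at in self._nodes.items()
                 if now - parked_at <= self.ttl]
        self._nodes.clear()
        return fresh
```

Passing `now` in explicitly (for example from `time.monotonic()`) keeps the expiry logic easy to test and independent of wall-clock jumps.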

arvidn avatar Aug 25 '20 20:08 arvidn

The question is, how frequently should that be checked?

Each time you do bootstrapping, until bootstrapping eventually succeeds?

That's not a frequency.

The question I have is why bootstrapping doesn't happen using previously connected and persisted nodes?!

DHT nodes are saved in the session state; if the client saves and restores the session state, it will get a bunch of potential DHT nodes to bootstrap from as well. However, I believe those nodes are only used for the first bootstrap, subsequent ones use the nodes in the routing table for bootstrapping.

I think the problem is that when the network interface disappears, the DHT node also goes away, along with its routing table and all nodes. So when the DHT is then started again, it's possible it's starting from scratch.

Is there an option to disable this?

disable what?

arvidn avatar Aug 25 '20 20:08 arvidn

That's not a frequency.

Maybe I misunderstood. I assumed that the bootstrapping is triggered automatically after loss of connectivity (be it interface down or all nodes offline) and this trigger should be used to start a periodic timer with increasing timeouts that will check whether the bootstrapping was actually successful (if so just kill the timer) or not. In the latter case bootstrapping should simply be tried again.

If there are no such triggers then just checking every second wouldn't hurt I guess. The checks are quite simple, aren't they?

xnoreq avatar Aug 26 '20 16:08 xnoreq

Any news here?

olegvg avatar Oct 11 '20 20:10 olegvg

Besides retrying failed name resolution / bootstrapping, this also should be fixed:

DHT nodes are saved in the session state; if the client saves and restores the session state, it will get a bunch of potential DHT nodes to bootstrap from as well. However, I believe those nodes are only used for the first bootstrap, subsequent ones use the nodes in the routing table for bootstrapping.

I think the problem is that when the network interface disappears, the DHT node also goes away, along with its routing table and all nodes. So when the DHT is then started again, it's possible it's starting from scratch.

A number of nodes (from all interfaces) should be persisted and used when connecting the first time or re-connecting.

xnoreq avatar Dec 30 '20 15:12 xnoreq

A number of nodes (from all interfaces) should be persisted and used when connecting the first time or re-connecting.

It sounds like you're exactly describing the current behavior, as I described as:

DHT nodes are saved in the session state; if the client saves and restores the session state, it will get a bunch of potential DHT nodes to bootstrap from as well. However, I believe those nodes are only used for the first bootstrap

the problem appears to be that subsequent bootstraps (as opposed to the first bootstrap) get too few nodes to start from.

arvidn avatar Dec 30 '20 16:12 arvidn

That's why I wrote "when connecting the first time or re-connecting". So these bootstrapping nodes shouldn't be dropped when an interface goes down. If they are stored per interface then they can be updated/replaced periodically with the best available nodes for that interface.

So if an interface goes down, dropping the associated buckets of nodes is fine, but if it reappears then it should be filled with this set of bootstrapping nodes.

This leaves one more scenario we see here: a (tunnel) interface is up but has no connectivity. I assume that the DHT implementation would, over time, purge all nodes because they're not reachable.

I'm not sure if you get an event for this or if it needs to be done periodically (like every minute or make it configurable). But if we see that the buckets have emptied, add all bootstrapping nodes again but now also resolve dht.libtorrent.org and add it as well. With that the second bootstrapping attempt would start.
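The periodic check proposed here could look roughly like the sketch below. It is not libtorrent code: the routing table is modelled as a plain set of endpoints, `saved_nodes` stands for the per-interface bootstrapping nodes, and `resolve` is a hypothetical callable that maps a hostname to a list of endpoints.

```python
def reseed_if_empty(routing_table, saved_nodes, resolve,
                    router_host="dht.libtorrent.org"):
    """Refill an empty routing table from saved nodes plus the router.

    routing_table is a set of endpoints and is modified in place.
    Returns True if a reseed was attempted, False if the table had nodes.
    """
    if routing_table:
        return False
    routing_table.update(saved_nodes)
    try:
        routing_table.update(resolve(router_host))
    except OSError:
        # Name resolution failed (e.g. the tunnel is still down);
        # the next periodic check will try again.
        pass
    return True
```

Injecting `resolve` keeps the sketch free of real network calls; in practice it would wrap something like `socket.getaddrinfo`, whose lookup errors derive from `OSError`.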

xnoreq avatar Dec 30 '20 16:12 xnoreq

@arvidn Someone is claiming that the change in this line introduced the issue: https://github.com/qbittorrent/qBittorrent/issues/13794#issuecomment-754283051

ghost avatar Jan 11 '21 04:01 ghost

I believe this fixes it. Could someone give it a try? https://github.com/arvidn/libtorrent/pull/6338

arvidn avatar Jul 25 '21 20:07 arvidn

@arvidn does this fix apply to MacOS specifically? If not, then maybe @xnoreq can test the patch?

summerqB avatar Jul 27 '21 16:07 summerqB

Tried latest RC_1_2 and after reconnecting the VPN qbittorrent stays at 0 DHT nodes.

xnoreq avatar Jul 28 '21 09:07 xnoreq

Tried latest RC_1_2 and after reconnecting the VPN qbittorrent stays at 0 DHT nodes.

Not latest RC_1_2. You have to test from this branch https://github.com/arvidn/libtorrent/tree/network-up-osx since it hasn't been merged yet.

summerqB avatar Jul 28 '21 09:07 summerqB

Still doesn't work and I don't see how that commit would fix the described issue.

xnoreq avatar Jul 28 '21 09:07 xnoreq

@xnoreq could you provide logs from client_test, with alert_mask=all?

The issue I can reproduce on MacOS when disabling and re-enabling the network interface is that the process is notified of the network coming up before the routing table is set up. The notification causes the listen sockets to be opened up again, and part of that logic is to figure out which networks it makes sense to run a DHT node on. For this, it looks at the routing table for routes to the internet.

On MacOS the routing table can be empty right at that point. So some of the logic that looks at the routing table is disabled in case it's empty. That's what this patch does.
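The idea of the patch, as described, can be sketched as follows. This is an illustration only and not the code from the pull request; the route records (dicts with `interface` and `default` keys) and both function names are assumptions.

```python
def has_internet_route(interface, routes):
    """True if the routing table has a default route via this interface."""
    return any(r["interface"] == interface and r["default"] for r in routes)


def should_run_dht(interface, routes):
    """Decide whether to start a DHT node on the given interface.

    An empty routing table is treated as "not known yet" rather than
    "no route", so the route check is skipped in that case.
    """
    if not routes:
        return True
    return has_internet_route(interface, routes)
```

The trade-off is that an interface with genuinely no route may briefly get a DHT node, in exchange for not missing the interface when the notification arrives before the routes do.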

arvidn avatar Jul 30 '21 23:07 arvidn

@xnoreq Someone is claiming that listening only on IPv4 solves the problem. Can you test and confirm? https://github.com/qbittorrent/qBittorrent/issues/15253#issuecomment-891338174 You'll have to choose your specific VPN interface and then set the "optional IP address to bind to" option to "All IPv4 Addresses".

Also the claim in this comment https://github.com/qbittorrent/qBittorrent/issues/13794#issuecomment-754283051 refers to a line of code that indicates a change related to IPv6 which might have introduced this issue.

summerqB avatar Aug 03 '21 12:08 summerqB

I can't reproduce this problem with this patch applied https://github.com/arvidn/libtorrent/pull/6338

I tested with a VPN as well, listening to a specific interface.

arvidn avatar Aug 08 '21 03:08 arvidn

@sledgehammer999 can you please provide a qBt 4.3.7 win x64 build with latest RC_1_2? I want to test if this issue is fixed on Windows with the latest commit.

summerqB avatar Aug 18 '21 16:08 summerqB

looking over this thread again, scratch my comment about the PR "fixing it".

It sounds more likely the problem is:

  • the initial DHT bootstrap happens while there is no network connectivity

  • the router node can't be found, any saved DHT nodes are tried and fail and are discarded

  • bootstrap as a whole fails, we have no more known peers

  • network connectivity is restored but the DHT bootstrap doesn't trigger. If network connectivity is restored away from the main computer (say, by restarting the router), there is no trigger on the machine running libtorrent to notify it of there being a route to the internet.

arvidn avatar Aug 19 '21 04:08 arvidn

A rather interesting thing happened for me earlier. I had to relaunch qBit because of a connection reset, and a couple of torrents that were pending for download started up right away (connected to trackers just fine) while those for seeding took a while to finally start working. Not sure if it was just a fluke/coincidence, or if there might be something about it that makes a difference. They use the same trackers, for the record, so it's not like the downloads worked because of being from a different source.

Wolfie713 avatar Sep 17 '21 03:09 Wolfie713

Just wanted to chime in here. I had this issue for a few versions of qBit, and I have been resolving it when it happened by simply pausing and resuming all torrents, which fixed it.

The issue happens for me when my internet cable modem is restarted / my ISP shuts it down for maintenance for a few minutes, thereby triggering a VPN re-connection. It does not happen for me when I restart my VPN or change VPN servers.

Even when my modem is restarted, the DHT table gets spun back up in a few seconds; it's just the trackers that show errors such as "not contacted yet" or "not working". Again, pausing all torrents and resuming all torrents seems to resolve it.

As a strong test method, I would recommend trying to restart your modems / routers / ISP connections and see if it lets you reproduce this.

gothicserpent avatar Dec 09 '21 13:12 gothicserpent