
WireGuard VPNs continuously failing ~70% of the time

Open NlGHT opened this issue 2 months ago • 6 comments

I love this program. I've looked through historical mentions of this problem and tried a decent number of the fixes that others reported. I should mention that, tbh, I've never had a VPN connection that fails less than 50% of the time I want to use my phone. In general I'd put the failure rate at around 70%, so I've come with some very verbose logs and a list of the things I've tried.

Specs:

  • GrapheneOS Pixel 9 (Android version 16)
  • Rethink v0.5.5t (fdroid)
  • Multi-user setup (4 total), each user has different WireGuard VPNs running in the background with optimisation off.
  • One user can be online while the other offline.
  • VPNs are all WireGuard from Proton VPN (all free plan) downloaded as configs from downloads section (https://account.protonvpn.com/downloads); it could totally be that I'm on free and they don't like me connecting with so many simultaneously but idk that also seems weird.
  • Using cellular and wifi switching between when available. No apps are restricted to one or the other.
  • Happens both in Advanced proxy mode (with 3 simultaneously) and in regular mode with just 1.

The problem:

  • To my knowledge there isn't anything in particular that sets it off, it seems very random (sry!), but I am happy to try anything out.
  • I am blocking a TON with the blocklists and on a per-app basis too, but I know these are set up fine because they work when there is a connection.

Things I tried:

  • Initially I thought it might be alleviated with multiple VPNs so that when one fails it would switch over but instead they just all continuously fail, same problem and same rate (if anything maybe higher rate).
  • Changing IPv4 to Auto does nothing (saw that from celzero on a Reddit post)
  • Refreshing the VPN in the proxy page does nothing at all.

Here are various "very verbose" logs from the profile where it was failing (I also tried to specifically capture some connections and the refresh of VPNs): RethinkLogsNotWorking.zip

Here are logs from another profile (pretty much) right afterwards, which was connecting fine with a different WireGuard config but still Proton VPN free: rethink_app_logs_v0.5.5t_1761314122488-WhileWorking.zip

Here are also some timestamped screenshots of what I see on the different pages: ScreenshotsTimestamped.zip

As you can see in the subfolder, there is also a mismatch between the status shown on the Proxy page and the one on the page listing the actual profiles. The screenshots in that subfolder are in pairs taken seconds apart, as the timestamps show. Unfortunately these screenshots are NOT from the same time as the logs (sorry again!!). I had originally thought the logs would constantly write to some sort of saved buffer, so I had planned to store up a ton of data over a week, but then discovered that isn't really the case. 😢 I potentially could get some screenshots though. I just needed to report it because I can't really keep this up much longer; it's become pretty hard to use.

I would like to help and I know a decent amount about Android programming, but these logs really are VERY verbose and I can hardly make sense of them!! However, I do not have anything ADB-related set up (and would ideally prefer not to have to install it, but would if really required).

Let me know what you think or make of this, it's such a great app I really would love to use it properly because it fills that perfect spot. But I also can't keep it up as it has made for so many constant awkward positions where I just can't get internet.

NlGHT avatar Oct 24 '25 14:10 NlGHT

As a stop gap, set Persistent KeepAlive for your Proton configs to 30 seconds and see if things then work flawlessly? This will drain battery but the official Proton app does force set Persistent KeepAlive to 60 seconds, if it is any consolation.
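
For configs edited by hand before import, the keepalive change can be sketched as follows. This is a minimal sketch, assuming the downloaded file is a standard WireGuard config with a [Peer] section. `PersistentKeepalive` is the standard WireGuard config key; `set_keepalive` and the sample values are hypothetical, made up for illustration, and are not part of Rethink or Proton.

```python
def set_keepalive(config_text, seconds=30):
    """Return config_text with PersistentKeepalive set in every [Peer] section.

    Hypothetical helper for illustration; it only rewrites text.
    """
    out = []
    in_peer = False   # are we currently inside a [Peer] section?
    has_key = False   # did the current [Peer] section already set the key?

    def finish_peer():
        # On leaving a [Peer] section that lacked the key, append it,
        # keeping any trailing blank lines after the inserted line.
        if in_peer and not has_key:
            tail = []
            while out and out[-1].strip() == "":
                tail.append(out.pop())
            out.append(f"PersistentKeepalive = {seconds}")
            out.extend(tail)

    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            finish_peer()
            in_peer = stripped.lower() == "[peer]"
            has_key = False
        elif in_peer and stripped.split("=", 1)[0].strip().lower() == "persistentkeepalive":
            # Overwrite an existing value instead of adding a duplicate.
            line = f"PersistentKeepalive = {seconds}"
            has_key = True
        out.append(line)
    finish_peer()
    return "\n".join(out) + "\n"


# Placeholder keys and addresses, not a real config.
sample = (
    "[Interface]\n"
    "PrivateKey = <private-key>\n"
    "Address = 10.2.0.2/32\n"
    "\n"
    "[Peer]\n"
    "PublicKey = <public-key>\n"
    "AllowedIPs = 0.0.0.0/0\n"
    "Endpoint = 192.0.2.1:51820\n"
)
print(set_keepalive(sample, 30))
```

Running it twice is safe: an existing value is overwritten rather than duplicated.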

Using cellular and wifi switching between when available. No apps are restricted to one or the other.

Will you reword this, please? I didn't get what "switching" means? Nor, "restricted to one or the other"?

Refreshing the VPN in the proxy page does nothing at all.

Concerning.

blocking a TON with the blocklists and on a per-app basis too

That's okay. This issue seems unrelated.

VPN's are all Wireguard from Proton VPN (all free plan) downloaded as configs from downloads section (https://account.protonvpn.com/downloads); it could totally be that I'm on free and they don't like me connecting with so many simultaneously but idk that also seems weird

Not weird. Empirically speaking, they do do this.

ignoramous avatar Oct 26 '25 12:10 ignoramous

As a stop gap, set Persistent KeepAlive for your Proton configs to 30 seconds and see if things then work flawlessly? This will drain battery but the official Proton app does force set Persistent KeepAlive to 60 seconds, if it is any consolation.

Ah good idea! 😄 It is pretty weird that they would have a mismatch between the official app and the independent wireguard services but that is really interesting. I'll give it a go and report back on that within a few days of testing!

Will you reword this, please? I didn't get what "switching" means? Nor, "restricted to one or the other"?

This just refers to the usual Android behaviour: on leaving Wi-Fi range the phone carries on over cellular, and when it comes back within range it switches back to Wi-Fi.

Concerning.

Yeah I'm not even sure what is expected when you press refresh. Is it supposed to basically reset the connection and is it then expected to fix a failing connection? If this is expected behaviour I can confirm that unfortunately nothing changes. The status is exactly the same before and after. Another problem related to this might be that there is that mismatch between what you see on that page with the refresh button versus what you see in the page when you click through to view all the wireguard profiles as shown in my screenshots.

Not weird. Empirically speaking, they do do this.

Right, well it could very well be. Would it be possible to make this out from the logs somehow to confirm?

NlGHT avatar Oct 26 '25 13:10 NlGHT

Yeah I'm not even sure what is expected when you press refresh. Is it supposed to basically reset the connection and is it then expected to fix a failing connection?

It is, but in Proton's case, I've noticed that restarting the existing WireGuard "device" (which is what happens during a refresh) doesn't help. Maybe we need to recreate the WireGuard "device".

I'll see if we can get it done in time for v055u, the next release.

But yeah, Proton's issues have been annoying to debug, let alone "fix" (restart/recreate is a workaround, not a fix for whatever is causing the data stall with Proton).

Another problem related to this might be that there is that mismatch between what you see on that page with the refresh button versus what you see in the page when you click through to view all the wireguard profiles as shown in my screenshots

The inconsistency exists only in the UI. There's some bug there. I'll open a new issue to track fixing it.

Would it be possible to make this out from the logs somehow to confirm?

Yes. The WireGuard handshakes would go through just fine but any attempt to send actual data will always fail with timeouts. You can search for the corresponding WireGuard logs using wg+ID (the ID is usually a number shown dimly against each entry at Configure -> Proxy -> Setup WireGuard; ex: wg4) in Configure -> Settings -> App logs (don't forget to set the log level to "Very verbose" by tapping on the filter icon in the search bar of the App logs UI).
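
The same search can also be run offline on an exported log file. A minimal sketch, assuming the export is plain text with one entry per line; `lines_for_proxy` and the example lines are hypothetical, not part of Rethink.

```python
import re


def lines_for_proxy(log_lines, proxy_id):
    """Return the log lines that mention proxy_id (e.g. "wg4") as a whole token.

    The word boundaries keep "wg1" from also matching "wg10".
    """
    pattern = re.compile(rf"\b{re.escape(proxy_id)}\b")
    return [line for line in log_lines if pattern.search(line)]


# Illustrative lines only; real entries are much longer.
example = [
    "proxy: pin: ok? true; from wg4; err? <nil>",
    "wg: wg10 dial: start tcp example.com:443",
    "dns: pref: use chosen tr(wg4, ) for example.com",
]
for line in lines_for_proxy(example, "wg4"):
    print(line)
```

For a real export, pass `open(path, encoding="utf-8")` (with `path` pointing at the extracted log file) as `log_lines`.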

ignoramous avatar Oct 26 '25 17:10 ignoramous

As a stop gap, set Persistent KeepAlive for your Proton configs to 30 seconds and see if things then work flawlessly? This will drain battery but the official Proton app does force set Persistent KeepAlive to 60 seconds, if it is any consolation.

So I've tried this and definitely the problems are still there. Also I should note that when you use the downloaded configs from Proton they actually don't come with a Persistent Keep Alive value at all, as in not even mentioned.

Maybe we need to recreate the WireGuard "device".

Interesting! Maybe this would somewhat improve things.

The WireGuard handshakes would go through just fine but any attempt to send actual data will always fail with timeouts.

This does appear to be what's happening, from what I can see in the logs I sent above:

1761312093230,Y GoLog: Y alg.go:1975>alg.go:1943>proxies.go:497>proxies.go:621>proxies.go:628>proxy.go:649>wgproxy.go:1074>stats.go:124: wg: ReadStats: LatestRecentHandshake: 2m 1s, Peers: 1
1761312093231,Y GoLog: Y alg.go:1911>doh.go:790>doh.go:767>alg.go:1975>alg.go:1943>proxies.go:497>proxies.go:621>proxies.go:633: proxy: pin: ok? true; 104.18.0.48:443 from wg1; err? <nil>
1761312093231,D GoLog: D wgproxy.go:311>wgproxy.go:276: proxy: wg: wg1 (novia/zz) ping: 1/1 peers; via OK? false
1761312093232,D GoLog: D wgproxy.go:314: proxy: wg: wg1 (novia/zz); onNotOK: refresh? false+false; ping? true; ok? false+true; err? <nil>
1761312093232,D GoLog: D alg.go:733>alg.go:1911>doh.go:790>doh.go:767>alg.go:1975>alg.go:1952: doh: BlockFree : sky.rethinkdns.com: proxy for 104.18.0.48:443 [among [104.18.0.48:443 104.18.0.48:443 104.18.1.48:443]]; choosing wg1 among [wg1 Base]; errs? <nil>
1761312093233,D GoLog: D transport.go:1597>transport.go:1615>transport.go:1780>transport.go:1285>wgproxy.go:184>wgproxy.go:1113: wg: wg1 (novia/zz) dial: start tcp sky.rethinkdns.com:443
1761312093234,V GoLog: V wgnet.go:126>wgnet.go:54>wgnet.go:72>dns.go:33>ipmap.go:262>ipmapper.go:98>ipmapper.go:127: ipmapper: lookup: host ip:sky.rethinkdns.com for -1 on [wg1]
1761312093235,V GoLog: V async.go:81>barrier.go:193>ipmapper.go:262>transport.go:444>transport.go:540: dns: fwd: 1 for rethink; query sky.rethinkdns.com:1, r1; [prefs:&{Base  Default  true}; chosen:[wg1]]
1761312093236,D GoLog: D async.go:81>barrier.go:193>ipmapper.go:262>transport.go:444>transport.go:542>transport.go:1059: dns: pref: use chosen tr(wg1, ) for sky.rethinkdns.com
1761312093237,V GoLog: V async.go:81>barrier.go:193>ipmapper.go:262>transport.go:444>transport.go:545: dns: fwd: 2 for rethink; query sky.rethinkdns.com:1, r1; [prefs:&{Base  Default  true}; chosen:[wg1]]; id? wg1, sid? , pid? Base, ips? []
1761312093239,D GoLog: D async.go:81>barrier.go:193>ipmapper.go:262>transport.go:444>transport.go:562>wall.go:124: wall: skip local for sky.rethinkdns.com. blockQ for  with err no blocklist applies
1761312093240,V GoLog: V async.go:81>barrier.go:193>ipmapper.go:262>transport.go:444>transport.go:581: dns: fwd: 4 for rethink; r1, query NOT blocked sky.rethinkdns.com:1; why? no blocklist applies
1761312093240,V GoLog: V ipmapper.go:262>transport.go:444>transport.go:593>alg.go:913>alg.go:723>alg.go:701>alg.go:1748: alg: resolv: sky.rethinkdns.com:wg1[rethink] => real(ip4 0, ip6 0) until: 0s; stale []
1761312093241,Y GoLog: Y async.go:81>barrier.go:193>ipmapper.go:262>transport.go:444>transport.go:593>alg.go:913>alg.go:723>alg.go:712: alg: response for sky.rethinkdns.com by wg1[rethink] (q4? true / q6? false) realip; in cache? [] [until: 0ms] (or stale? [])

Or this where I see the domain name:

1761312093217,Y GoLog: Y alg.go:1975>alg.go:1943>proxies.go:497>proxies.go:621>proxies.go:628>proxy.go:649>wgproxy.go:1074>stats.go:124: wg: ReadStats: LatestRecentHandshake: 2m 1s, Peers: 1
1761312093218,V GoLog: V async.go:58>transport.go:847>transport.go:801>transport.go:545: dns: fwd: 2 for 1410199; query g.api.mega.co.nz:28, r1; [prefs:&{wg1,Base  BlockFree  true}; chosen:[]]; id? BlockFree, sid? , pid? wg1,Base, ips? []
1761312093219,Y GoLog: Y alg.go:1911>doh.go:790>doh.go:767>alg.go:1975>alg.go:1943>proxies.go:497>proxies.go:621>proxies.go:633: proxy: pin: ok? true; 104.18.0.48:443 from wg1; err? <nil>
1761312093220,Y GoLog: Y alg.go:1911>doh.go:790>doh.go:767>alg.go:1975>alg.go:1943>proxies.go:497>proxies.go:621>proxies.go:633: proxy: pin: ok? true; 104.18.0.48:443 from wg1; err? <nil>
1761312093222,Y GoLog: Y async.go:58>transport.go:847>transport.go:801>transport.go:593>alg.go:913>alg.go:723>alg.go:712: alg: response for g.api.mega.co.nz by BlockFree[1410199] (q4? false / q6? true) realip; in cache? [] [until: 0ms] (or stale? [66.203.125.15 66.203.125.14 66.203.125.12 66.203.125.11 66.203.125.13 2a0b:e46:1:100::14 2a0b:e46:1:100::13 2a0b:e46:1:100::15 2a0b:e46:1:100::12 2a0b:e46:1:100::11])
1761312093224,V GoLog: V wgproxy.go:311>wgproxy.go:273>send.go:95>send.go:345>send.go:133>peer.go:136>wgconn.go:492: wg: bind: send: wg1 addr(149.88.103.161:51820) parcels(1) tx(148) (exp? false / flood? false / overw? false); err? <nil>
1761312093224,D GoLog: D wgproxy.go:314: spammy... 33 msgs; dropped? true
1761312093224,Y GoLog: Y alg.go:1975>alg.go:1943>proxies.go:497>proxies.go:621>proxies.go:628>proxy.go:649>wgproxy.go:1074>stats.go:124: wg: ReadStats: LatestRecentHandshake: 2m 1s, Peers: 1
1761312093224,Y GoLog: Y alg.go:1911>doh.go:790>doh.go:767>alg.go:1975>alg.go:1943>proxies.go:497>proxies.go:621>proxies.go:633: proxy: pin: ok? true; 104.18.0.48:443 from wg1; err? <nil>
1761312093224,V GoLog: V wgnet.go:126>wgnet.go:54>wgnet.go:72>dns.go:33>ipmap.go:262>ipmapper.go:98>ipmapper.go:127: ipmapper: lookup: host ip:sky.rethinkdns.com for -1 on [wg1]
1761312093224,D GoLog: D transport.go:1597>transport.go:1615>transport.go:1780>transport.go:1285>wgproxy.go:184>wgproxy.go:1113: wg: wg1 (novia/zz) dial: start tcp sky.rethinkdns.com:443
1761312093224,V GoLog: V wgnet.go:126>wgnet.go:54>wgnet.go:72>dns.go:33>ipmap.go:262>ipmapper.go:98>ipmapper.go:127: ipmapper: lookup: host ip:sky.rethinkdns.com for -1 on [wg1]
1761312093225,V GoLog: V wgnet.go:126>wgnet.go:54>wgnet.go:72>dns.go:33>ipmap.go:262>ipmapper.go:98>ipmapper.go:127: ipmapper: lookup: host ip:sky.rethinkdns.com for -1 on [wg1]
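
To pull the handshake ages out of lines like the ones above, here is a minimal sketch. It assumes each entry has the shape `<epoch-ms>,<level> GoLog: ...` and that the age is printed in `2m 1s` style, as in these excerpts; other formats may exist that it does not handle. The function names are hypothetical, not part of Rethink.

```python
import re
from datetime import datetime, timezone

# Assumed entry shape, inferred from the excerpts in this thread.
LINE_RE = re.compile(r"^(?P<ts>\d{13}),(?P<level>[A-Z]) GoLog: (?P<rest>.*)$")
HANDSHAKE_RE = re.compile(r"LatestRecentHandshake: (?P<age>[^,]+),")
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def age_to_seconds(age):
    """Convert an age such as "2m 1s" to whole seconds."""
    # The lookahead skips unit suffixes such as "ms".
    parts = re.findall(r"(\d+)\s*([hms])(?![a-z])", age)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


def handshake_ages(log_lines):
    """Yield (UTC time of the entry, handshake age in seconds) pairs."""
    for line in log_lines:
        entry = LINE_RE.match(line)
        if entry is None:
            continue  # continuation lines or other formats
        handshake = HANDSHAKE_RE.search(entry.group("rest"))
        if handshake is None:
            continue
        when = datetime.fromtimestamp(int(entry.group("ts")) / 1000, tz=timezone.utc)
        yield when, age_to_seconds(handshake.group("age"))


# First line copied from the excerpt above; second shortened for illustration.
sample = [
    "1761312093230,Y GoLog: Y alg.go:1975>alg.go:1943>proxies.go:497>proxies.go:621>proxies.go:628>proxy.go:649>wgproxy.go:1074>stats.go:124: wg: ReadStats: LatestRecentHandshake: 2m 1s, Peers: 1",
    "1761312093231,D GoLog: D wgproxy.go:311>wgproxy.go:276: proxy: wg: wg1 (novia/zz) ping: 1/1 peers; via OK? false",
]
for when, age in handshake_ages(sample):
    print(when.isoformat(), age)
```

Tracking how the age changes over a failing period would show whether handshakes keep completing while data stalls.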

Not sure what you make of it, if anything stands out but I am semi tempted to give the paid plan a go just to see if it changes things. But it does seem weird that it WOULD change things because the free plan is of course a totally legitimate service they're offering, same as the free app. I sent the full logs both during an active connection and during a downtime in the first issue comment if you want to scan through them.

NlGHT avatar Nov 02 '25 14:11 NlGHT

configs from Proton they actually don't come with a Persistent Keep Alive value at all, as in not even mentioned

I know, but the Proton app for Android (from the code I remember seeing) force sets Persistent KeepAlive to 60s.

Maybe this would somewhat improve things.

In v055u, we do this when WireGuard is either Failing or Waiting and the user taps the Refresh icon in Configure -> Proxy, or when Rethink detects network changes.

Not sure what you make of it,

The logs end abruptly, but the lines you have shared show no signs of error. What you could do if you suspect DNS failures over WireGuard is to turn ON Configure -> DNS -> Never proxy DNS & see if things improve?

ignoramous avatar Nov 02 '25 21:11 ignoramous

So somewhat unfortunately, but also pretty fortunate for you guys, I started paying and the problems all went away!

My guess is that the problem lies in the fact that my one device had multiple connections and Proton just doesn't allow that on the free plan. While still on the free plan I did find one way to restore connections in emergencies that worked about 20% of the time: manually switching over from one WireGuard profile to the next (as opposed to letting the advanced page handle that). This had to be kept to a minimum, however, otherwise it would have no effect. So I believe that might also explain why the refresh did nothing: it was just Proton cutting off some free profiles once you reach a cap, or something like that, on the number of instances.

Anyway I do appreciate your support testing and answering questions and at least I found the UI bug that got fixed! I'm sorry it took a long time for me to get back here. I had a lot going on and I was testing a bunch of things to try to find the culprit. Then ended up a couple weeks ago starting to pay, wanted to try for at least a little while to prove it wasn't a fluke. But then forgot about letting you know until now. My apologies!

NlGHT avatar Nov 30 '25 13:11 NlGHT