[Bug]: DNS time-to-live not respected

Open barryprice opened this issue 9 months ago • 4 comments

Before you file a bug report

  • [X] I have checked the issue tracker and have not found an issue that matches the one I'm filing.
  • [X] This issue is not a troubleshooting question. Troubleshooting questions go here: https://forum.mattermost.com/c/trouble-shoot/16.
  • [X] This issue is not a feature request. You can request features and make product suggestions here: https://mattermost.com/suggestions/.
  • [X] This issue doesn't reproduce on web browsers (such as in Chrome). If it does, issue reports go to the Mattermost Server repository.
  • [X] This issue reproduces on the most recent stable version, or the most recent prerelease version of the Mattermost Desktop App.
  • [X] I have read the contribution guidelines.

Mattermost Desktop Version

5.5.0 commit: 4f266a3

Operating System

Ubuntu Linux 22.04 LTS x64 (but seen across various series)

Mattermost Server Version

7.8.0

Steps to reproduce

We migrated a production Mattermost server instance between data centres earlier today.

During the downtime period we intentionally took both source and target instances offline (in such a way that users would receive a 503 error) to avoid skew between the two installations during the sync.

Prior to and during this period, the DNS TTL was reduced to 60s.

Once migration was complete, we restored service on the target instance but intentionally kept the source instance offline.

Connecting to the migrated service via web browser (as well as e.g. matterircd) worked fine at this point, but an already-running mattermost-desktop app just showed 503 errors. Several users confirmed this.

We tried logging out and logging in again, but this didn't make any difference. Further investigation revealed that the app was still trying to connect to the intentionally-down source service.

It appears that the app does a DNS lookup on startup/login and then caches that result for far longer than expected, possibly indefinitely.

Fully stopping and relaunching the app resolved the problem.

Expected behavior

The app should have noticed that the target IP changed, and attempted to reconnect to the new target.

If not immediately, then certainly at the logout/login step.
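To make the expectation concrete, here is a minimal sketch of a cache that re-resolves a name once its TTL has elapsed and can be flushed on logout/login. It is only an illustration of the behaviour described above, not how mattermost-desktop, Electron, or Chromium actually work internally; the `TtlDnsCache` class, the resolver signature, and the injected clock are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical resolver signature: hostname -> (addresses, ttl_seconds).
Resolver = Callable[[str], Tuple[List[str], float]]


class TtlDnsCache:
    """Illustrative cache that re-resolves a name once its TTL has elapsed."""

    def __init__(self, resolve: Resolver,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._clock = clock
        # host -> (addresses, expiry time on the injected clock)
        self._entries: Dict[str, Tuple[List[str], float]] = {}

    def lookup(self, host: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now < entry[1]:
            return entry[0]  # still within the TTL: serve the cached answer
        # Missing or expired: ask the resolver again and record a new expiry.
        addrs, ttl = self._resolve(host)
        self._entries[host] = (addrs, now + ttl)
        return addrs

    def flush(self) -> None:
        """Drop every entry, e.g. when the user logs out and back in."""
        self._entries.clear()
```

With a 60s TTL, a cache like this would serve the old address for at most a minute after the record changes, and not at all after a `flush()`.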

Observed behavior

The stale IP from the source service was cached for many times longer than the set TTL, while the local machine's resolver was well aware of the new one.
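For anyone wanting to confirm the same symptom, a rough check is to compare the address the app is connected to against what the OS resolver currently returns. The sketch below assumes you have already found the connected address yourself (for example from `ss -tnp` output on Linux); the function names and the `chat.example.com` hostname are placeholders, not part of any Mattermost tooling.

```python
import socket
from typing import Callable, Iterable, Set


def system_addresses(host: str, port: int = 443) -> Set[str]:
    """Ask the OS resolver (getaddrinfo) for the host's current addresses."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return {info[4][0] for info in infos}


def is_stale(connected_ip: str, host: str,
             resolve: Callable[[str], Iterable[str]] = system_addresses) -> bool:
    """True if the address in use is no longer among the resolver's answers."""
    return connected_ip not in set(resolve(host))


# Example (placeholder values): is_stale("192.0.2.10", "chat.example.com")
```

If this returns true for longer than the TTL while the app keeps using the old address, the stale entry is being held by the app rather than by the local resolver.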

Log Output

main.log shows nothing relevant from before the restart that fixed the issue, just repeats of this:

[2023-09-27 08:23:21.408] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 08:24:35.590] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 08:49:52.024] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 08:50:56.047] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 08:54:47.388] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 08:55:24.986] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 09:41:03.193] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 09:41:43.804] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 09:42:06.971] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 09:43:04.916] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 09:43:11.380] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 09:43:42.473] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 09:46:17.176] [info]  [App.Config] config.autostart has been configured: false
[2023-09-27 09:47:08.429] [info]  [App.Config] config.autostart has been configured: false

Additional Information

No response

barryprice avatar Sep 27 '23 08:09 barryprice

@barryprice This seems like it might be an issue with Electron/Chromium, since it's the one managing the DNS lookups. Is this reproducible in the browser?

devinbinnie avatar Oct 02 '23 13:10 devinbinnie

That was my assumption, but I am not at all familiar with Electron - so I'm unsure whether it's something that can be potentially tweaked via options in the mattermost-desktop build, or whether this needs to be targeted upstream to Electron itself (or some component(s) thereof).

Several users tested with various browsers while we were seeing this issue with the app (Firefox, standalone Chrome/Chromium), and all reported that the DNS TTL was respected in those cases with no issues.

barryprice avatar Oct 03 '23 01:10 barryprice

I would wager this would be something we'd want to file upstream to Electron. I can try and reproduce the issue using Electron Fiddle, but it will be tough and take some time. Let me get back to you.

devinbinnie avatar Oct 03 '23 13:10 devinbinnie

@barryprice I was actually able to reproduce this on Chrome myself on macOS with a CNAME entry with a TTL of 60s. It took a restart of the app to get it to update; not even a Hard Reload would jig it.

This seems like an issue with Chromium itself.

devinbinnie avatar Oct 04 '23 14:10 devinbinnie