Explore options for detecting and recovering from Tor-related downtime

Open • nathandyer opened this issue 4 months ago • 4 comments

For some time now, we have been seeing intermittent availability issues with some SecureDrop instances due to what appear to be connection hiccups with the Tor network. In most cases, these issues resolve themselves the following day after the scheduled nightly reboot, but they could potentially be fixed much more quickly.

It would be nice to have an internal mechanism within SecureDrop (periodic network checks, etc.) that could detect a connectivity issue and attempt to restart the Tor service automatically.

We should research potential options for adding this functionality into SecureDrop.
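As a starting point for that research, here is a minimal sketch of the "periodic check, then restart" idea. Nothing in it is existing SecureDrop code: `TorWatchdog`, `restart_tor`, the three-failure threshold, and the `tor` systemd unit name are all assumptions for illustration, and the connectivity probe is left as a caller-supplied function because what counts as "Tor is reachable" still needs to be decided.

```python
import subprocess
from typing import Callable


def restart_tor() -> bool:
    """Restart Tor via systemd. The unit name "tor" is an assumption."""
    result = subprocess.run(["systemctl", "restart", "tor"], check=False)
    return result.returncode == 0


class TorWatchdog:
    """Restart Tor after several consecutive failed connectivity probes."""

    def __init__(
        self,
        probe: Callable[[], bool],
        restart: Callable[[], bool] = restart_tor,
        failures_needed: int = 3,  # hypothetical threshold
    ) -> None:
        self.probe = probe
        self.restart = restart
        self.failures_needed = failures_needed
        self.consecutive_failures = 0

    def tick(self) -> str:
        """Run one check and return "ok", "degraded", or "restarted"."""
        if self.probe():
            self.consecutive_failures = 0
            return "ok"
        self.consecutive_failures += 1
        if self.consecutive_failures < self.failures_needed:
            return "degraded"
        # Threshold reached: reset the counter and attempt a restart.
        self.consecutive_failures = 0
        self.restart()
        return "restarted"
```

Calling `tick()` from a timer or cron-style job would give the periodic check. Requiring several consecutive failures is one way to avoid restarting on a single transient blip.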

nathandyer • Aug 27 '25 19:08

I'd like to bump this issue. There are many network/Tor issues that cause a SecureDrop instance to go offline and stay offline until an automatic or manual reboot. This results in a surprising amount of downtime, to the point where potential sources are likely trying to submit something and finding that their desired Source Interface is not available.

ChumOfChance • Nov 21 '25 15:11

Love this as a priority. Some questions:

  1. Are we able to replicate this?
  2. Do we have logging on the connectivity issue? (Is it "Proxy Server Is Refusing Connection", for example?)
  3. How will we (or will we) prevent a restart loop if the connection stays down for a while?

jskinne3 • Nov 21 '25 22:11

It's difficult to replicate, because the cause is generally some nebulous Tor network issue. We do have a few decent "case studies" with logs + observations. Close monitoring of a test instance might yield better correlation or data. We think this issue is one example: https://gitlab.torproject.org/tpo/core/tor/-/issues/40739

I imagine there could be a few ways to prevent a restart loop, such as some internal tracking of when the last restart was.
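One way that tracking could look, as a rough sketch only: remember when the last restart happened and refuse another one until a cooldown has passed. The exponential backoff on top of that is an addition for illustration, not something proposed in this thread, and `RestartGuard`, its method names, and the delay values are all hypothetical.

```python
import time
from typing import Callable, Optional


class RestartGuard:
    """Allow a restart only if enough time has passed since the last one."""

    def __init__(
        self,
        base_delay: float = 300.0,     # assumed: 5 minutes after first restart
        max_delay: float = 21600.0,    # assumed: cap the wait at 6 hours
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock
        self.last_restart: Optional[float] = None
        self.attempts = 0

    def current_delay(self) -> float:
        """Cooldown doubles with each restart that did not lead to recovery."""
        exponent = max(self.attempts - 1, 0)
        return min(self.base_delay * 2 ** exponent, self.max_delay)

    def may_restart(self) -> bool:
        if self.last_restart is None:
            return True
        return self.clock() - self.last_restart >= self.current_delay()

    def record_restart(self) -> None:
        self.last_restart = self.clock()
        self.attempts += 1

    def record_recovery(self) -> None:
        """Call when connectivity is confirmed again; resets the backoff."""
        self.attempts = 0
```

This keeps its state in memory, so it would be lost on the nightly reboot. Persisting `last_restart` and `attempts` to disk would be a separate design decision.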

In some cases we have found that clearing the Tor state can help with certain Tor connectivity issues (sometimes in lieu of a full restart), but doing this has security tradeoffs for a Tor service. It requires some investigation.

ChumOfChance • Nov 24 '25 19:11

Let's be sure to track this code change if that "compression bomb" case study is indeed a problem for us.

jskinne3 • Nov 26 '25 01:11