
The SFTP step on an SSH connection does not have a timeout.

Status: Open. Justin-Walsh opened this issue 4 years ago (5 comments).

Prerequisites

  • [ ] I have verified the problem exists in the latest version
  • [x] I have searched open and closed issues to make sure it isn't already reported
  • [x] I have written a descriptive issue title
  • [x] I have linked the original source of this report
  • [x] I have tagged the issue appropriately (area/*, kind/bug, tag/regression?)

The bug

After a successful SSH connection is made, the subsequent SFTP connection does not appear to have a timeout. If the SFTP session hangs or the connection is otherwise interrupted, the task can wait indefinitely to complete and block subsequent deployments in the meantime.

What I expected to happen

SFTP connections should time out after a sensible amount of time.
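As a rough illustration of the expected behavior (not Octopus Server's actual code, which is not shown in this issue), a read timeout on the underlying socket turns an indefinite hang into an error the caller can handle. The Python sketch below uses a local socket.socketpair() where one end never writes, to stand in for a stalled SFTP session; read_with_timeout is a hypothetical helper name.

```python
import socket


def read_with_timeout(sock: socket.socket, timeout_s: float, bufsize: int = 4096) -> bytes:
    """Read from sock, raising TimeoutError if no data arrives within timeout_s seconds.

    Hypothetical helper illustrating a per-read timeout; not Octopus code.
    """
    sock.settimeout(timeout_s)  # recv() now gives up after timeout_s instead of blocking forever
    try:
        return sock.recv(bufsize)
    except socket.timeout as exc:  # alias of TimeoutError on Python 3.10+
        raise TimeoutError(f"no data received within {timeout_s} s") from exc


if __name__ == "__main__":
    # The peer end never sends anything, simulating a hung session.
    client, peer = socket.socketpair()
    try:
        read_with_timeout(client, 0.2)
    except TimeoutError as err:
        print("gave up:", err)
    finally:
        client.close()
        peer.close()
```

A per-read timeout only bounds each individual wait. A peer that trickles data slowly can still keep a transfer alive, so an overall deadline on the whole operation may also be needed.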

Affected versions

Octopus Server: 2019.9.10 and later (at least)

Workarounds

  • Monitor task queue and cancel long-running deployments/healthchecks where appropriate.
  • Make use of https://github.com/OctopusDeploy/OctopusDeploy-Api/blob/master/REST/PowerShell/Deployments/CancelLongRunningTasks.ps1 to scan and stop long-running tasks.
  • (2022-11-08) Kill the SFTP process on the deployment target (tentacle):
    • On a Linux tentacle, connect to it via SSH, find any sftp process IDs with ps aux, and run kill -9 <pid> for each one.
    • On a Windows tentacle, RDP into the machine and restart the SFTP process.
    • Alternatively, restart the host machine the SFTP process is running on.
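The Linux step above can be partly scripted. The sketch below is an assumption-laden helper, not an Octopus-provided tool: it parses ps aux output and picks out processes whose executable name starts with "sftp" (for example OpenSSH's sftp-server), so an operator can review them before running kill -9. find_sftp_pids and current_sftp_pids are hypothetical names, and sessions served by sshd's internal-sftp run inside an sshd process and will not be matched.

```python
from __future__ import annotations

import subprocess


def find_sftp_pids(ps_aux_output: str) -> list[int]:
    """Return PIDs from `ps aux` output whose executable name starts with 'sftp'.

    Hypothetical helper: assumes the standard 11-column `ps aux` layout
    (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND).
    """
    pids: list[int] = []
    for line in ps_aux_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # COMMAND may contain spaces, so keep it whole
        if len(fields) < 11:
            continue
        pid, command = fields[1], fields[10]
        # Compare the executable's base name so that e.g. "grep sftp" is not matched.
        exe = command.split()[0].rsplit("/", 1)[-1]
        if exe.startswith("sftp"):
            pids.append(int(pid))
    return pids


def current_sftp_pids() -> list[int]:
    """Run `ps aux` on this machine and return matching PIDs (does not kill anything)."""
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    return find_sftp_pids(result.stdout)
```

On the tentacle, one might print f"kill -9 {pid}" for each PID from current_sftp_pids() and run the commands only after confirming each process really belongs to the hung task; printing instead of killing directly is a deliberate choice to keep a human in the loop.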

Links

Initial report: https://help.octopus.com/t/unhealthy-linux-target-blocking-global-heath-check/25296

Justin-Walsh commented on Jun 23 '20

Additional affected customer: https://secure.helpscout.net/conversation/1202457431/66524?folderId=3767295 (internal only)

This customer is also reporting that they must restart the Octopus service in order to cancel a task affected by this bug.

donnybell commented on Jun 24 '20

One more affected customer: https://secure.helpscout.net/conversation/1302390635/71528/

This customer also reports that they must restart the tentacle.

donnybell commented on Oct 06 '20

Additional report: https://octopus.zendesk.com/agent/tickets/85960 [Internal link]

Justin-Walsh commented on Mar 23 '22

Additional internal report: https://octopusdeploy.slack.com/archives/C01HZFJRYSH/p1667864156526899 [Internal link]

nathanwoctopusdeploy commented on Nov 08 '22

Cancelling the task can leave the server task stuck in the Cancelling state until the sftp process on the tentacle is killed, so the original workaround in this issue may no longer be sufficient. I have updated the workarounds to include killing the SFTP process on the tentacle.

nathanwoctopusdeploy commented on Nov 08 '22