scout_apm_python

Understanding socket connection issues

Open dlanderson opened this issue 4 years ago • 2 comments

What

We'd like to track socket connection issues with the core agent so we can identify issues and fix them proactively.

Why

The socket connection might fail for multiple reasons, including:

  • Unable to create the core_agent_dir
  • Unable to download the core agent (e.g. an outbound firewall prevents it)
  • Permission denied on an already existing core_agent_socket path
  • noexec is set on the volume where core_agent_dir resides
  • Edge cases, such as core_agent_dir residing on a bind mount inside a Docker container on a macOS host, where creating a socket results in a "name too long" error: https://github.com/moby/moby/issues/23545#issuecomment-226144475
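Several of the causes listed above can be probed locally before a connection is attempted. The following is a minimal sketch, not the agent's actual code: the function name `diagnose_socket_setup`, its parameters, and the 104-byte path limit are assumptions made for illustration. It does not cover the download failure, which would need a network check.

```python
import os

# Assumption: sun_path is typically 108 bytes on Linux and 104 on macOS;
# the smaller value is used here so the check stays conservative.
MAX_UNIX_SOCKET_PATH = 104


def diagnose_socket_setup(core_agent_dir, socket_path):
    """Return a list of problems found; an empty list means none detected.

    Hypothetical helper, Unix only (relies on os.statvfs).
    """
    problems = []

    # Cause: unable to create the core_agent_dir.
    try:
        os.makedirs(core_agent_dir, exist_ok=True)
    except OSError as exc:
        problems.append("cannot create core_agent_dir: {}".format(exc))
        return problems  # the remaining checks need the directory

    # Cause: noexec on the volume. os.ST_NOEXEC is not defined on every
    # platform, so fall back to 0 (check skipped) when it is missing.
    flags = os.statvfs(core_agent_dir).f_flag
    if flags & getattr(os, "ST_NOEXEC", 0):
        problems.append("core_agent_dir is on a noexec volume")

    # Cause: permission denied on an already existing socket path.
    if os.path.exists(socket_path) and not os.access(
        socket_path, os.R_OK | os.W_OK
    ):
        problems.append("permission denied on existing socket path")

    # Cause: "name too long" when the socket path exceeds the sun_path limit.
    if len(os.fsencode(socket_path)) >= MAX_UNIX_SOCKET_PATH:
        problems.append("socket path too long for a Unix socket")

    return problems
```

The returned list could be attached to whatever diagnostics payload ends up being reported, so the failure reason is visible rather than only the fact that the connection failed.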

No matter the issue, the end result is the same: the Python agent attempts to connect to the socket, fails, and stops trying after a few attempts. We'll start by tracking when the connection attempts are exhausted.
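As a rough illustration of the behaviour described here, a retry loop with a hook that fires once attempts run out might look like the sketch below. The function name, the `on_exhausted` callback, and the default attempt count and delay are all hypothetical; the real agent's retry logic may differ.

```python
import socket
import time


def connect_with_retries(socket_path, attempts=3, delay=0.1, on_exhausted=None):
    """Try to connect to a Unix socket, giving up after `attempts` tries.

    Hypothetical sketch: returns the connected socket, or None once the
    attempts are exhausted. `on_exhausted` is called once with the last error.
    """
    last_error = None
    for _ in range(attempts):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(socket_path)
            return sock
        except OSError as exc:
            # Covers missing path, permission denied, path too long, etc.
            sock.close()
            last_error = exc
            time.sleep(delay)

    # All attempts failed: this is the point where tracking would happen.
    if on_exhausted is not None:
        on_exhausted(last_error)
    return None
```

Passing the reporting function in as a callback keeps the connection code free of any knowledge about where or how diagnostics are sent.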

How

  1. At https://github.com/scoutapp/scout_apm_python/blob/master/src/scout_apm/core/socket.py#L190
  2. Report the following data in a POST to https://checkin.scoutapp.com/apps/diagnostics.scout
  3. With URL parameters: ?key=#{org_key}&name=#{app_name}
  4. With headers:
       Agent-Hostname: #{socket.gethostname()}
       Agent-Version: #{agent_version}
       Content-Type: application/octet-stream
  5. With body data:
       {
         version: 1,
         type: socket,
         language: python,
         agent_version: #{agent_version},
         environment: #{app_environment},
         node: #{socket.gethostname()},
         time_since_startup: #{figure out how to determine this},
         agent_config: #{json object/hash of key/value of the agent's config options}
       }
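The request described above could be assembled roughly as follows. This is a sketch under several assumptions, not a settled design: the body is JSON-encoded, the unquoted values such as socket and python are sent as strings, the hostname comes from `socket.gethostname()`, and `time_since_startup` is approximated as seconds since this module was imported. The function name and parameters are hypothetical.

```python
import json
import socket
import time
import urllib.parse
import urllib.request

DIAGNOSTICS_URL = "https://checkin.scoutapp.com/apps/diagnostics.scout"

# Assumption: module import time stands in for process startup time.
_STARTED_AT = time.monotonic()


def build_diagnostics_request(
    org_key, app_name, agent_version, environment, agent_config
):
    """Build (but do not send) the diagnostics POST request."""
    hostname = socket.gethostname()
    query = urllib.parse.urlencode({"key": org_key, "name": app_name})
    body = {
        "version": 1,
        "type": "socket",
        "language": "python",
        "agent_version": agent_version,
        "environment": environment,
        "node": hostname,
        "time_since_startup": time.monotonic() - _STARTED_AT,
        "agent_config": agent_config,
    }
    return urllib.request.Request(
        DIAGNOSTICS_URL + "?" + query,
        data=json.dumps(body).encode("utf-8"),  # assumption: JSON body
        headers={
            "Agent-Hostname": hostname,
            "Agent-Version": agent_version,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
```

Sending would then be a call to `urllib.request.urlopen(request, timeout=...)` wrapped in a broad `try`/`except`, so that a failed diagnostics report never affects the host application.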

dlanderson, Aug 26 '20 20:08

👍

adamchainz, Aug 26 '20 23:08

The current code setup means that the CoreAgentSocketThread won't be involved in any of the errors you listed there, so it's not the right place to add this logging.

The right place is inside CoreAgentManager.

#482 tracks moving the download under the thread, to avoid blocking application startup. I will look into that as a follow-up to this.

adamchainz, Sep 01 '20 10:09