scout_apm_python
Understanding socket connection issues
What
We'd like to track socket connection issues with the core agent so we can identify issues and fix them proactively.
Why
The socket connection might fail for multiple reasons, which might include:
- Unable to create the `core_agent_dir`
- Unable to download (e.g. outbound firewall prevents it)
- Permission denied to an already existing `core_agent_socket` path
- `noexec` is set on the volume where `core_agent_dir` resides
- Random things like `core_agent_dir` resides on a bind mount inside a Docker container on an OSX host and creating a socket results in a `name too long` error. https://github.com/moby/moby/issues/23545#issuecomment-226144475
No matter the issue, the end result is that the python agent will attempt to connect to the socket, fail, and ultimately stop trying after a few attempts. We'll start by tracking when the connection attempts are exhausted.
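To make the "attempts exhausted" point concrete, here is a minimal sketch of a retry loop with a hook that fires once every attempt has failed. It is not the agent's actual retry logic; `connect_with_retries`, its parameters, and the `on_exhausted` callback are hypothetical names used only for illustration.

```python
import socket
import time


def connect_with_retries(socket_path, attempts=3, delay=0.1, on_exhausted=None):
    """Try to connect to a Unix socket a fixed number of times.

    Returns the connected socket, or None after calling
    on_exhausted(last_error) when every attempt has failed.
    All names here are hypothetical, not scout_apm APIs.
    """
    last_error = None
    for _ in range(attempts):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(socket_path)
            return sock
        except OSError as exc:
            # Covers a missing path, permission denied, and over-long paths.
            last_error = exc
            sock.close()
            time.sleep(delay)
    if on_exhausted is not None:
        on_exhausted(last_error)
    return None
```

The callback receives the last `OSError`, so a diagnostics report could include the failure reason alongside the fact that retries ran out.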
How
- At https://github.com/scoutapp/scout_apm_python/blob/master/src/scout_apm/core/socket.py#L190
- Report the following data in a `POST` to `https://checkin.scoutapp.com/apps/diagnostics.scout`
- With URL parameters: `?key=#{org_key}&name=#{app_name}`
- With headers:
  - `Agent-Hostname: #{socket.gethostbyname()}`
  - `Agent-Version: #{agent version}`
  - `Content-Type: application/octet-stream`
- With body data:
{
version: 1,
type: socket,
language: python,
agent_version: #{agent_version},
environment: #{app_environment},
node: #{socket.gethostbyname()},
time_since_startup: #{figure out how to determine this},
agent_config: #{json object/hash of key/value of the agent's config options}
}
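The request described above could be assembled roughly as follows, using only the standard library. This is a sketch under assumptions: `build_diagnostics_request` and `_STARTED_AT` are hypothetical names, the body is assumed to be JSON-encoded, and `time_since_startup` is approximated by a monotonic timestamp taken at import time, since the issue leaves that open. It also uses `socket.gethostname()`, because `socket.gethostbyname()` requires a hostname argument.

```python
import json
import socket
import time
import urllib.parse
import urllib.request

DIAGNOSTICS_URL = "https://checkin.scoutapp.com/apps/diagnostics.scout"

# Assumption: module import time stands in for agent startup time.
_STARTED_AT = time.monotonic()


def build_diagnostics_request(org_key, app_name, app_environment,
                              agent_version, agent_config):
    """Build (but do not send) the diagnostics POST. Hypothetical helper."""
    hostname = socket.gethostname()
    query = urllib.parse.urlencode({"key": org_key, "name": app_name})
    payload = {
        "version": 1,
        "type": "socket",
        "language": "python",
        "agent_version": agent_version,
        "environment": app_environment,
        "node": hostname,
        "time_since_startup": time.monotonic() - _STARTED_AT,
        "agent_config": agent_config,
    }
    return urllib.request.Request(
        DIAGNOSTICS_URL + "?" + query,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Agent-Hostname": hostname,
            "Agent-Version": agent_version,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
```

Sending would then be a call such as `urllib.request.urlopen(request, timeout=5)` wrapped in a `try`/`except OSError`, so that a failed diagnostics report never breaks the host application.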
The current code setup means that the `CoreAgentSocketThread` won't be involved in any of the errors you listed there, so it's not the right place to add this logging.
The right place is inside `CoreAgentManager`.
#482 tracks moving the download under the thread, to avoid blocking application startup. I will look into that as a follow-up of this.