Sample processing can fail when sampling data
Hello! 👋
Describe the bug
We got two New Relic errors with the message `(ArithmeticError) bad argument in arithmetic expression`. The stack trace pointed to https://github.com/newrelic/elixir_agent/blob/master/lib/new_relic/sampler/process.ex#L79
It looks like the agent does not handle the `nil` return value when it fetches info about a process that is no longer alive. According to the Erlang docs, "Returns undefined [nil in Process.info/2] if the process is not alive."
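A minimal sketch of how this could produce the reported error. It assumes the sampler reads a field from the `Process.info/2` result and then does arithmetic on it; that is a guess at the mechanism, not a copy of the agent's code, and `previous` is a made-up earlier reading.

```elixir
# Spawn a process that exits immediately and wait until it is gone.
pid = spawn(fn -> :ok end)
ref = Process.monitor(pid)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
end

# For a dead process, Process.info/2 returns nil instead of a keyword list.
info = Process.info(pid, [:memory, :reductions])

# Access on nil also returns nil, so the missing info goes unnoticed here.
reductions = info[:reductions]

# Hypothetical earlier reading, used only to trigger the arithmetic.
previous = 100

# Subtracting from nil raises ArithmeticError.
result =
  try do
    reductions - previous
  rescue
    e in ArithmeticError -> {:raised, e}
  end
```

Because `nil[:reductions]` quietly evaluates to `nil`, the failure only surfaces at the arithmetic step, which would match a stack trace pointing at a calculation rather than at the `Process.info/2` call.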
I haven't found out why the process wasn't alive. The two errors were isolated occurrences on different days.
Environment
- Elixir & Erlang version: 1.12.1-otp-24
- Agent version: 1.27.4
The GenServer tries to handle this by putting a monitor on the process and handling the case when the process goes down: https://github.com/newrelic/elixir_agent/blob/master/lib/new_relic/sampler/process.ex#L34
That said, there's probably some kind of race condition here; maybe the process dies before the first sample is even taken. It should be possible to handle the `nil` case with a little refactor. PR welcome :)
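One possible shape for that refactor, as a rough sketch only. The module name `SafeSampleSketch`, the function `sample/2`, and the fields in the returned map are all hypothetical and are not taken from the agent's source; the point is just to branch on `nil` before doing any arithmetic.

```elixir
defmodule SafeSampleSketch do
  # Hypothetical helper, not part of the agent. Returns {:ok, stats} for a
  # live process and :not_alive when Process.info/2 returns nil.
  def sample(pid, previous_reductions) do
    case Process.info(pid, [:message_queue_len, :memory, :reductions]) do
      nil ->
        # The process died between being tracked and being sampled.
        :not_alive

      info ->
        {:ok,
         %{
           message_queue_len: info[:message_queue_len],
           memory_kb: info[:memory] / 1024,
           reductions: info[:reductions] - previous_reductions
         }}
    end
  end
end
```

A caller could drop the pid from its tracked set on `:not_alive`, which would cover the window where the process exits before the `:DOWN` message from the monitor has been handled.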