Handle things gracefully if Honeycomb or other otel API services reject an event
Matt from trigger.dev reported that this happened to them when they connected an Electric instance to Honeycomb on a free plan: they quickly hit the limit, and when Honeycomb started blocking events, Electric crashed.
@alco can you check whether this is still an issue and close it if not?
I checked the latest main. The logging is definitely spammier than we would like, and the metric exporter eventually crashes:
12:28:26.605 pid=<0.2859.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:26.606 pid=<0.2854.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:27.610 pid=<0.2854.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:27.610 pid=<0.2859.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:28.613 pid=<0.2854.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:28.613 pid=<0.2859.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:29.617 pid=<0.2854.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:29.618 pid=<0.2859.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:30.621 pid=<0.2854.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:30.621 pid=<0.2859.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:31.626 pid=<0.2854.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:31.626 pid=<0.2859.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:32.546 pid=<0.2859.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:32.546 pid=<0.2854.0> [warning] Got connection/transport error when sending metrics %Mint.TransportError{reason: :econnrefused}, retrying...
12:28:32.562 pid=<0.2859.0> [error] GenServer :otel_metric_exporter terminating
** (CaseClauseError) no case clause matching: {:retry, %Mint.TransportError{reason: :econnrefused}}
(otel_metric_exporter 0.3.6) lib/otel_metric_exporter/metric_store.ex:221: OtelMetricExporter.MetricStore.export_metrics/1
(otel_metric_exporter 0.3.6) lib/otel_metric_exporter/metric_store.ex:168: OtelMetricExporter.MetricStore.handle_info/2
(stdlib 6.2) gen_server.erl:2345: :gen_server.try_handle_info/3
(stdlib 6.2) gen_server.erl:2433: :gen_server.handle_msg/6
(stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Last message: :export
[...]
The error comes from elixir-otel-metric-exporter. I've opened a PR on that project: https://github.com/electric-sql/elixir-otel-metric-exporter/pull/9.
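For anyone following along, here is a minimal sketch of the failure mode behind the `CaseClauseError` in the log above. It is not the code from `metric_store.ex`: the module `ExportSketch`, its function names and the result tuples other than `{:retry, reason}` are made up for illustration. It only shows how a `case` without a clause for `{:retry, reason}` raises, and how matching that tuple lets the caller drop the batch and carry on.

```elixir
defmodule ExportSketch do
  # Hypothetical stand-in for the HTTP send step. The {:retry, reason}
  # shape mirrors the tuple shown in the CaseClauseError above.
  def send_batch(:ok), do: :ok
  def send_batch(:refused), do: {:retry, :econnrefused}
  def send_batch(:rejected), do: {:error, {:http_status, 429}}

  # Fragile: there is no clause for {:retry, _}, so that result raises
  # CaseClauseError and would take down the calling GenServer.
  def export_fragile(outcome) do
    case send_batch(outcome) do
      :ok -> :ok
      {:error, reason} -> {:dropped, reason}
    end
  end

  # Tolerant: every non-:ok result is turned into a value the caller
  # can log, and the process keeps running.
  def export_tolerant(outcome) do
    case send_batch(outcome) do
      :ok -> :ok
      {:retry, reason} -> {:dropped, reason}
      {:error, reason} -> {:dropped, reason}
    end
  end
end
```

With this sketch, `ExportSketch.export_fragile(:refused)` raises, while `ExportSketch.export_tolerant(:refused)` returns `{:dropped, :econnrefused}`.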
Testing Electric with telemetry export enabled but no connectivity to the telemetry endpoint yields expected error logs and no disruption of service.
I wanted to test what happens when simulating error responses from Honeycomb but couldn't set up an HTTP proxy for the telemetry export client in Elixir. Will try again sometime.
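One possible alternative to a proxy, sketched here and not something I have run against Electric: a tiny local stub that answers every request with an error status, which the telemetry export endpoint could then be pointed at. The module name `RejectingStub` is made up, and using 429 as the status is an assumption about what Honeycomb returns when it blocks events. It uses only `:gen_tcp` from OTP.

```elixir
defmodule RejectingStub do
  # Assumption: 429 stands in for whatever status the real service returns.
  @response "HTTP/1.1 429 Too Many Requests\r\ncontent-length: 0\r\nconnection: close\r\n\r\n"

  # Starts the stub in its own process. Port 0 asks the OS for a free port.
  # Returns {:ok, port} once the listener is ready.
  def start(port \\ 0) do
    parent = self()

    spawn(fn ->
      {:ok, listener} =
        :gen_tcp.listen(port, [:binary, packet: :raw, active: false, reuseaddr: true])

      {:ok, actual_port} = :inet.port(listener)
      send(parent, {:stub_port, actual_port})
      accept_loop(listener)
    end)

    receive do
      {:stub_port, actual_port} -> {:ok, actual_port}
    after
      5_000 -> {:error, :timeout}
    end
  end

  # Handles one connection at a time: read, reply with the error, close.
  defp accept_loop(listener) do
    {:ok, socket} = :gen_tcp.accept(listener)
    # Reads only the data that has arrived so far and ignores it. A large
    # request body may not be fully read before the socket is closed.
    _ = :gen_tcp.recv(socket, 0, 1_000)
    :gen_tcp.send(socket, @response)
    :gen_tcp.close(socket)
    accept_loop(listener)
  end
end
```

Calling `RejectingStub.start()` returns `{:ok, port}`, and the export endpoint would then be set to `http://localhost:<port>`. Since the stub does not drain large request bodies, a client might occasionally see a connection reset instead of the 429.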
While looking into this issue, I've made some code changes here:
- https://github.com/electric-sql/electric/pull/2677
- https://github.com/electric-sql/electric/pull/2681