In-flight requests crash when database is deleted

Open kevin-dp opened this issue 9 months ago • 1 comments

When Electric has one or more in-flight requests (e.g. a live request) for a given database and that database is deleted while the requests are still in flight, the requests fail with a 500 response:

{
  "error": "** (ArgumentError) errors were found at the given arguments:\n\n  * 1st argument: the table identifier does not refer to an existing ETS table\n"
}

And the full error from the Electric logs:

10:32:26.777 pid=<0.797.0> request_id=GDRKu_EjNPhbQdkAAAIE [info] Sent 500 in 20013ms

10:32:26.780 pid=<0.797.0> request_id=GDRKu_EjNPhbQdkAAAIE [error] ** (Plug.Conn.WrapperError) ** (ArgumentError) errors were found at the given arguments:

  * 1st argument: the table identifier does not refer to an existing ETS table

    (stdlib 6.2) :ets.lookup(:"single_stack:lsn_tracker", :last_processed_lsn)
    (electric 1.0.5) lib/electric/lsn_tracker.ex:26: Electric.LsnTracker.get_last_processed_lsn/1
    (electric 1.0.5) lib/electric/shapes/api.ex:571: Electric.Shapes.Api.get_global_last_seen_lsn/1
    (electric 1.0.5) lib/electric/shapes/api.ex:371: Electric.Shapes.Api.determine_global_last_seen_lsn/1
    (electric 1.0.5) lib/electric/shapes/api.ex:545: Electric.Shapes.Api.hold_until_change/1
    (electric 1.0.5) lib/electric/shapes/api.ex:414: anonymous fn/1 in Electric.Shapes.Api.serve_shape_log/1
    (opentelemetry_api 1.4.0) src/otel_tracer_noop.erl:59: :otel_tracer_noop.with_span/5
    (electric 1.0.5) lib/electric/telemetry/open_telemetry.ex:87: anonymous fn/3 in Electric.Telemetry.OpenTelemetry.do_with_span/4
    (telemetry 1.3.0) /Users/kevin/Documents/Electric/development/electric/packages/sync-service/deps/telemetry/src/telemetry.erl:324: :telemetry.span/3
    (electric 1.0.5) lib/electric/shapes/api.ex:427: Electric.Shapes.Api.serve_shape_log/2
    (electric 1.0.5) lib/electric/plug/serve_shape_plug.ex:1: Electric.Plug.ServeShapePlug.plug_builder_call/2
    (electric 1.0.5) deps/plug/lib/plug/error_handler.ex:80: Electric.Plug.ServeShapePlug.call/2
    (electric 1.0.5) deps/plug/lib/plug/router.ex:246: anonymous fn/4 in Electric.Plug.Router.dispatch/2
    (telemetry 1.3.0) /Users/kevin/Documents/Electric/development/electric/packages/sync-service/deps/telemetry/src/telemetry.erl:324: :telemetry.span/3
    (electric 1.0.5) deps/plug/lib/plug/router.ex:242: Electric.Plug.Router.dispatch/2
    (electric 1.0.5) lib/electric/plug/router.ex:1: Electric.Plug.Router.plug_builder_call/2
    (bandit 1.5.5) lib/bandit/pipeline.ex:124: Bandit.Pipeline.call_plug!/2
    (bandit 1.5.5) lib/bandit/pipeline.ex:36: Bandit.Pipeline.run/4
    (bandit 1.5.5) lib/bandit/http1/handler.ex:12: Bandit.HTTP1.Handler.handle_data/3
    (bandit 1.5.5) lib/bandit/delegating_handler.ex:18: Bandit.DelegatingHandler.handle_data/3
    (bandit 1.5.5) /Users/kevin/Documents/Electric/development/electric/packages/sync-service/deps/thousand_island/lib/thousand_island/handler.ex:411: Bandit.DelegatingHandler.handle_continue/2
    (stdlib 6.2) gen_server.erl:2335: :gen_server.try_handle_continue/3
    (stdlib 6.2) gen_server.erl:2244: :gen_server.loop/7
    (stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
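
The top frame of the trace is an `:ets.lookup/2` call on a named table that no longer exists, which raises `ArgumentError`. The following is only a minimal sketch of that failure mode and of one way to guard against it; the module name, table name, and return values are hypothetical and are not taken from Electric's code. It assumes `:ets.whereis/1` is available (OTP 21 or later).

```elixir
defmodule LsnLookupSketch do
  # Hypothetical table name and key, loosely modelled on the stack trace.
  @table :sketch_lsn_tracker
  @key :last_processed_lsn

  # Test helper: create the named table and store an LSN in it.
  def create(lsn) do
    :ets.new(@table, [:named_table, :public, :set])
    :ets.insert(@table, {@key, lsn})
    :ok
  end

  # Test helper: delete the table, as happens when the stack goes away.
  def delete, do: :ets.delete(@table)

  # Unguarded lookup: raises ArgumentError once the table is gone.
  def unsafe_get do
    [{@key, lsn}] = :ets.lookup(@table, @key)
    lsn
  end

  # Guarded lookup: returns a tagged error instead of raising.
  def safe_get do
    case :ets.whereis(@table) do
      :undefined ->
        {:error, :stack_down}

      _tid ->
        case :ets.lookup(@table, @key) do
          [{@key, lsn}] -> {:ok, lsn}
          [] -> {:error, :not_found}
        end
    end
  rescue
    # The table can still disappear between whereis/1 and lookup/2.
    ArgumentError -> {:error, :stack_down}
  end
end
```

With a guard like this, the caller can map `{:error, :stack_down}` to a deliberate response instead of letting the exception surface as a 500.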

When shutting down a stack, we should cancel all in-flight requests, stop accepting requests for that stack, and only then shut down the stack.
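
One way to picture that ordering is a small per-stack "gate" process that tracks waiting requests. This is a hypothetical sketch, not how Electric is structured: `StackGateSketch`, its functions, and the `:stack_shutting_down` message are all made-up names. Here `drain/1` stops accepting and notifies waiting requests in a single call, after which the caller would be free to stop the stack itself.

```elixir
defmodule StackGateSketch do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Called by a request process before it starts waiting on the stack.
  def checkin(gate), do: GenServer.call(gate, {:checkin, self()})

  # Called by a request process when it finishes normally.
  def checkout(gate), do: GenServer.call(gate, {:checkout, self()})

  # Stop accepting new requests and notify every waiting request.
  # Returns the number of requests that were notified.
  def drain(gate), do: GenServer.call(gate, :drain)

  @impl true
  def init(:ok), do: {:ok, %{accepting?: true, in_flight: MapSet.new()}}

  @impl true
  def handle_call({:checkin, pid}, _from, %{accepting?: true} = state) do
    {:reply, :ok, %{state | in_flight: MapSet.put(state.in_flight, pid)}}
  end

  def handle_call({:checkin, _pid}, _from, state) do
    # The gate is closed: the stack is shutting down or already gone.
    {:reply, {:error, :stack_shutting_down}, state}
  end

  def handle_call({:checkout, pid}, _from, state) do
    {:reply, :ok, %{state | in_flight: MapSet.delete(state.in_flight, pid)}}
  end

  def handle_call(:drain, _from, state) do
    Enum.each(state.in_flight, &send(&1, :stack_shutting_down))
    count = MapSet.size(state.in_flight)
    {:reply, count, %{state | accepting?: false, in_flight: MapSet.new()}}
  end
end
```

A request process would call `checkin/1`, wait in a `receive` that also matches `:stack_shutting_down`, and call `checkout/1` when done. The sketch does not monitor request processes, so a request that crashes without checking out would stay in the set until `drain/1` runs.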

kevin-dp, Apr 08 '25 08:04

Related to: https://github.com/electric-sql/electric/issues/2529

kevin-dp, Apr 08 '25 08:04

Opened https://github.com/electric-sql/electric/pull/2712 to address this. The root cause is that after the long-poll timeout the request is processed as if the stack were still "active" and able to handle operations, but in cases like the one described the stack has already fallen over. I've addressed this by checking that the stack is up after a long-poll timeout and returning the appropriate error if it is not.
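
The general shape of such a check can be sketched as below. This is an illustration under stated assumptions, not the code from the PR: `LongPollSketch`, the `{:new_changes, offset}` message, the `stack_up_fun` callback, and the choice of 503 as the error status are all hypothetical.

```elixir
defmodule LongPollSketch do
  # Wait for a change notification; on timeout, re-check the stack
  # before answering. `stack_up_fun` is a hypothetical zero-arity
  # function standing in for whatever status check the service uses.
  def hold_until_change(stack_up_fun, timeout_ms) do
    receive do
      {:new_changes, offset} ->
        {:ok, {:changes, offset}}
    after
      timeout_ms ->
        if stack_up_fun.() do
          # Stack is healthy and nothing changed: normal long-poll timeout.
          {:ok, :up_to_date}
        else
          # Stack went away while we were waiting: do not touch its state.
          {:error, :stack_unavailable}
        end
    end
  end

  # Assumed mapping to HTTP status codes, for illustration only.
  def to_status({:ok, _}), do: 200
  def to_status({:error, :stack_unavailable}), do: 503
end
```

The point of the sketch is the ordering: the status check happens before any lookup that depends on the stack's ETS tables, so a deleted database leads to a deliberate error response rather than an unhandled exception.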

msfstef, May 13 '25 12:05