elsa icon indicating copy to clipboard operation
elsa copied to clipboard

Termination due to timeout during high throughput

Open m0rt3nlund opened this issue 4 years ago • 3 comments

Hi!

I posted this question in the repo for off_broadway_kafka also, but i think this is the most correct place for it:

I have a problem when there is high throughput. I get this error message and I cannot figure out why it times out. Any suggestions on how I can continue my investigation. Any other info I can provide?

2020-03-05 - 21:36:40.296: [error] GenServer #PID<0.2926.0> terminating ** (stop) exited in: GenServer.call({:via, Elsa.Registry, {:elsa_registry_per_partition, Elsa.Group.Manager}}, {:process_assignments, "nonode@nohost/<0.2926.0>-3c16cb43-aacf-4924-90ac-1ed0bab18fcc", 872, [{:brod_received_assignment, "test_plant_values", 0, 10688}, {:brod_received_assignment, "test_plant_values", 1, 11088}, {:brod_received_assignment, "test_plant_values", 2, 10966}, {:brod_received_assignment, "test_plant_values", 3, 10627}, {:brod_received_assignment, "test_plant_values", 4, 10973}, {:brod_received_assignment, "test_plant_values", 5, 11055}, {:brod_received_assignment, "test_plant_values", 6, 10864}, {:brod_received_assignment, "test_plant_values", 7, 11070}, {:brod_received_assignment, "test_plant_values", 8, 10855}, {:brod_received_assignment, "test_plant_values", 9, 10835}, {:brod_received_assignment, "test_plant_values", 10, 10985}, {:brod_received_assignment, "test_plant_values", 11, 10932}, {:brod_received_assignment, "test_plant_values", 12, 10957}, {:brod_received_assignment, "test_plant_values", 13, 11001}, {:brod_received_assignment, "test_plant_values", 14, 11019}, {:brod_received_assignment, "test_plant_values", 15, 10988}, {:brod_received_assignment, "test_plant_values", 16, 10948}, {:brod_received_assignment, "test_plant_values", 17, 10784}, {:brod_received_assignment, "test_plant_values", 18, 10868}, {:brod_received_assignment, "test_plant_values", 19, 10780}, {:brod_received_assignment, "test_plant_values", 20, 10922}, {:brod_received_assignment, "test_plant_values", 21, 10972}, {:brod_received_assignment, "test_plant_values", 22, 10885}, {:brod_received_assignment, "test_plant_values", 23, 10986}, {:brod_received_assignment, "test_plant_values", 24, 11006}, {:brod_received_assignment, "test_plant_values", 25, 10999}, {:brod_received_assignment, "test_plant_values", 26, 10952}, {:brod_received_assignment, "test_plant_values", 27, 10909}, {:brod_received_assignment, "test_plant_values", 28, 10850}, {:brod_received_assignment, "test_plant_values", 29, 11028}, {:brod_received_assignment, "test_plant_values", 30, 11081}, {:brod_received_assignment, "test_plant_values", 31, 10812}, {:brod_received_assignment, "test_plant_values", 32, 10819}, {:brod_received_assignment, "test_plant_values", 33, 10848}, {:brod_received_assignment, "test_plant_values", 34, 11004}, {:brod_received_assignment, "test_plant_values", 35, 10969}, {:brod_received_assignment, "test_plant_values", 36, 10847}, {:brod_received_assignment, "test_plant_values", 37, 10973}, {:brod_received_assignment, "test_plant_values", 38, 10873}, {:brod_received_assignment, "test_plant_values", 39, 10965}, {:brod_received_assignment, "test_plant_values", 40, 10954}, {:brod_received_assignment, "test_plant_values", 41, 11085}, {:brod_received_assignment, "test_plant_values", 42, ...}, {:brod_received_assignment, "test_plant_values", ...}, {:brod_received_assignment, ...}, {...}, ...]}, 5000) ** (EXIT) time out (elixir) lib/gen_server.ex:1009: GenServer.call/3 (brod) src/brod_group_coordinator.erl:624: :brod_group_coordinator.sync_group/1 (brod) src/brod_group_coordinator.erl:530: :brod_group_coordinator.do_stabilize/3 (brod) src/brod_group_coordinator.erl:531: :brod_group_coordinator.do_stabilize/3 (brod) src/brod_group_coordinator.erl:343: :brod_group_coordinator.handle_info/2 (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4 (stdlib) gen_server.erl:711: :gen_server.handle_msg/6 (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3 Last message: {:lo_cmd_stabilize, 0, :undefined} State: {:state, :per_partition, "test_datasystem", "", :undefined, 0, ["test_plant_values"], :undefined, :undefined, [], false, {:via, Elsa.Registry, {:elsa_registry_per_partition, Elsa.Group.Manager}}, Elsa.Group.Manager, [], :undefined, :roundrobin_v2, 30, 5, 5, 1, :undefined, :commit_to_kafka_v2, 5, :roundrobin_v2}

m0rt3nlund avatar Mar 05 '20 21:03 m0rt3nlund

Currently running version 0.11.1

m0rt3nlund avatar Mar 06 '20 07:03 m0rt3nlund

Upgraded to latest, 0.12.1 and now the problem changed. Appears to be a problem with off_broadway_kafka now.

2020-03-06 - 10:32:18.285: [error] ** (exit) exited in: GenServer.call(#PID<0.7699.0>, {:ack, %{generation_id: 896, partition: 144, pid: #PID<0.7699.0>, topic: "test_plant_values"}, [14227, 14228, 14229, 14230, 14231, 14232, 14233, 14234, 14235, 14236, 14237, 14238, 14239, 14240, 14241, 14242, 14243, 14244, 14245, 14246]}, 5000)
    ** (EXIT) time out
    (elixir) lib/gen_server.ex:1009: GenServer.call/3
    (elixir) lib/enum.ex:789: anonymous fn/3 in Enum.each/2
    (stdlib) maps.erl:232: :maps.fold_1/3
    (elixir) lib/enum.ex:1964: Enum.each/2
    (broadway) lib/broadway/consumer.ex:64: Broadway.Consumer.handle_events/3
    (gen_stage) lib/gen_stage.ex:2395: GenStage.consumer_dispatch/6
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6

m0rt3nlund avatar Mar 06 '20 10:03 m0rt3nlund

Sorry for the delay on getting back @m0rt3nlund but we're taking a look at and trying to knock out all of the open issues with Elsa at the moment. Is this still and issue and if it is, A) what is the current version of Elsa you're using (there have been a few reasonably large changes since early Mar) and B) if yes and you've validated it on a newer version, could you give us some more detail about you environment and app configuration for context?

jeffgrunewald avatar Jun 23 '20 13:06 jeffgrunewald