Beam VM becomes stuck when the number of connections is high

Open pallix opened this issue 5 years ago • 5 comments

I have an application that connects to multiple MQTT servers, each running locally in its own network namespace. When the number of connections is too high, the BEAM VM becomes stuck. I have written an example project by extracting code from our codebase, with a README that explains how to reproduce the problem. You can find it here. I would be happy if you could find the time to try it and tell me whether you can reproduce the problem.

I am using Debian 9 (stretch), Elixir 1.8.2 and Erlang/OTP 21.3.8.8.

And thanks for writing this software :-).

P.S: I was once lucky enough to have the observer print a few more graphs before being completely overloaded:

[Image: Screenshot_2020-01-06_16-18-00]

This is surprising because the network namespaces are created and the mosquitto servers started at a rate of only about one per second.

pallix avatar Jan 14 '20 15:01 pallix

I am currently in the process of a major rewrite (admittedly it has been going on for a long while), which will bring MQTT 5 support to Tortoise. I have recently picked up development again, and I am wrapping my head around what is needed to make it a release candidate. The architecture differs, though, so I hope you will let me release that first and then look at this issue?

And thanks for using Tortoise; spawning 300 tortoises on a single node is not a use-case I anticipated :)

gausby avatar Jan 15 '20 09:01 gausby

It's great to hear that you are planning to further develop Tortoise! Maybe the problem will go away after the rewrite?

Thanks, I will keep an eye on the project development.

> And thanks for using Tortoise; spawning 300 tortoises on a single node is not a use-case I anticipated :)

It sounds a bit unusual, but that's what is needed to simulate IoT devices for my team.

pallix avatar Jan 15 '20 09:01 pallix

I forgot to mention this: when I generated a crash dump, most of the scheduler processes were in the same calls (this is from the original problem; I did not generate a dump of the example):

Current Process CP: 0x00007f26a080db08 ('Elixir.Registry':unregister_match/4 + 952)
Current Process Limited Stack Trace:
0x00007f260d5c9348:SReturn addr 0x15A873D8 ('Elixir.Tortoise.Events':unregister/2 + 152)
0x00007f260d5c93a0:SReturn addr 0x15A79DF0 ('Elixir.Tortoise.Connection':connection/2 + 952)
0x00007f260d5c93a8:SReturn addr 0x15A8FCD8 ('Elixir.Tortoise':publish/4 + 384)
[...]

pallix avatar Jan 16 '20 09:01 pallix

Oh; I have a registry I use as a pubsub, such that processes can subscribe to a tcp socket—I move the tcp socket to the process that will send a QoS=0 message. Could be because the registry gets overwhelmed when too many tortoises are running.

…an interesting case. I will look further into this at a later time.
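
For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of an Elixir `Registry` used as a pub/sub, including the `Registry.unregister_match/4` call that appears in the stack trace. This is not Tortoise's actual code: the module name `PubSubSketch`, the registry name, the `:subscriber` value, and the message shape are all hypothetical and chosen only for illustration.

```elixir
# Hypothetical sketch, NOT Tortoise's implementation:
# a duplicate-key Registry used as a simple pub/sub.
defmodule PubSubSketch do
  @registry PubSubSketch.Registry

  # Start a duplicate-key registry so that many processes
  # can register under the same topic.
  def start_link do
    Registry.start_link(keys: :duplicate, name: @registry)
  end

  # Subscribe the calling process to a topic.
  def subscribe(topic) do
    {:ok, _owner} = Registry.register(@registry, topic, :subscriber)
    :ok
  end

  # Remove the calling process's entries for the topic whose
  # value matches :subscriber.
  def unsubscribe(topic) do
    Registry.unregister_match(@registry, topic, :subscriber)
  end

  # Send a message to every process registered under the topic.
  def broadcast(topic, message) do
    Registry.dispatch(@registry, topic, fn entries ->
      for {pid, _value} <- entries, do: send(pid, {topic, message})
    end)
  end
end
```

With this sketch, a process would call `PubSubSketch.subscribe(:socket)`, receive `{:socket, payload}` messages sent via `PubSubSketch.broadcast(:socket, payload)`, and call `PubSubSketch.unsubscribe(:socket)` when done. Every subscribe and unsubscribe goes through the shared registry, which is the kind of contention point the comment above speculates about.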

gausby avatar Jan 16 '20 11:01 gausby

Maybe related to https://github.com/pallix/veth_network_namespaces_perf

pallix avatar Jan 22 '20 13:01 pallix