Management Server is limited to 100 simultaneous peers
Describe the problem
After 100 peers have connected, the management server stops accepting additional connections.
To Reproduce
Steps to reproduce the behavior:
- Self-host the management server
- Enroll more than 100 peers
- Have them all connect
- After the first 100, no additional peers get a connection
Expected behavior
A higher, or ideally configurable, peer limit
Are you using NetBird Cloud?
Self-Hosted
NetBird version
0.27.1 (management)
Additional context
For any user beyond the limit, the management log shows these WARN entries:
"{"log":"2024-04-10T12:50:02Z WARN management/server/grpcserver.go:376: failed logging in peer Ij6aLkfZU7qzUOgfzTAZMaaLNAOGsow2SDmdP+8Rxig=\n","stream
":"stderr","time":"2024-04-10T12:50:02.614030981Z"}"
While chatting in Slack, my colleague found this reference, which seems like it may be related. Can this be made a customizable setting that defaults to 100 if not otherwise specified?
https://github.com/netbirdio/netbird/blob/3ed2f08f3c5dd930a598a26f24cf028807816486/management/server/updatechannel.go#L13
const channelBufferSize = 100
https://github.com/netbirdio/netbird/blob/main/management/server/updatechannel.go#L83-L85
// mbragin: todo shouldn't it be more? or configurable?
channel := make(chan *UpdateMessage, channelBufferSize)
p.peerChannels[peerID] = channel
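To make the snippet easier to reason about, here is a minimal, self-contained Go sketch of the pattern: channelBufferSize and UpdateMessage come from the code above, but peerManager, createChannel, and sendUpdate are hypothetical names, and the drop-when-full behavior is only an illustration of how a bounded per-peer buffer can surface as failed updates, not a claim about what the real server does.

package main

import "fmt"

const channelBufferSize = 100 // value from updatechannel.go

// UpdateMessage stands in for the management server's update struct.
type UpdateMessage struct{ Payload string }

// peerManager is a hypothetical stand-in for the real peers-update manager.
type peerManager struct {
    peerChannels map[string]chan *UpdateMessage
}

// createChannel allocates a buffered channel for one peer, mirroring the
// two lines quoted above.
func (p *peerManager) createChannel(peerID string) chan *UpdateMessage {
    channel := make(chan *UpdateMessage, channelBufferSize)
    p.peerChannels[peerID] = channel
    return channel
}

// sendUpdate attempts a non-blocking send; once a peer's buffer already
// holds channelBufferSize messages, further updates cannot be queued.
func (p *peerManager) sendUpdate(peerID string, msg *UpdateMessage) bool {
    ch, ok := p.peerChannels[peerID]
    if !ok {
        return false // compare the "peer ... has no channel" log lines below
    }
    select {
    case ch <- msg:
        return true // compare "update was sent to channel for peer ..."
    default:
        return false // buffer is full: the 101st queued update fails
    }
}

func main() {
    p := &peerManager{peerChannels: map[string]chan *UpdateMessage{}}
    p.createChannel("peer-1")
    for i := 1; i <= channelBufferSize+1; i++ {
        if !p.sendUpdate("peer-1", &UpdateMessage{Payload: "network map"}) {
            fmt.Printf("update %d could not be queued: buffer full\n", i)
        }
    }
}

In this toy version only the 101st queued update for a single peer fails; whether that kind of back-pressure is what ties channelBufferSize to the observed 100-peer ceiling is the question being raised here.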
Hello!
Could you set the trace log level for the server and collect the relevant part of the new logs? You can set it with the "--log-level", "trace" command parameters.
I'd be happy to provide whatever is needed, but I'm not sure exactly what the "relevant part" is. I gathered trace logs for a few minutes while this was happening and wasn't sure which entries are most relevant for you.
Here's a clump of the log - is that useful?
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:180: received an update for peer 4/JzpIqInXK1wdadB5F7rVi0u6n6IehsxvapPsi74kw=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:196: sent an update to peer 4/JzpIqInXK1wdadB5F7rVi0u6n6IehsxvapPsi74kw=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co1e0fs11epihjcae740 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:54: update was sent to channel for peer co2264s11epihjcae7m0
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:180: received an update for peer f0qx2mNSUY4hwXaVXq7T1xz7SCCl73VEkk/xBEu0+GU=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:196: sent an update to peer f0qx2mNSUY4hwXaVXq7T1xz7SCCl73VEkk/xBEu0+GU=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co63ct411epm9iuvd780 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:54: update was sent to channel for peer cmt9dtf13t6cu7gofra0
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:180: received an update for peer cy8snf31k/GUmiJmSjCqq5OHj6UxktQZuh0ah+crJzg=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:196: sent an update to peer cy8snf31k/GUmiJmSjCqq5OHj6UxktQZuh0ah+crJzg=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer coak74411epmg8a5ovb0 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co620lc11epm9iuvd770 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co2t5i411epihjcugt0g has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co27f1k11epihjcae870 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co1qjgs11epihjcae7ag has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co234lk11epihjcae7og has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:54: update was sent to channel for peer co2u1j411epihjcugt1g
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:180: received an update for peer gkbQFko845iP9WIyYdbYyJXq93dXsM1mCRc+HQbtMw0=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:196: sent an update to peer gkbQFko845iP9WIyYdbYyJXq93dXsM1mCRc+HQbtMw0=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:54: update was sent to channel for peer co3g7ac11epm9isopqu0
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:180: received an update for peer tZ1YZ0TO8AZOLcyIQpb+FPIDOuVL74MH73TkP5pd+gs=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:196: sent an update to peer tZ1YZ0TO8AZOLcyIQpb+FPIDOuVL74MH73TkP5pd+gs=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co2keo411epihjcae8n0 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer cnuvfqk11epihjaflaag has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co244uk11epihjcae7q0 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:54: update was sent to channel for peer cnsqd5k11epihjafla0g
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:180: received an update for peer l9Lqrb955BGiSZQqx16pHeXc9SgJo4V8n0ZQu4rlDCA=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:196: sent an update to peer l9Lqrb955BGiSZQqx16pHeXc9SgJo4V8n0ZQu4rlDCA=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co7celk11epm9io1sa90 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer coakp7411epmg8a5ovbg has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:54: update was sent to channel for peer co2jsa411epihjcae8lg
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co29r3k11epihjcae8fg has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:54: update was sent to channel for peer cm3k4seabkf1d47d673g
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:180: received an update for peer k5wK0qIuOq0cdRsBPCmsnavyZkhrI0VnQuS8hEyqYis=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:196: sent an update to peer k5wK0qIuOq0cdRsBPCmsnavyZkhrI0VnQuS8hEyqYis=
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co4j9pc11epm9irg53d0 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co0msis11epihjcae6r0 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co22p3s11epihjcae7ng has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer cnga35713t6blui698dg has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer cnqbjs411eppa23f8ft0 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:60: peer co0tbf411epihjcae6u0 has no channel
management-1 | 2024-04-10T15:38:03Z DEBG management/server/updatechannel.go:54: update was sent to channel for peer cngcn4n13t6blui698k0
management-1 | 2024-04-10T15:38:03Z DEBG management/server/grpcserver.go:180: received an update for peer qu1GjuTffbayt2hPmwDAjqtoqxriSZpxjLTVhtVQSFk=
@pappz Same problem on 0.27.10. Changing const channelBufferSize to 1000 raises the maximum number of peers to 1000. This is strange, but it works.
Hello @bravosierrasierra, the buffer size is a per-peer queue: it indicates how many messages can be queued for delivery to a specific peer in the NetBird network.
So you can have hundreds of thousands of nodes, and each one of them will have at most 100 messages queued.
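To illustrate that distinction with a toy example (illustrative Go only, not the actual NetBird implementation; the peer count is arbitrary):

package main

import "fmt"

const channelBufferSize = 100

// UpdateMessage stands in for the management server's update struct.
type UpdateMessage struct{ Payload string }

func main() {
    peers := make(map[string]chan *UpdateMessage)
    // Nothing here caps how many peers can get a channel (it could just as
    // well be hundreds of thousands)...
    for i := 0; i < 10000; i++ {
        peers[fmt.Sprintf("peer-%d", i)] = make(chan *UpdateMessage, channelBufferSize)
    }
    // ...the constant only bounds how many updates can wait in each peer's queue.
    fmt.Printf("created %d peer channels, each with capacity %d\n", len(peers), channelBufferSize)
}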
It seems like you are facing another issue with the deployment which is causing these messages.
Can you share all of the management logs? And can you confirm whether you have JWT group sync enabled?
I confirm that I have encountered a similar problem. It is also described here https://github.com/netbirdio/netbird/issues/1782
After adding 101 peers to one group, errors occur and one of the peers, chosen at random, is not announced. Sometimes that peer is a routing peer. I worked around it by increasing the value at https://github.com/netbirdio/netbird/blob/main/management/server/updatechannel.go#L13 to 1000, and the problem went away.
Hi @akastav can you share if you have JWT group sync enabled?
yes, we use groups from JWT/Keycloak
Ok, this is probably the main issue. There is a bug on our roadmap to be fixed which causes a lot of group reconfiguration. You can check that by looking at the number of duplicated group events in the activity view.
But why does increasing channelBufferSize solve the problem? Does this JWT-groups bug from the roadmap have a link we can follow?
@bravosierrasierra @akastav, what are your IDP providers?
@mlsmaycon we both use different Keycloak instances
Thanks @bravosierrasierra. Would it be possible for you to confirm the events in the activity tab? If you see duplicates, please share the JWT decoded data from one of the users affected. If you join our Slack channel, we can help you get that so you can also share in DM.
We are not seeing a storm of events after increasing channelBufferSize, just occasional messages about users connecting.
We found the root cause of the issue and we are working on a fix.
Hey folks, the PR has been merged and will be in our next release.
Hey folks, have you tested? Should we close this issue?
I'll be doing the upgrade this Saturday.
I finished my upgrades to 0.28.4 and re-enabled JWT sync. I can confirm that I'm not seeing multiple repeating group inserts for users any longer. I believe this can be closed now. Thanks!!
Back to a full working day with > 140 peers connected and the mgmt service is showing no signs of problems. (FYI, I did the migration to postgres too :-)
That's excellent; thanks, @TSJasonH, for double checking. I am closing this now.