
Use SO_REUSEADDR in flb_net_server

cratuki opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe.

Background. I am encapsulating an instance of fluentbit within another process. I am using the tcp input server for this fluentbit instance.

Design of this interaction. The parent and the fluentbit instance both run in the same Linux session (this is similar to a process group). This means that when the parent process exits, the operating system cleans up the fluentbit process. (This happens for the same reason that, if you were ssh'd into a Linux host and then closed that ssh session, any background processes in that session would be cleaned up.)

My problem. When I restart my parent process quickly, the new fluentbit instance can't restart its server. I believe this is because the port is still being held open in a TIME_WAIT state as a consequence of the previous process not having been fully cleaned up yet.

This is a common situation in network programming. I have overcome it in the past with the socket option SO_REUSEADDR. Setting SO_REUSEADDR on the listening socket before it is bound tells the operating system that the requesting process may bind to a port that still has connections held in TIME_WAIT state.
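For illustration, here is a minimal sketch of the general technique using plain POSIX sockets. It is not fluent-bit code: the helper name `open_reusable_listener`, the loopback address, and the backlog of 128 are assumptions made for this example only. The key point is that `setsockopt` is called after the socket is created and before `bind`.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of fluent-bit): create a TCP listener on
 * 127.0.0.1:<port> with SO_REUSEADDR enabled.
 * Returns the listening file descriptor, or -1 on error.
 */
int open_reusable_listener(uint16_t port)
{
    int fd;
    int on = 1;
    struct sockaddr_in addr;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    /* Must be set before bind(): allows binding while old connections
     * on this port are still in TIME_WAIT. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}
```

Calling `open_reusable_listener(0)` asks the kernel for any free port; a real server would pass its configured port instead.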

Describe the solution you'd like

Please set the socket option SO_REUSEADDR in the server launch code in flb_net_server in flb_network.c, on the line after the call to flb_net_socket_create.

There are examples of this usage elsewhere in the codebase. For example, line 2693 of lib/monkey/mk_core/deps/libevent/evutil.c.

Describe alternatives you've considered

Alternate option: you could specify SO_REUSEADDR in flb_net_socket_create.

I can probably fix this by adding arbitrary sleeps to my container, but this is a workaround, less elegant than a SO_REUSEADDR-oriented fix.

cratuki avatar Jun 10 '22 08:06 cratuki

Can you submit a PR for this?

patrick-stephens avatar Jun 20 '22 15:06 patrick-stephens

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar Sep 19 '22 02:09 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Sep 25 '22 02:09 github-actions[bot]