
transports/tcp: Remove sleep_on_error

Open · mxinden opened this pull request 1 year ago • 7 comments

Description

The sleep_on_error mechanism in libp2p-tcp delays the next poll on the listener stream when an error occurs. It was introduced in https://github.com/libp2p/rust-libp2p/pull/402, based on the tk_listen crate, which was also referenced in the tokio documentation.

When the process runs out of file descriptors, accepting on the listening socket returns an EMFILE error. Instead of polling the socket again immediately, likely receiving another EMFILE and thus busy-looping, the listener waits for 100ms before the next poll.

Modern operating systems should run with high file descriptor limits, so this error should not happen in the wild. In addition, delaying the next poll only covers up the issue but does not solve it.

Lastly, rust-libp2p gives incoming connections the lowest priority. Thus, while this could result in a busy loop, it only does so when there is no other work left.

With the above in mind, this pull request removes the delay.

(The mention of tk_listen has since been removed from tokio.)
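
For context, here is a minimal sketch of the pattern in question. It is written against tokio's public accept API rather than the crate's actual Stream-based internals, so the function names and structure are illustrative only:

```rust
use std::time::Duration;
use tokio::net::TcpListener;

/// Simplified stand-in for the `sleep_on_error` behaviour: on an
/// accept error such as EMFILE, back off before polling the
/// listener again instead of retrying in a tight loop.
async fn accept_loop(listener: TcpListener) {
    loop {
        match listener.accept().await {
            Ok((_stream, addr)) => {
                // Hand the connection off to the rest of the stack here.
                println!("accepted connection from {addr}");
            }
            Err(e) => {
                // Likely EMFILE when out of file descriptors. The
                // sleep avoids a busy loop but also hides the error.
                eprintln!("accept error: {e}, backing off");
                tokio::time::sleep(Duration::from_millis(100)).await;
            }
        }
    }
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0").await?;
    println!("listening on {}", listener.local_addr()?);
    accept_loop(listener).await;
    Ok(())
}
```

Removing the delay means the `Err` arm surfaces the error and the listener is polled again right away.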

Links to any relevant issues

  • Recent discussion on sleep_on_error https://github.com/libp2p/rust-libp2p/pull/2813#discussion_r950802863

Open Questions

  • What do folks think? Is it safe to assume that rust-libp2p runs in environments with high file descriptor limits?
  • How do other projects handle these cases? I have not found other examples yet.

Change checklist

  • [x] I have performed a self-review of my own code
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] A changelog entry has been made in the appropriate crates

mxinden · Aug 26 '22 08:08

As far as I can tell, go-libp2p does not pause when witnessing an error on accept:

https://github.com/libp2p/go-libp2p/blob/423eab209791f1b1864096371c1b3d76a2bc88c3/p2p/transport/tcp/tcp.go#L79-L91

Maybe @marten-seemann, @MarcoPolo, or @julian88110 can confirm?

mxinden · Aug 26 '22 08:08

No, we don't wait.

marten-seemann · Aug 26 '22 08:08

> this error should not happen in the wild.

We've seen it a few times, for instance on macOS.

> Thus, while this could result in a busy loop, it only does so when there is no other work left.

You assume this is the only program running on the server; imo it's never OK to burn a thread for no reason.

> In addition, delaying the next poll only covers up the issue but does not solve it.

What would solving it look like? You either raise the fd limit or lower the maximum number of peers.

> How do other projects handle these cases?

In nimbus, we show a warning at startup if the fd limit is too close to or below the maximum peer count (if we can; it's not straightforward to get the actual limit). Otherwise, we sleep after a failed accept to avoid the busy loop.

Menduist · Aug 26 '22 09:08

Thanks @marten-seemann and @Menduist.

> > Thus, while this could result in a busy loop, it only does so when there is no other work left.
>
> You assume this is the only program running on the server; imo it's never OK to burn a thread for no reason.

Good point.

> > In addition, delaying the next poll only covers up the issue but does not solve it.
>
> What would solving it look like? You either raise the fd limit or lower the maximum number of peers.

Those are the only solutions I can think of as well. My thought was that a busy loop would surface the issue (e.g. a low file descriptor limit), while the delay would cover it up when running in production. Unless one pays attention to the additional logged errors, which I doubt is happening in most setups.

> > How do other projects handle these cases?
>
> In nimbus, we show a warning at startup if the fd limit is too close to or below the maximum peer count (if we can; it's not straightforward to get the actual limit). Otherwise, we sleep after a failed accept to avoid the busy loop.

For the record, the corresponding logic in Substrate: https://github.com/paritytech/substrate/blob/00cc5f104176fac6f5a624bced22a2192c7c0470/client/cli/src/config.rs#L652-L660
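
For illustration, a Unix-only sketch of such a startup check in the spirit of the nimbus/Substrate approach; `MAX_PEERS`, `FD_MARGIN`, and the messages are made-up values for this example, not taken from either code base:

```rust
// Illustrative values, not rust-libp2p or Substrate configuration.
const MAX_PEERS: u64 = 50;
const FD_MARGIN: u64 = 256; // headroom for non-socket descriptors

/// Warn at startup if the soft fd limit looks too low for the
/// configured peer count.
fn warn_on_low_fd_limit() {
    let mut lim = libc::rlimit {
        rlim_cur: 0,
        rlim_max: 0,
    };
    // SAFETY: `lim` is a valid, writable `rlimit` struct.
    if unsafe { libc::getrlimit(libc::RLIMIT_NOFILE, &mut lim) } != 0 {
        eprintln!("warning: could not query RLIMIT_NOFILE");
        return;
    }
    if (lim.rlim_cur as u64) < MAX_PEERS + FD_MARGIN {
        eprintln!(
            "warning: soft fd limit ({}) is low for {} peers; \
             consider raising it, e.g. via `ulimit -n`",
            lim.rlim_cur, MAX_PEERS
        );
    }
}
```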

I will give this more thought, but with this additional input I am leaning towards keeping the delay and documenting it better.

mxinden · Aug 30 '22 06:08

> Unless one pays attention to the additional logged errors, which I doubt is happening in most setups.

That's a good point. I guess in the end it depends on the application; libp2p can't really decide what makes more sense. Maybe we should just explore how to convey the issue to the end application.

Menduist · Aug 30 '22 11:08

It might be a good idea to record this as a metric, and make this delay configurable. An operator may want to alert when this happens since it may mean they should increase their FD limits or something is exhausting their FDs (leak?).

MarcoPolo · Sep 12 '22 23:09

> It might be a good idea to record this as a metric, and make this delay configurable. An operator may want to alert when this happens since it may mean they should increase their FD limits or something is exhausting their FDs (leak?).

We can return an error via Transport::poll. The docs for TransportEvent::ListenerError say the event "is for informational purposes only", so we can use it to report the error back up and later integrate it with libp2p-metrics.
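
A rough sketch of what that could look like from the application side, assuming the error eventually surfaces as a swarm event; the exact event shape and error type vary between libp2p versions, and the plain counter here stands in for a real libp2p-metrics integration:

```rust
use futures::StreamExt;
use libp2p::swarm::{NetworkBehaviour, Swarm, SwarmEvent};

/// Event loop that observes listener errors (such as EMFILE)
/// surfaced by the swarm and counts them, so an operator can alert
/// on fd exhaustion.
async fn drive<B: NetworkBehaviour>(mut swarm: Swarm<B>) {
    let mut listener_errors: u64 = 0;
    loop {
        match swarm.select_next_some().await {
            SwarmEvent::ListenerError { listener_id, error } => {
                listener_errors += 1;
                // Repeated listener errors often mean the fd limit
                // is too low or something is leaking descriptors.
                eprintln!(
                    "listener {listener_id:?} error (total {listener_errors}): {error}"
                );
            }
            _ => {}
        }
    }
}
```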

thomaseizinger · Sep 14 '22 03:09

I am in favor of @thomaseizinger's suggestion above.

That said, I will not get to this any time soon. In case someone wants to take this over, let us know.

mxinden · Sep 29 '22 14:09

The error is already reported as of today. Unless another sub-task is woken, this shouldn't result in a busy loop?

thomaseizinger · Sep 30 '22 01:09

I had a play around with this and pushed a little test harness with a docker container to test the behaviour for when we run out of file descriptors.

In the current implementation, the error is already returned, but it is a busy loop. Can we perhaps change the implementation so that, once we have returned this particular error, we return Poll::Pending instead?

thomaseizinger · Sep 30 '22 02:09

> I had a play around with this and pushed a little test harness with a docker container to test the behaviour for when we run out of file descriptors.

🚀 Neat! Thank you.

> In the current implementation, the error is already returned, but it is a busy loop. Can we perhaps change the implementation so that, once we have returned this particular error, we return Poll::Pending instead?

Given the Delay, it is currently not a busy loop. Am I missing something? Returning Poll::Pending without registering a Waker could result in a stall, as nothing will wake us up once a new connection comes in, right?
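
To make the stall concern concrete, here is a toy illustration (not libp2p-tcp code): a stream may only return `Poll::Pending` if it stores the waker and someone eventually wakes it, otherwise the executor never polls it again:

```rust
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

use futures::Stream;

/// Toy stream that stops producing items while suppressed.
struct SuppressedListener {
    suppressed: bool,
    waker: Option<Waker>,
}

impl SuppressedListener {
    /// Whatever clears the suppression (e.g. a timer firing) must
    /// also wake the task, or it stays parked forever.
    fn resume(&mut self) {
        self.suppressed = false;
        if let Some(waker) = self.waker.take() {
            waker.wake();
        }
    }
}

impl Stream for SuppressedListener {
    type Item = std::io::Result<()>;

    fn poll_next(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item>> {
        if self.suppressed {
            // Returning Pending *without* this line is the stall:
            // nothing would ever poll the stream again.
            self.waker = Some(cx.waker().clone());
            return Poll::Pending;
        }
        // Real code would poll the underlying listener here.
        Poll::Ready(Some(Ok(())))
    }
}
```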

mxinden · Oct 04 '22 09:10

> > In the current implementation, the error is already returned, but it is a busy loop. Can we perhaps change the implementation so that, once we have returned this particular error, we return Poll::Pending instead?
>
> Given the Delay, it is currently not a busy loop. Am I missing something? Returning Poll::Pending without registering a Waker could result in a stall, as nothing will wake us up once a new connection comes in, right?

I was referring to the implementation in this PR!

I am not sure about the stall; I would have to experiment. If you are running out of FDs, you can't accept new connections anyway?

thomaseizinger · Oct 04 '22 11:10

> If you are running out of FDs, you can't accept new connections anyway?

Even though you might not be able to accept new connections in such a case, I think the issue is that the syscall will still attempt to do so. Thus there is a busy loop.

mxinden · Oct 09 '22 19:10

This pull request has merge conflicts. Could you please resolve them @mxinden? 🙏

mergify[bot] · Mar 22 '23 15:03

Closing here since this is stale and no longer needed.

mxinden · Mar 22 '23 19:03