
[RFC] net: close objects before they are destroyed

Open bhalevy opened this issue 2 years ago • 11 comments

The series adds a close method to classes across the board in the network stack, allowing the user to wait in an orderly manner on any background work left behind after shutdown, thus preventing use-after-free bugs like the one seen in #1190 (and several others, currently noted by FIXME comments peppered around the code).

The change is introduced as part of a new api_level, 7. The close methods are added unconditionally: at api_level 6 calling them is optional, while at api_level 7 or newer calling them is mandatory, and the closed condition is verified in the respective destructors.
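For illustration, a minimal sketch of the destructor-side verification (the class is hypothetical and the exact macro usage is illustrative, not the actual patch):

#include <seastar/core/future.hh>
#include <cassert>

// hypothetical class sketching the api_level 6 vs 7 semantics
class closeable_socket {
    bool _closed = false;
public:
    // waits for any background work left behind after shutdown
    seastar::future<> close() noexcept {
        _closed = true;
        return seastar::make_ready_future<>();
    }
    ~closeable_socket() {
#if SEASTAR_API_LEVEL >= 7
        // at api_level 7 or newer, calling close() before destruction is mandatory
        assert(_closed);
#endif
        // at api_level 6, a missing close() is tolerated for backward compatibility
    }
};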

At this stage the series includes:

  • server_socket, server_socket_impl
  • connected_socket, connected_socket_impl
  • WIP: net: socket, socket_impl: add close method
    • here I've hit a brick wall in the rpc streams labyrinth; any change I try either misses stopping something or hangs, when rpc::connection::stop appears to deadlock on itself.
  • TODO: add close to tcp::connection
    • to wait on the future that should be returned from tcb::close() via tcp::connection::close_write to tcp::connection::shutdown

bhalevy avatar Sep 04 '22 20:09 bhalevy

@elcallio please review the changes in the tls area

bhalevy avatar Sep 04 '22 20:09 bhalevy

I do not quite understand why you justify such a big (and, I am not convinced, necessary) code and API change by https://github.com/scylladb/seastar/issues/1190, which seems to be a local loopback bug that can probably be fixed by something like this:

diff --git a/tests/unit/loopback_socket.hh b/tests/unit/loopback_socket.hh
index 0e164e7772..1274b278bf 100644
--- a/tests/unit/loopback_socket.hh
+++ b/tests/unit/loopback_socket.hh
@@ -166,9 +166,9 @@ class loopback_connected_socket_impl : public net::connected_socket_impl {
         _rx->shutdown();
     }
     void shutdown_output() override {
-        (void)smp::submit_to(_tx.get_owner_shard(), [this] {
-            // FIXME: who holds to _tx?
-            _tx->shutdown();
+        auto shard = _tx.get_owner_shard();
+        (void)smp::submit_to(shard, [tx = std::move(_tx)] {
+            tx->shutdown();
         });
     }
     void set_nodelay(bool nodelay) override {

And if that does not work because we pass _tx to other objects, we can always wrap the foreign_ptr<lw_shared_ptr<loopback_buffer>> in an lw_shared_ptr.
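Roughly like this (a sketch, assuming the impl can store the wrapped pointer):

class loopback_connected_socket_impl : public net::connected_socket_impl {
    // wrapping the foreign_ptr in an lw_shared_ptr lets lambdas hold a
    // reference that keeps the buffer alive across the cross-shard hop
    lw_shared_ptr<foreign_ptr<lw_shared_ptr<loopback_buffer>>> _tx;
public:
    void shutdown_output() override {
        (void)smp::submit_to((*_tx).get_owner_shard(), [tx = _tx] {
            (*tx)->shutdown(); // tx keeps the buffer alive until this completes
        });
    }
    // ...
};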

gleb-cloudius avatar Sep 05 '22 07:09 gleb-cloudius

https://github.com/scylladb/seastar/issues/1190 is indeed just a minor symptom, but I think it's the tip of an iceberg: a basic design flaw we should fix by implementing a convention of waiting on background fibers using close/stop before destroying network stack objects.

See also https://github.com/scylladb/seastar/issues/1190#issuecomment-1227263091 and several other FIXMEs we have in the code base where we leave background jobs behind unattended.

I don't mind applying https://github.com/scylladb/seastar/pull/1205#issuecomment-1236655697 to fix #1190, but I do think we need to support orderly closing of network stack objects before destruction.

bhalevy avatar Sep 05 '22 08:09 bhalevy

#1190 is indeed just a minor symptom, but I think it's the tip of an iceberg: a basic design flaw we should fix by implementing a convention of waiting on background fibers using close/stop before destroying network stack objects.

It is not a symptom, it is just a bug (and according to the comment I knew it was there, but unfortunately was too busy with the big picture to fix it).

See also #1190 (comment)

Not sure how it is relevant. As far as I understand, there is a missing close() call on an output stream there.

and several other FIXMEs we have in the code base where we leave background jobs behind unattended.

I don't mind applying #1205 (comment) to fix #1190, but I do think we need to support orderly closing of network stack objects before destruction.

It is not reasonable to add close() to all C++ objects in existence. Why do you think we should limit ourselves to network stack objects only?

We should consider each object and its usage individually. In the case of connected_socket, the interface is that it cannot be destroyed before the corresponding data_source and data_sink are closed (and that is where all the background work is). Other methods should either not leave background work behind or handle it by themselves. All the other methods are very simple in the first place (setting flags and such). It just so happened that for a test loopback interface one of them needs to execute an operation on another shard, but it is easy to handle that in the test itself.
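For example, with the existing public API the contract looks roughly like this (a sketch; echo_once is hypothetical and error handling is elided):

#include <seastar/core/coroutine.hh>
#include <seastar/net/api.hh>

// the connected_socket must outlive its streams, and both streams must be
// closed (draining their background work) before the socket is destroyed
seastar::future<> echo_once(seastar::connected_socket cs) {
    auto in = cs.input();
    auto out = cs.output();
    auto buf = co_await in.read();
    co_await out.write(std::move(buf));
    co_await out.flush();
    co_await out.close(); // drains write-behind work owned by the stream
    co_await in.close();
    // only now may cs be destroyed
}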

gleb-cloudius avatar Sep 05 '22 08:09 gleb-cloudius

#1190 is indeed just a minor symptom, but I think it's the tip of an iceberg: a basic design flaw we should fix by implementing a convention of waiting on background fibers using close/stop before destroying network stack objects.

It is not a symptom, it is just a bug (and according to the comment I knew it was there, but unfortunately was too busy with the big picture to fix it).

See also #1190 (comment)

Not sure how it is relevant. As far as I understand, there is a missing close() call on an output stream there.

and several other FIXMEs we have in the code base where we leave background jobs behind unattended. I don't mind applying #1205 (comment) to fix #1190, but I do think we need to support orderly closing of network stack objects before destruction.

It is not reasonable to add close() to all C++ objects in existence. Why do you think we should limit ourselves to network stack objects only?

We should consider each object and its usage individually. In the case of connected_socket, the interface is that it cannot be destroyed before the corresponding data_source and data_sink are closed (and that is where all the background work is). Other methods should either not leave background work behind or handle it by themselves. All the other methods are very simple in the first place (setting flags and such). It just so happened that for a test loopback interface one of them needs to execute an operation on another shard, but it is easy to handle that in the test itself.

We've already added close for the storage stack (I/O streams, data source/sink impl, file, etc.). The main reason there was background I/O, i.e. write-behind and read-ahead that reference buffers which cannot just be destroyed while I/Os are outstanding, so we needed to abort the I/Os and wait for them to resolve.

The network stack doesn't have the same requirements, though there are examples, like tls, which does "put behind". But IMO it's better to mandate a design principle than to have some classes with close and some without, where the user needs to remember which one requires what.

To add a close() method to a public API in the future we need to bump the API version, and you don't want to do that for each class you change, so adding this across the board in one api-level change makes more sense to me, even if some of the classes do nothing in close(). It's not on the hot path anyway, so the overhead is negligible.

An alternative would be to track all the background fibers, e.g. under a gate in the network_stack, and close/stop that before it's destroyed, but then we'd need to be very careful to extend the life cycle of any resource used by a background fiber (like in the loopback socket case), so it's not destroyed along with its parent object while the fiber is still alive.
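Roughly like this (a sketch; network_stack_background and run_in_background are hypothetical names):

#include <seastar/core/gate.hh>
#include <seastar/core/future.hh>
#include <utility>

// every background fiber runs under a gate owned by the stack, and stop()
// closes the gate before the stack is destroyed
class network_stack_background {
    seastar::gate _bg;
public:
    template <typename Func>
    void run_in_background(Func f) {
        // note: f must also keep alive any resource it touches (the caveat above)
        (void)seastar::with_gate(_bg, std::move(f));
    }
    seastar::future<> stop() {
        return _bg.close(); // waits for all tracked fibers to complete
    }
};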

Personally, I like the close() approach better, certainly since we already have a bunch of those in the network stack.

bhalevy avatar Sep 06 '22 04:09 bhalevy

It is not reasonable to add close() to all C++ objects in existence. Why do you think we should limit ourselves to network stack objects only?

We should consider each object and its usage individually. In the case of connected_socket, the interface is that it cannot be destroyed before the corresponding data_source and data_sink are closed (and that is where all the background work is). Other methods should either not leave background work behind or handle it by themselves. All the other methods are very simple in the first place (setting flags and such). It just so happened that for a test loopback interface one of them needs to execute an operation on another shard, but it is easy to handle that in the test itself.

We've already added close for the storage stack (I/O streams, data source/sink impl, file, etc.). The main reason there was background I/O, i.e. write-behind and read-ahead that reference buffers which cannot just be destroyed while I/Os are outstanding, so we needed to abort the I/Os and wait for them to resolve.

Objects that generate background work should have a way to wait for that work to complete.

The network stack doesn't have the same requirements, though there are examples, like tls, which does "put behind". But IMO it's better to mandate a design principle than to have some classes with close and some without, where the user needs to remember which one requires what.

Network communication is done not by using sockets directly, but by using data sink/source, and those already have a way to wait for background work to complete.

To add a close() method to a public API in the future we need to bump the API version, and you don't want to do that for each class you change, so adding this across the board in one api-level change makes more sense to me, even if some of the classes do nothing in close(). It's not on the hot path anyway, so the overhead is negligible.

So basically you propose to add close() to all C++ objects. After all, each of them could gain a method that generates background work in the future.

An alternative would be to track all the background fibers, e.g. under a gate in the network_stack, and close/stop that before it's destroyed, but then we'd need to be very careful to extend the life cycle of any resource used by a background fiber (like in the loopback socket case), so it's not destroyed along with its parent object while the fiber is still alive.

Personally, I like the close() approach better, certainly since we already have a bunch of those in the network stack.

It is fine to have close() for an object that may have background work. It is not fine to add it just because you think it may generate some in the future. In the case of the bug in the loopback connected socket, yes, there was background work generated, but only for test infrastructure. It is not worth changing the API for that. And if a non-test implementation of shutdown_output had needed to generate background work, I would rather change shutdown_output to return a future<> instead of waiting for it in close() in a roundabout way.

gleb-cloudius avatar Sep 06 '22 07:09 gleb-cloudius

It is fine to have close() for an object that may have background work. It is not fine to add it just because you think it may generate some in the future. In the case of the bug in the loopback connected socket, yes, there was background work generated, but only for test infrastructure. It is not worth changing the API for that.

Without async close we end up having hacks like this:

template <typename InetTraits>
void tcp<InetTraits>::tcb::close() noexcept {
    if (in_state(CLOSED) || _snd.closed) {
        return;
    }
    // TODO: We should return a future to upper layer
    (void)wait_for_all_data_acked().then([this, zis = this->shared_from_this()] () mutable {

And if non test implementation of shutdown_output would have needed background work generated I would rather change shutdown_output to return a future<> instead of waiting for it in close() in roundabout way.

I disagree. I'd like to establish a convention in seastar and scylla where abort/shutdown methods are "synchronous" in nature (maybe returning a future if they need to submit_to a different shard or run an async loop) and just signal the object, while close/stop waits for anything async the object left behind to terminate, before destroying the object.

Currently we have a mish-mash of designs that evolved over time, and it's impossible to figure out what exactly does what without diving into the implementation.
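In code, the convention I have in mind would look roughly like this (an illustrative class, not an existing API):

#include <seastar/core/gate.hh>
#include <seastar/core/future.hh>

class conn {
    seastar::gate _pending; // tracks background fibers this object spawned
    bool _shut_down = false;
public:
    void shutdown() noexcept {
        _shut_down = true;       // only signals; never waits
    }
    seastar::future<> close() {
        shutdown();
        return _pending.close(); // joins background work; destruction is safe after this resolves
    }
};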

bhalevy avatar Sep 06 '22 08:09 bhalevy

It is fine to have close() for an object that may have background work. It is not fine to add it just because you think it may generate some in the future. In the case of the bug in the loopback connected socket, yes, there was background work generated, but only for test infrastructure. It is not worth changing the API for that.

Without async close we end up having hacks like this:

What is "async close"?

template <typename InetTraits>
void tcp<InetTraits>::tcb::close() noexcept {
    if (in_state(CLOSED) || _snd.closed) {
        return;
    }
    // TODO: We should return a future to upper layer
    (void)wait_for_all_data_acked().then([this, zis = this->shared_from_this()] () mutable {

Here we have a close() that leaves a background operation behind. What do you propose? To add really_really_close_i_am_not_joking_this_time()? BTW, Linux sends RST on a connection that has unsent data when it is closed.

And if a non-test implementation of shutdown_output had needed to generate background work, I would rather change shutdown_output to return a future<> instead of waiting for it in close() in a roundabout way.

I disagree. I'd like to establish a convention in seastar and scylla where abort/shutdown methods are "synchronous" in nature (maybe returning a future if they need to submit_to a different shard or run an async loop) and just signal the object, while close/stop waits for anything async the object left behind to terminate, before destroying the object.

You try to invent random rules and enforce them without considering any specific cases. Nothing good will come from it except a huge number of random patches and bike-shedding around each of them (for instance, raft's abort() aborts all operations and waits for all of them to complete. The name was chosen instead of stop() since the latter suggests that the operation waits for all operations to complete, as opposed to aborting them. IIRC Avi insisted on it).

shutdown_output in this case does not shut down the socket. It just makes the connection one-way. The socket itself may live forever after that. Why should the method follow your random rule? And anyway, the discussion has shifted away from the point that the method does not leave any background job behind in any of the code that matters.

Currently we have a mish-mash of designs that evolved over time, and it's impossible to figure out what exactly does what without diving into the implementation.

Each interface has its own needs. You cannot require a close() method for each C++ object. You dodge that question every time I ask it.

gleb-cloudius avatar Sep 06 '22 08:09 gleb-cloudius

It is fine to have close() for an object that may have background work. It is not fine to add it just because you think it may generate some in the future. In the case of the bug in the loopback connected socket, yes, there was background work generated, but only for test infrastructure. It is not worth changing the API for that.

Without async close we end up having hacks like this:

What is "async close"?

future<> close()

template <typename InetTraits>
void tcp<InetTraits>::tcb::close() noexcept {
    if (in_state(CLOSED) || _snd.closed) {
        return;
    }
    // TODO: We should return a future to upper layer
    (void)wait_for_all_data_acked().then([this, zis = this->shared_from_this()] () mutable {

Here we have a close() that leaves a background operation behind. What do you propose? To add really_really_close_i_am_not_joking_this_time()? BTW, Linux sends RST on a connection that has unsent data when it is closed.

Absolutely not. Change the signature to return a future<> so that the caller can wait on it.
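I.e., roughly this (a sketch of the excerpt above; the callers up the chain would need to propagate the future):

template <typename InetTraits>
future<> tcp<InetTraits>::tcb::close() noexcept {
    if (in_state(CLOSED) || _snd.closed) {
        return make_ready_future<>();
    }
    // return the future instead of dropping it, so the caller can wait on it
    return wait_for_all_data_acked().then([this, zis = this->shared_from_this()] () mutable {
        // ... same close sequence as today ...
    });
}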

And if a non-test implementation of shutdown_output had needed to generate background work, I would rather change shutdown_output to return a future<> instead of waiting for it in close() in a roundabout way.

I disagree. I'd like to establish a convention in seastar and scylla where abort/shutdown methods are "synchronous" in nature (maybe returning a future if they need to submit_to a different shard or run an async loop) and just signal the object, while close/stop waits for anything async the object left behind to terminate, before destroying the object.

You try to invent random rules and enforce them without considering any specific cases.

It's not random. It's one model of the many we currently have.

Nothing good will come from it except a huge number of random patches and bike-shedding around each of them (for instance, raft's abort() aborts all operations and waits for all of them to complete. The name was chosen instead of stop() since the latter suggests that the operation waits for all operations to complete, as opposed to aborting them. IIRC Avi insisted on it).

If we had documented guidelines in place, we could design the implementation around them. We had a similar discussion regarding the queue reader in scylla at the time, IIRC.

I think it's ok for abort or shutdown to return a future, but I see it as a way to wait on the aborting process itself; with shutdown, this could wait on e.g. sending "goodbye" handshake packets, but that's it. Waiting on all the resources that an object holds to become eligible for destruction post abort/shutdown should be the job of close/stop.

shutdown_output in this case does not shut down the socket. It just makes the connection one-way. The socket itself may live forever after that. Why should the method follow your random rule? And anyway, the discussion has shifted away from the point that the method does not leave any background job behind in any of the code that matters.

Currently we have a mish-mash of designs that evolved over time, and it's impossible to figure out what exactly does what without diving into the implementation.

Each interface has its own needs. You cannot require a close() method for each C++ object. You dodge that question every time I ask it.

It would be nice to have an async destructor (returning a future<>) for all C++ objects, but we don't have such a thing. In addition to (sharded) services and synchronization primitives (semaphores, locks, condition variables), we should have a close method wherever there's I/O and/or communication, like pipes, queues, storage I/O, and yes, networking.

As for the socket class, maybe I misunderstood its function. My goal is to either return a future from tcb::close, or rename it to something like close_write and keep track of the background fiber it leaves behind, to be joined later using a newly added future<> close() function.

The thing is that it's called today by tcp<InetTraits>::connection::close_write from synchronous entry points like tcp<InetTraits>::connection::shutdown_connect or tcp<InetTraits>::connection::~connection, so we can't wait on anything there.

The tcp connection ends up as native_socket_impl::_conn, so to be able to properly close() it before it's destroyed, we need a respective future<> close() function. My thinking was to generalize this and provide a virtual future<> close() function in socket_impl, called by future<> socket::close().
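Something like this (a sketch; signatures abbreviated, existing virtuals like set_reuseaddr elided, and the socket member name is assumed):

class socket_impl {
public:
    virtual ~socket_impl() = default;
    virtual future<connected_socket> connect(socket_address sa, socket_address local, transport proto) = 0;
    virtual void shutdown() = 0;
    virtual future<> close() = 0; // new: joins background work before destruction
};

future<> socket::close() {
    return _si->close(); // _si: the socket's socket_impl pointer (name assumed)
}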

bhalevy avatar Sep 06 '22 10:09 bhalevy

What classes gain a close() here? data_sink/data_source already have them; which classes need them too and cannot make use of data_source/data_sink?

avikivity avatar Sep 06 '22 10:09 avikivity

It is fine to have close() for an object that may have background work. It is not fine to add it just because you think it may generate some in the future. In the case of the bug in the loopback connected socket, yes, there was background work generated, but only for test infrastructure. It is not worth changing the API for that.

Without async close we end up having hacks like this:

What is "async close"?

future<> close()

template <typename InetTraits>
void tcp<InetTraits>::tcb::close() noexcept {
    if (in_state(CLOSED) || _snd.closed) {
        return;
    }
    // TODO: We should return a future to upper layer
    (void)wait_for_all_data_acked().then([this, zis = this->shared_from_this()] () mutable {

Here we have a close() that leaves a background operation behind. What do you propose? To add really_really_close_i_am_not_joking_this_time()? BTW, Linux sends RST on a connection that has unsent data when it is closed.

Absolutely not. Change the signature to return a future<> so that the caller can wait on it.

Ah. I did not notice it returns void and not future<>, so I misunderstood what you meant. But then I do not see why you give these native stack internals as an example here. Maybe this signature has to be changed to return a future<>, but how is that related to the claim that all network objects should have close()?

BTW, since here we are talking about a full networking stack implementation, things are a little bit more complicated. When you use the OS's networking stack and close a file descriptor, it does not mean that all work associated with the connection suddenly disappears. No. The networking stack may keep the connection in certain states and continue sending/receiving packets long after the fd is gone. So it would even be wrong to make socket::close wait for all this work to complete in the native stack.

And if a non-test implementation of shutdown_output had needed to generate background work, I would rather change shutdown_output to return a future<> instead of waiting for it in close() in a roundabout way.

I disagree. I'd like to establish a convention in seastar and scylla where abort/shutdown methods are "synchronous" in nature (maybe returning a future if they need to submit_to a different shard or run an async loop) and just signal the object, while close/stop waits for anything async the object left behind to terminate, before destroying the object.

You try to invent random rules and enforce them without considering any specific cases.

It's not random. It's one model of the many we currently have.

Exactly. It is one model. We have several. Why do you apply one random model to everything? I do not use "random" as an insult. It may be a fine model where applicable.

Nothing good will come from it except a huge number of random patches and bike-shedding around each of them (for instance, raft's abort() aborts all operations and waits for all of them to complete. The name was chosen instead of stop() since the latter suggests that the operation waits for all operations to complete, as opposed to aborting them. IIRC Avi insisted on it).

If we had documented guidelines in place, we could design the implementation around them. We had a similar discussion regarding the queue reader in scylla at the time, IIRC.

Documenting existing interfaces and assumptions is of course important. For connected_socket it is important to document that it cannot be destroyed before the corresponding sink/source are closed, and that no other operation is supposed to leave unattended work behind unless it can proceed without the socket object.

I think it's ok for abort or shutdown to return a future, but I see it as a way to wait on the aborting process itself; with shutdown, this could wait on e.g. sending "goodbye" handshake packets, but that's it. Waiting on all the resources that an object holds to become eligible for destruction post abort/shutdown should be the job of close/stop.

Why? Why shouldn't abort() abort all ongoing operations without waiting for them to complete, while stop() prevents new operations from coming in and waits for ongoing ones to complete successfully? (Not that I propose to require it; it's just an alternative proposal to bike-shed about.)

Why not leave it to the class implementation, but require documentation instead?

shutdown_output in this case does not shut down the socket. It just makes the connection one-way. The socket itself may live forever after that. Why should the method follow your random rule? And anyway, the discussion has shifted away from the point that the method does not leave any background job behind in any of the code that matters.

Currently we have a mish-mash of designs that evolved over time, and it's impossible to figure out what exactly does what without diving into the implementation.

Each interface has its own needs. You cannot require a close() method for each C++ object. You dodge that question every time I ask it.

It would be nice to have an async destructor (returning a future<>) for all C++ objects, but we don't have such a thing.

But you basically say we should add it in the form of close().

In addition to (sharded) services and synchronization primitives (semaphores, locks, condition variables), we should have a close method wherever there's I/O and/or communication, like pipes, queues, storage I/O, and yes, networking.

We have close() for networking at the sink/source level. Obviously one is needed there.

As for the socket class, maybe I misunderstood its function. My goal is to either return a future from tcb::close, or rename it to something like close_write and keep track of the background fiber it leaves behind, to be joined later using a newly added future<> close() function.

The thing is that it's called today by tcp<InetTraits>::connection::close_write from synchronous entry points like tcp<InetTraits>::connection::shutdown_connect or tcp<InetTraits>::connection::~connection, so we can't wait on anything there.

The tcp connection ends up as native_socket_impl::_conn, so to be able to properly close() it before it's destroyed, we need a respective future<> close() function. My thinking was to generalize this and provide a virtual future<> close() function in socket_impl, called by future<> socket::close().

As I said above, a tcp connection may outlive its socket by several minutes. I wouldn't try to bind their lifetimes blindly.

gleb-cloudius avatar Sep 06 '22 11:09 gleb-cloudius