Stopping the HTTP server takes 1 second
Hello,
This is more a question than a real issue.
We are using uWebSockets to put up a minimal HTTP server for REST API requests in our C++ code. The code goes more or less like so:
uWS::App app;
AttachAllRouters(app);
app.listen(port,
    [this](auto* socket)
    {
        if(socket) {
            listenSocket = socket;
        } else {
            // port must be converted explicitly; "..." + int is pointer arithmetic
            throw std::logic_error("Failed binding to port " + std::to_string(port));
        }
    });
app.run();
LOG("HTTP Backend event loop stopped");
All of this happens on a separate thread, but fully within that thread. listenSocket is a member of my class: us_listen_socket_t* listenSocket{nullptr};
When the time comes to close the program I call from the main thread:
LOG("Stopping server");
if(running.load())
{
    us_listen_socket_close(NO_SSL, listenSocket);
}
LOG("socket closed");
if(serverThread.joinable())
{
    serverThread.join();
}
LOG("Exiting HTTPServer");
Running this gives me:
00:00:00:00.170 [StopServer]: Stopping server
00:00:00:00.170 [StopServer]: socket closed
00:00:00:01.018 [RunServer]: HTTP Backend event loop stopped
00:00:00:01.018 [StopServer]: Exiting HTTPServer
So it takes almost a full second for the socket to close (or at least for app.run() to exit after us_listen_socket_close returns). I can confirm that during this time no HTTP calls are active; nobody is communicating with the server.
This 1 second might not seem like a big issue, and it is not in production. But we have a series of acceptance tests. For each of these tests, the HTTP server gets built and destroyed again. So that 1 second is multiplied by 100. 80% of our acceptance test time is just waiting for the HTTP server to go down again.
Is this timeout a setting we can influence? Can we reduce it (even if only during testing)? Or can you safely reduce it in a future release?
Thanks
For each of these tests, the HTTP server gets built and destroyed again.
Yes, this is how the tests in our fuzzing folder work. We do millions of setups/teardowns per second.
Your issue is that us_listen_socket_close(NO_SSL, listenSocket); is not thread safe. Your issue is also that you are temporally syncing, but not data syncing. For instance, setting listenSocket = socket; in thread A but expecting it to be readable in thread B is breaking the memory model. You cannot just temporally sync, you need to have proper memory barriers.
So I would start using ThreadSanitizer, because it would catch these issues automatically.
But ideally I wouldn't use threads at all. But I'm not your mother ;)
Ah, I forgot to say: you can use uWS::Loop::defer to wrap something that is not thread safe. It's basically like Boost.Asio's io_context::post.
But ideally I wouldn't use threads at all.
Curious about this statement. So, for a web server built with uWebSockets, would the preferred model be to fork subprocesses as multiple workers, one per core (std::thread::hardware_concurrency()), and avoid threads completely?
userver, for example, uses multiple forked subprocesses and each with threads: https://userver.tech/db/d69/md_en_2userver_2tutorial_2production__service.html#autotoc_md1076
@uNetworkingAB Not using threads is not an option. The entire HTTP server put up using uWebSockets is just a minor feature in our application. We are managing dozens of threads in a thread-safe way.
Based on the user manual, we are already putting as much as we can on a single thread. But please explain: if app.run() blocks that thread, then from my point of view, how should that same thread still be able to call us_listen_socket_close?
From the user manual I understand that capturing the pointer to the listener and using us_listen_socket_close is indeed the correct way to go?
(There is also an app.close() function, but that is never used in any of the examples.) As far as I can find, there is also not a single example that shows how to stop the uWS::App. Most of them (if not all) run forever.
I am also looking into your fuzzing examples, e.g. https://github.com/uNetworking/uWebSockets/blob/master/fuzzing/AsyncEpollHelloWorld.cpp is close to what we are doing.
The us_listen_socket_close is called in much the same way as we do it, but in this strange teardown() function. This teardown is not called in the test code itself, so I can only assume it is part of the libEpollFuzzer framework. The manual there does not mention the 'teardown' function. I looked into the underlying libFuzzer, but it also does not mention the teardown function. In other words, I cannot determine when it gets called, or, more importantly, on which thread.
So, if I missed in the manual, the examples or any of the tests an example on how to correctly close down the uWs::app from within the application itself, please point me to it.
Also, on this:
You cannot just temporally sync, you need to have proper memory barriers.
Since 99% of what happens to the socket is inside the uWebSockets library (well, actually framework), I have no way to set proper memory barriers myself. You would have to tell me which barriers you are using so that I can safely use the same ones. So either I need to be able to lock a mutex or set an atomic flag or something similar that you use as well. Otherwise I can lock as much as I want and it makes no difference.
PS: we are also using a uWS::WebSocket to put up a stream that sends outgoing data to registered clients. Of course this also means pushing data to send from a separate thread, since app.run() is blocking the httpServer thread already. How else can we still do something ourselves?
I am sorry if I come across as dense here. I just don't see how you can embed an HTTP server framework into another application without some form of threading and still maintain control to stop, rather than only being a slave to incoming calls.
One more question: you mention uWS::Loop::defer; can you expand on that, or point to the example/manual chapter that shows how you expect me to use it?
Hello, I know I am going off topic here, but it seems my original question immediately triggered comments on threading. As I mentioned in my previous message, we are also using a websocket to push data to clients and also there the threading is a question mark. But I think we figured something out today, so I am just checking here with the specialists.
As said, we want to push data on a websocket, but we are not allowed to push from another thread, and our main thread is blocked by app.run(). Looking at the broadcast example, we see the usage of us_timer_set. So we should register some sort of timer function using us_timer_set. This function will then periodically get called inside the app.run() thread (and thus on the acceptable thread). From this function, which we control, we can use mutexes, locks and atomics to properly access shared memory between threads and retrieve updated data for sending. But all the while this happens, the entire HTTP server is blocked.
Did we get this correctly?
If so, that clears one threading mystery. Then only the 'close sockets' question from previously remains.
I think that I figured something out here.
I changed the implementation to use the uWS::Loop::defer method:
// on server thread
void RunServerThread()
{
    uWS::App app;
    AttachAllRouters(app);
    app.listen(port, [this](us_listen_socket_t* socket) { listenSocket = socket; });
    appLoop = uWS::Loop::get();
    app.run();
}

// on main thread
void StopServer()
{
    if(appLoop != nullptr && listenSocket != nullptr)
    {
        appLoop->defer([this]() { us_listen_socket_close(NO_SSL, this->listenSocket); });
    }
    if(serverThread.joinable())
        serverThread.join();
}
Explained in words: on the server thread I create the app, attach all routers, open the port, get the loop pointer and start the uWS::App by calling the blocking run function. In the listen callback I also store the listenSocket pointer.
On my main thread, when it is time to close, I call defer with a one-time action to close the socket. As far as I understand, this schedules my anonymous function on the server thread's loop, so it gets executed on that thread. Thus us_listen_socket_close happens inside the uWS thread and everybody is happy.
With this the ~1 second delay goes away.
Does this make sense?
For some unknown reason, we could not get the same Loop::defer method to work for us_timer_close. That gave strange memory issues (which could be our own fault as well; not certain). A workaround that we also tried and wanted to share uses the timer itself.
We have an atomic (so it should be thread safe to read and change) that indicates whether we want to stop (false by default). Then, in the timer definition we say:
struct us_loop_t* loop = (struct us_loop_t*)uWS::Loop::get();

struct TimerContext
{
    MyObject::Impl* impl; // could also be a pointer/reference to the atomic 'stopRequested'
};

struct us_timer_t* sendTimer = us_create_timer(loop, 0, sizeof(TimerContext));
auto* context = reinterpret_cast<TimerContext*>(us_timer_ext(sendTimer));
context->impl = this;
us_timer_set(sendTimer,
    [](struct us_timer_t* t)
    {
        auto* ctx = reinterpret_cast<TimerContext*>(us_timer_ext(t));
        auto* impl = ctx->impl;
        if(impl->stopRequested.load())
        {
            us_timer_close(t);
            return;
        }
        // real work of timer
    }, TIMER_INITIAL_DELAY_MS, TIMER_INTERVAL_MS);
Then, when we want things to stop, we set stopRequested to true, and after at most TIMER_INTERVAL_MS the timer stops.
The advantage is that us_timer_close is called on the same thread as the uWS loop, as requested.
But this is a workaround, and it has two issues: the timer does extra work frequently, and there is a potential lag of up to TIMER_INTERVAL_MS. The defer method from the previous post has neither issue, but is more tedious in its memory management, as it requires you to keep hold of both the loop and every timer pointer you want to stop.
That gave strange memory issues
So use ThreadSanitizer and AddressSanitizer and fix your code. This code is not technically correct, as I said. And because it is not technically correct, I know that you aren't using ThreadSanitizer, and from that I know that you probably have a heap of issues in your code.
So start using those sanitizers and you will get yourself correct and working code instead of making up polling workarounds like that timer thingy.
send from a separate thread since app.run() is blocking the httpServer thread already. How else can we still do something ourselves?
You are hyper focused on this idea that a thread = blocked and therefore unusable.
You need to stop writing workarounds and go back to the basics of how an event loop works.
1 single thread can integrate any amount of IO projects seamlessly. This is how Node.js works.
Ideally you pick an event loop such as libuv, then you bring in IO projects that use libuv. They will integrate on the 1 single thread.
Most C and some C++ libraries support libuv, and uWS can be configured to use libuv and therefore seamlessly integrate with most libraries without any threading. That's the ideal setup: everything async, everything single threaded.