paho.mqtt.c
paho.mqtt.c copied to clipboard
Turn off debug facilities to improve performance - but allow them to be switched on dynamically
A user of the Paho Rust client (@tobdub) put up an issue about poor performance of the Rust client compared to a similar Python test app:
For some reason the Rust version is using 50% more cpu than the Python version (running on AWS T3 instance).
and...
According to the flamegraph, a lot of time is spent in WebSocket_getch. It seems to be reading single chars from the TCP stream?
The flamegraph is in the Rust issue: https://github.com/eclipse/paho.mqtt.rust/issues/63
Yep the websocket support needs a scrutiny...
The code is not actually using websockets. Looks like WebSocket_getch
just calls SSLSocket_getch
.
The potential performance issues that I found are overhead from StackTrace and a lack of buffered I/O. The use case is receiving thousands of messages per second in a server.
I don't think it's reading single chars from the socket. It is waiting in getch() for the first byte in the MQTT packet, then reading in the remaining length field bytewise. Once the packet length is figured out, the rest of the packet can read in chunks.
So the stack tracing can be turned off at build time, if you are building the C library by setting the compile definition NOSTACKTRACE. Setting NO_HEAP_TRACKING disables the heap tracking.
What do you mean by lack of buffered I/O?
True. Even the few single chars read from the packet header seem to cause some noticeable overhead, although it is improved with stacktrace disabled. With buffered I/O I meant that I thought it would be possible to recv data from a TCP socket into a buffer of a few kilobytes and process it from there, to reduce the number of sys calls?
Ok. When I wrote that code, a decade ago, I assumed that as TCP has its own buffers too, that this wouldn't be an issue. TLS (OpenSSL) does complicate matters too, but as Python (and I assume PyPy) also uses OpenSSL then they both face the same obstacle.
Yes, Paho Python works in the same way. The gain of a client side buffer I guess would be to avoid the syscall overhead of many small packets. With stacktrace disabled the performance seems similar to Paho Python with PyPy. One small thing that may be possible is to read 2 bytes at the start of a packet since the header is always at least two bytes long.
Logs also seem to cause some significant overhead (21% of CPU time in my test case with stacktrace enabled).
It seems trace_level
can't be set lower than TRACE_MINIMUM
. Is it possible to disable the trace logging completely?
I assume you've used both NOSTACKTRACE and NO_HEAP_TRACKING?
We could add an option to disable logging altogether. You could try setting trace_settings.trace_level to LOG_SEVERE, and see what difference that makes. Then we could also add a compile option.
On the i/o buffering question, we could prove the impact by implementing a simple test version and see what difference that makes. That would show how much value there would be in making those changes properly. You are using TLS, aren't you?
I haven't retested using both NOSTACKTRACE and NO_HEAP_TRACKING. Can I set those flags using CMake?
In Log_setTraceLevel
it seems like I can't set trace level lower than TRACE_MINIMUM
which still includes some logs for every message. The main overhead from logging seems to come from the vsnprintf
and the mutex.
No. If you are using CMake I was hoping you could add the definitions to your CMakeLists.txt temporarily so you could see what difference it makes. It will probably take me a while to figure out the best way to implement the changes in CMake.
In the main CMakeLists.txt, addding:
ADD_DEFINITIONS(-DNOSTACKTRACE -DNO_HEAP_TRACKING)
should do the trick.
I retested now with NOSTACKTRACE
and NO_HEAP_TRACKING
and the performance is finally better than for Paho Python!
The remaining logging overhead is now at 16% CPU time (No TLS, default log settings), so it still seems worth addressing. What's the reason for not allowing trace_settings.trace_level
to go below TRACE_MINIMUM
?.
Heap.h
has a HIGH_PERFORMANCE
option. Maybe we could add a similar option for CMake that defines both NOSTACKTRACE
and NO_HEAP_TRACKING
.
maybe we should use -DNOSTACKTRACE -DNO_HEAP_TRACKING for Release?
I would like to retain the ability for people to take a trace without having to use a completely separate build of the library. So we'd need to add the ability to turn the debug facilities (including logging) on and off dynamically. I think that this can be done while retaining the performance advantage.
I'll change the issue title to reflect this.
Just to note that the PAHO_HIGH_PERFORMANCE CMake option now works in the develop branch.