paho.mqtt.c icon indicating copy to clipboard operation
paho.mqtt.c copied to clipboard

Turn off debug facilities to improve performance - but allow them to be switched on dynamically

Open fpagliughi opened this issue 5 years ago • 15 comments

A user of the Paho Rust client (@tobdub) put up an issue about poor performance of the Rust client compared to a similar Python test app:

For some reason the Rust version is using 50% more cpu than the Python version (running on AWS T3 instance).

and...

According to the flamegraph, a lot of time is spent in WebSocket_getch. It seems to be reading single chars from the TCP stream?

The flamegraph is in the Rust issue: https://github.com/eclipse/paho.mqtt.rust/issues/63

fpagliughi avatar Jan 08 '20 16:01 fpagliughi

Yep the websocket support needs a scrutiny...

icraggs avatar Jan 09 '20 12:01 icraggs

The code is not actually using websockets. Looks like WebSocket_getch just calls SSLSocket_getch.

tobdub-snce avatar Jan 09 '20 13:01 tobdub-snce

The potential performance issues that I found are overhead from StackTrace and a lack of buffered I/O. The use case is receiving thousands of messages per second in a server.

tobdub-snce avatar Jan 11 '20 11:01 tobdub-snce

I don't think it's reading single chars from the socket. It is waiting in getch() for the first byte in the MQTT packet, then reading in the remaining length field bytewise. Once the packet length is figured out, the rest of the packet can read in chunks.

So the stack tracing can be turned off at build time, if you are building the C library by setting the compile definition NOSTACKTRACE. Setting NO_HEAP_TRACKING disables the heap tracking.

What do you mean by lack of buffered I/O?

icraggs avatar Jan 11 '20 14:01 icraggs

True. Even the few single chars read from the packet header seem to cause some noticeable overhead, although it is improved with stacktrace disabled. With buffered I/O I meant that I thought it would be possible to recv data from a TCP socket into a buffer of a few kilobytes and process it from there, to reduce the number of sys calls?

tobdub-snce avatar Jan 13 '20 10:01 tobdub-snce

Ok. When I wrote that code, a decade ago, I assumed that as TCP has its own buffers too, that this wouldn't be an issue. TLS (OpenSSL) does complicate matters too, but as Python (and I assume PyPy) also uses OpenSSL then they both face the same obstacle.

icraggs avatar Jan 13 '20 18:01 icraggs

Yes, Paho Python works in the same way. The gain of a client side buffer I guess would be to avoid the syscall overhead of many small packets. With stacktrace disabled the performance seems similar to Paho Python with PyPy. One small thing that may be possible is to read 2 bytes at the start of a packet since the header is always at least two bytes long.

tobdub-snce avatar Jan 14 '20 10:01 tobdub-snce

Logs also seem to cause some significant overhead (21% of CPU time in my test case with stacktrace enabled). It seems trace_level can't be set lower than TRACE_MINIMUM. Is it possible to disable the trace logging completely?

tobdub-snce avatar Jan 14 '20 10:01 tobdub-snce

I assume you've used both NOSTACKTRACE and NO_HEAP_TRACKING?

We could add an option to disable logging altogether. You could try setting trace_settings.trace_level to LOG_SEVERE, and see what difference that makes. Then we could also add a compile option.

On the i/o buffering question, we could prove the impact by implementing a simple test version and see what difference that makes. That would show how much value there would be in making those changes properly. You are using TLS, aren't you?

icraggs avatar Jan 14 '20 13:01 icraggs

I haven't retested using both NOSTACKTRACE and NO_HEAP_TRACKING. Can I set those flags using CMake? In Log_setTraceLevel it seems like I can't set trace level lower than TRACE_MINIMUM which still includes some logs for every message. The main overhead from logging seems to come from the vsnprintf and the mutex.

tobdub-snce avatar Jan 14 '20 14:01 tobdub-snce

No. If you are using CMake I was hoping you could add the definitions to your CMakeLists.txt temporarily so you could see what difference it makes. It will probably take me a while to figure out the best way to implement the changes in CMake.

In the main CMakeLists.txt, addding:

ADD_DEFINITIONS(-DNOSTACKTRACE -DNO_HEAP_TRACKING)

should do the trick.

icraggs avatar Jan 14 '20 15:01 icraggs

I retested now with NOSTACKTRACE and NO_HEAP_TRACKING and the performance is finally better than for Paho Python! The remaining logging overhead is now at 16% CPU time (No TLS, default log settings), so it still seems worth addressing. What's the reason for not allowing trace_settings.trace_level to go below TRACE_MINIMUM?. Heap.h has a HIGH_PERFORMANCE option. Maybe we could add a similar option for CMake that defines both NOSTACKTRACE and NO_HEAP_TRACKING.

tobdub-snce avatar Jan 14 '20 15:01 tobdub-snce

maybe we should use -DNOSTACKTRACE -DNO_HEAP_TRACKING for Release?

jumoog avatar Jan 17 '20 10:01 jumoog

I would like to retain the ability for people to take a trace without having to use a completely separate build of the library. So we'd need to add the ability to turn the debug facilities (including logging) on and off dynamically. I think that this can be done while retaining the performance advantage.

I'll change the issue title to reflect this.

icraggs avatar Jan 17 '20 16:01 icraggs

Just to note that the PAHO_HIGH_PERFORMANCE CMake option now works in the develop branch.

icraggs avatar Mar 18 '20 17:03 icraggs