High CPU usage in NativeUnixSocket.receive
Describe the bug
I'm sorry, I'm not quite sure what is causing the CPU usage to be so high. Can you help me check whether there is a problem with my code?

To Reproduce
Steps to reproduce the behavior:
- com.sdtp.uds.JmsServer
- com.sdtp.uds.JmsClient

Expected behavior
I am running JmsServer and JmsClient with 12 threads on a 48-core CPU host, which generates approximately 4 TB of data per minute, but the CPU usage is nearly 17%.

Notes
com.kohlschutter.junixsocket:junixsocket-native-common:2.10.1
I'm not sure I understand your concern; please clarify:
Your CPU is transferring 4 TB per minute (about 68 GB per second), which you say uses 17% of your CPU, so 83% of your CPU is idle.
Doesn't sound too bad to me, especially when looking at what the code does.
Have you tried benchmarking it against some other implementation?
Yes, I compared it with Java TCP sockets: for the same amount of data transferred, TCP only uses about 5% of the CPU. Could it be that Unix domain sockets perform worse because they go through the file system?
Please try running the selftest with the options below to understand how the different socket types / implementations perform on your system, and report back with the output of the following command (use the selftest jar from the "Releases" page):

java -Dselftest.only=ThroughputTest -Dorg.newsclub.net.unix.throughput-test.seconds=10 -Dorg.newsclub.net.unix.throughput-test.payload-size=8192 -Dorg.newsclub.net.unix.throughput-test.ip.enabled=true -Dselftest.enable-module.junixsocket-common.JavaInet=true -Dselftest.enable-module.junixsocket-common.JEP380=true -jar junixsocket-selftest-2.10.1-jar-with-dependencies.jar
This runs a single-threaded throughput test (one client/server connection each), comparing different socket types and implementations (AF_* is junixsocket; java.net is TCP/IP sockets and UDP/IP datagrams; JEP380 is the built-in Unix domain socket support introduced in Java 16). See junixsocket-common/src/test/java/.../ThroughputTest.java for the source.
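If it helps to see what such a measurement boils down to, here is a minimal, self-contained sketch of a single-connection stream throughput loop over AF_UNIX. This is not the actual ThroughputTest code; the socket path, payload size and duration are arbitrary, and it assumes the junixsocket 2.10.x stream API:

import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;

import org.newsclub.net.unix.AFUNIXServerSocket;
import org.newsclub.net.unix.AFUNIXSocket;
import org.newsclub.net.unix.AFUNIXSocketAddress;

public class MiniThroughput {
  public static void main(String[] args) throws Exception {
    File socketFile = new File("/tmp/mini-throughput.sock"); // arbitrary path
    socketFile.delete(); // remove a stale socket file from a previous run
    AFUNIXSocketAddress addr = AFUNIXSocketAddress.of(socketFile);

    // Server side: accept a single connection and drain everything it sends.
    AFUNIXServerSocket serverSocket = AFUNIXServerSocket.bindOn(addr);
    Thread server = new Thread(() -> {
      try (AFUNIXServerSocket ss = serverSocket;
           AFUNIXSocket sock = ss.accept();
           InputStream in = sock.getInputStream()) {
        byte[] buf = new byte[8192];
        while (in.read(buf) != -1) {
          // discard; we only care about throughput
        }
      } catch (Exception e) {
        e.printStackTrace();
      }
    });
    server.start();

    // Client side: write fixed-size payloads for ~10 seconds, then report MiB/s.
    byte[] payload = new byte[8192];
    long deadline = System.currentTimeMillis() + 10_000;
    long bytesSent = 0;
    try (AFUNIXSocket sock = AFUNIXSocket.connectTo(addr);
         OutputStream out = sock.getOutputStream()) {
      while (System.currentTimeMillis() < deadline) {
        out.write(payload);
        bytesSent += payload.length;
      }
    }
    server.join();
    System.out.printf("%.1f MB/s%n", bytesSent / 10.0 / (1024 * 1024));
  }
}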
On an older i7-9700 machine running Alpine Linux with a 6.12 kernel (no particular tuning options), I'm getting the following numbers (single run, so ballpark figures, not a proper benchmark result):
(AF_UNIX DatagramChannel direct=false;blocking=true): 5127.646 MB/s for datagram payload size 8192; 0.0% packet loss
(AF_UNIX DatagramChannel direct=true;blocking=true): 5777.1284 MB/s for datagram payload size 8192; 0.0% packet loss
(AF_UNIX DatagramChannel direct=false;blocking=false): 2798.9905 MB/s for datagram payload size 8192; 0.0% packet loss
(AF_UNIX DatagramChannel direct=true;blocking=false): 3160.1934 MB/s for datagram payload size 8192; 0.0% packet loss
(AF_UNIX DatagramPacket): 2198.7942 MB/s for datagram payload size 8192; 0.0% packet loss
(JEP380 SocketChannel direct=false): 639.9689 MB/s for payload size 8192
(JEP380 SocketChannel direct=true): 695.97675 MB/s for payload size 8192
(AF_UNIX byte[]): 871.33136 MB/s for payload size 8192
(AF_UNIX SocketChannel direct=false): 691.33276 MB/s for payload size 8192
(AF_UNIX SocketChannel direct=true): 659.39044 MB/s for payload size 8192
(java.net DatagramChannel direct=false;blocking=true): 3342.7056 MB/s for datagram payload size 8192; 0.0% packet loss
(java.net DatagramChannel direct=true;blocking=true): 3834.6228 MB/s for datagram payload size 8192; 0.0% packet loss
(java.net DatagramPacket): 3298.7087 MB/s for datagram payload size 8192; 0.0% packet loss
(java.net byte[]): 866.64136 MB/s for payload size 8192
(java.net SocketChannel direct=false): 618.4202 MB/s for payload size 8192
(java.net SocketChannel direct=true): 645.7466 MB/s for payload size 8192
(AF_VSOCK byte[]): 616.90594 MB/s for payload size 8192
(AF_VSOCK SocketChannel direct=false): 489.9955 MB/s for payload size 8192
(AF_VSOCK SocketChannel direct=true): 518.8026 MB/s for payload size 8192
(AF_TIPC DatagramChannel direct=false;blocking=true): 4956.1694 MB/s for datagram payload size 8192; 0.0% packet loss
(AF_TIPC DatagramChannel direct=true;blocking=true): 5657.542 MB/s for datagram payload size 8192; 0.0% packet loss
(AF_TIPC DatagramChannel direct=false;blocking=false): 2846.5063 MB/s for datagram payload size 8192; 0.0% packet loss
(AF_TIPC DatagramChannel direct=true;blocking=false): 2974.5562 MB/s for datagram payload size 8192; 0.0% packet loss
(AF_TIPC DatagramPacket): 2242.4963 MB/s for datagram payload size 8192; 60.2% packet loss
(AF_TIPC byte[]): 947.2008 MB/s for payload size 8192
(AF_TIPC SocketChannel direct=false): 694.9053 MB/s for payload size 8192
(AF_TIPC SocketChannel direct=true): 742.5425 MB/s for payload size 8192
As you can see, UNIX sockets are generally faster than the IP-based sockets (especially for datagrams, about 1.5x) or at least roughly on par (1.11x/1.02x for non-direct/direct stream buffers). junixsocket's Unix socket is just as fast as JEP380 for direct-buffer sockets and slightly faster (1.08x) when using non-direct buffers. AF_VSOCK sockets are slightly slower than UNIX sockets; AF_TIPC is also a well-performing socket implementation, comparable to AF_UNIX.
Note that both in my tests and yours, we're running single-connection tests, i.e., only 2 out of N possible threads are utilized.
With your code, I'm seeing significantly better numbers when running multiple JmsClients at once.
The ThroughputTest uses a fixed payload of 8192 bytes (in the configuration above). Performance will suffer with smaller payloads; in your case, the payload is ca. 350 bytes.
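One thing worth trying on the sender side is to coalesce those ~350-byte messages into larger writes, for example with a plain BufferedOutputStream sized close to the 8192-byte payload above. This is just a sketch, not necessarily how your JmsClient is structured, and the socket path is hypothetical:

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.OutputStream;

import org.newsclub.net.unix.AFUNIXSocket;
import org.newsclub.net.unix.AFUNIXSocketAddress;

public class BatchedWriter {
  public static void main(String[] args) throws Exception {
    // Hypothetical socket path; substitute whatever your JmsServer binds to.
    AFUNIXSocketAddress addr = AFUNIXSocketAddress.of(new File("/tmp/jms.sock"));
    byte[] message = new byte[350]; // stand-in for one ~350-byte JMS message

    try (AFUNIXSocket sock = AFUNIXSocket.connectTo(addr);
         // Coalesces many small writes into fewer ~8 KiB writes to the socket.
         OutputStream out = new BufferedOutputStream(sock.getOutputStream(), 8192)) {
      for (int i = 0; i < 1_000_000; i++) {
        out.write(message);
      }
      out.flush(); // flush explicitly when latency matters
    }
  }
}

The trade-off is latency: buffered messages only hit the socket when the buffer fills or you flush, so this mainly helps throughput-oriented paths.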
Immediate suggestions:
- Try AFUNIXDatagramChannel for packet-based communication (if it works on your platform, it's guaranteed to not have any packet loss). This is going to help especially where low latency is most important (but then you'd have to measure response times as well); see the sketch after this list.
- Try multiple threads. This may help with throughput.
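For illustration, a minimal sketch of a connected AFUNIXDatagramChannel pair using direct buffers (the fastest combination in the selftest numbers above). The socket paths, datagram count and 8192-byte payload are arbitrary, and it assumes the 2.10.x channel API; to scale out, give each worker thread its own channel pair:

import java.io.File;
import java.nio.ByteBuffer;

import org.newsclub.net.unix.AFUNIXDatagramChannel;
import org.newsclub.net.unix.AFUNIXSocketAddress;

public class DatagramSketch {
  private static final int DATAGRAMS = 100_000;

  public static void main(String[] args) throws Exception {
    // Hypothetical socket paths; not taken from your code.
    File serverFile = new File("/tmp/jms-dgram-server.sock");
    File clientFile = new File("/tmp/jms-dgram-client.sock");
    serverFile.delete();
    clientFile.delete();
    AFUNIXSocketAddress serverAddr = AFUNIXSocketAddress.of(serverFile);
    AFUNIXSocketAddress clientAddr = AFUNIXSocketAddress.of(clientFile);

    // Receiver: bound, blocking, reads a fixed number of datagrams.
    AFUNIXDatagramChannel receiverChannel = AFUNIXDatagramChannel.open();
    receiverChannel.bind(serverAddr);
    Thread receiver = new Thread(() -> {
      try (AFUNIXDatagramChannel ch = receiverChannel) {
        ByteBuffer buf = ByteBuffer.allocateDirect(8192);
        for (int i = 0; i < DATAGRAMS; i++) {
          buf.clear();
          ch.receive(buf);
        }
      } catch (Exception e) {
        e.printStackTrace();
      }
    });
    receiver.start();

    // Sender: bound + connected channel, writing fixed-size direct buffers.
    try (AFUNIXDatagramChannel ch = AFUNIXDatagramChannel.open()) {
      ch.bind(clientAddr);
      ch.connect(serverAddr);
      ByteBuffer buf = ByteBuffer.allocateDirect(8192);
      for (int i = 0; i < DATAGRAMS; i++) {
        buf.clear();
        ch.write(buf);
      }
    }
    receiver.join();
  }
}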
Also, you mentioned that Inet sockets consume less CPU time (5% instead of 17% of available CPU), but you didn't mention the throughput. If you want to minimize CPU usage, you can of course try throttling the throughput. I'd also be interested to know whether network-card offload features or any non-standard net.ipv4-related sysctls are playing into this: try connecting to loopback (127.0.0.1) and to a local IP address separately, maybe also via IPv6, and try your code on an unmodified machine, ideally a freshly set-up one or a VM.
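For that comparison, something along these lines on the client side is enough to separate loopback from the NIC-bound address. The port is hypothetical, and this assumes a TCP-based variant of your server is listening there and simply draining data:

import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;

public class TcpAddressComparison {
  // Connects to the given address and pushes data for ~10 seconds;
  // run it once per address and compare CPU usage and throughput.
  static void push(InetAddress address, int port) throws Exception {
    byte[] payload = new byte[8192];
    long deadline = System.currentTimeMillis() + 10_000;
    long sent = 0;
    try (Socket sock = new Socket(address, port);
         OutputStream out = sock.getOutputStream()) {
      while (System.currentTimeMillis() < deadline) {
        out.write(payload);
        sent += payload.length;
      }
    }
    System.out.printf("%s: %.1f MB/s%n", address, sent / 10.0 / (1024 * 1024));
  }

  public static void main(String[] args) throws Exception {
    int port = 12345; // hypothetical port of a TCP-based JmsServer
    push(InetAddress.getLoopbackAddress(), port); // 127.0.0.1
    push(InetAddress.getByName("::1"), port);     // IPv6 loopback
    push(InetAddress.getLocalHost(), port);       // often the LAN-facing address, depending on /etc/hosts
  }
}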
Closing stale issue