cyclonedds-cxx
Performance comparison between dds and dds-cxx throughput (dds-cxx is much worse)
Environment: Ubuntu 20.04.1, dds version 0.9.1, dds-cxx version 0.9.1.
Tested on a single host. Enter this command in one terminal:
./bin/ThroughputPublisher
And in the other terminal:
./bin/ThroughputSubscriber
Using the throughput tool that comes with dds, the test results are as follows:
Cycles: 0 | PollingDelay: -1 | Partition: Throughput example
=== [Subscriber] Waiting for samples...
=== [Subscriber] 1.001 Payload size: 8192 | Total received: 201714 samples, 1654054800 bytes | Out of order: 0 samples Transfer rate: 201570.69 samples/s, 13222.60 Mbit/s
=== [Subscriber] 1.001 Payload size: 8192 | Total received: 383084 samples, 3141288800 bytes | Out of order: 0 samples Transfer rate: 181228.85 samples/s, 11888.07 Mbit/s
=== [Subscriber] 1.001 Payload size: 8192 | Total received: 555829 samples, 4557797800 bytes | Out of order: 0 samples Transfer rate: 172634.72 samples/s, 11324.03 Mbit/s
=== [Subscriber] 1.001 Payload size: 8192 | Total received: 728681 samples, 5975184200 bytes | Out of order: 0 samples Transfer rate: 172673.59 samples/s, 11326.56 Mbit/s
=== [Subscriber] 1.001 Payload size: 8192 | Total received: 891710 samples, 7312022000 bytes | Out of order: 0 samples Transfer rate: 162908.28 samples/s, 10686.08 Mbit/s
=== [Subscriber] 1.001 Payload size: 8192 | Total received: 1044720 samples, 8566704000 bytes | Out of order: 0 samples Transfer rate: 152891.31 samples/s, 10029.48 Mbit/s
Since cxx has no throughput example, I wrote one by hand, modeled on the dds example and using the API provided by cxx. Its results:
[Subscriber] Create reader.
[Subscriber] Wait for message.
[Subscriber] Interval[1.00 s] Samples[54829.58 counts] MsgSize[8192.00 bytes] Speed[3426.85 Mbits/s]
[Subscriber] Interval[1.00 s] Samples[54337.14 counts] MsgSize[8192.00 bytes] Speed[3396.07 Mbits/s]
[Subscriber] Interval[1.02 s] Samples[49566.60 counts] MsgSize[8192.00 bytes] Speed[3097.91 Mbits/s]
[Subscriber] Interval[1.01 s] Samples[59811.39 counts] MsgSize[8192.00 bytes] Speed[3738.21 Mbits/s]
[Subscriber] Interval[1.00 s] Samples[51487.54 counts] MsgSize[8192.00 bytes] Speed[3217.97 Mbits/s]
[Subscriber] Interval[1.01 s] Samples[61694.57 counts] MsgSize[8192.00 bytes] Speed[3855.91 Mbits/s]
[Subscriber] Interval[1.01 s] Samples[60861.33 counts] MsgSize[8192.00 bytes] Speed[3803.83 Mbits/s]
[Subscriber] Interval[1.01 s] Samples[64443.52 counts] MsgSize[8192.00 bytes] Speed[4027.72 Mbits/s]
[Subscriber] Interval[1.00 s] Samples[62247.15 counts] MsgSize[8192.00 bytes] Speed[3890.45 Mbits/s]
According to the subscriber output, the cxx throughput is roughly 3 to 4 times lower than that of dds (about 50,000 to 64,000 samples/s versus about 150,000 to 200,000 samples/s).
How can I solve this problem? I can provide the cxx throughput source code if needed.
I tried batching in C++, but it didn't solve the problem.
@YeahhhhLi @wjbbupt The "batching" has a very big effect for really small samples, because only then the overhead of packet processing becomes dominant. A very unscientific experiment using ddsperf on my laptop (x1000 samples/s):
size     batch    no-batch
0        362      2740
100      360      2565
1000     328      1295
10000    230      230
so for the 8k case, I think it stands to reason that it wouldn't solve it.
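To make the trend in that table explicit, here is a small sketch that computes the ratio between the two rate columns for each sample size. The struct and function names are hypothetical, and the columns are kept in the order they appear in the table; the numbers are copied from the table, nothing is measured here.

```cpp
#include <array>

// One row of the ddsperf table above; rates are in x1000 samples/s.
struct Row {
    int size_bytes;   // sample size
    double rate_a;    // first rate column of the table
    double rate_b;    // second rate column of the table
};

const std::array<Row, 4> kDdsperfRows = {{
    {0,     362.0, 2740.0},
    {100,   360.0, 2565.0},
    {1000,  328.0, 1295.0},
    {10000, 230.0,  230.0},
}};

// Ratio between the two columns: a value near 1.0 means batching
// makes no measurable difference at that sample size.
double column_ratio(const Row& r) {
    return r.rate_b / r.rate_a;
}
```

The ratio falls from well above 7 for the smallest samples to exactly 1 at 10000 bytes, which is why batching is not expected to help much for 8 kB samples.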
Could you perhaps make a FlameGraph so we can see where the time goes?
@eboasson I want to use batching from C++, but I see that the C++ binding does not expose it. How did you enable it here?
@YeahhhhLi, there is currently a PR outstanding for C++ equivalents of the CycloneDDS-C performance/example programs roundtrip (latency) and throughput: PR 325
Could you give those a try and tell us what kind of results you get then? (and review comments would also be very appreciated, of course)
@wjbbupt Sorry, ignore the deleted post please, as you already have an issue open on it here
I tried the code in the link you posted a long time ago. With batching off it performs worse than the C version, and adding batching does not change the performance much. In theory the performance gap between C and C++ should not be very large, but the measured gap is 2 to 4 times (in samples/s). That is why I asked why the gap between C and C++ is so large. Of course, you can try it yourself to see what I mean. So your suggestion didn't solve the problem, but thanks for your answer anyway.
Thanks for the reply, let's try it later.