Sub/Pub socket high latency under low load
I am testing Nanomsg NNG PUB/SUB socket performance with pynng .
In the test program below, SUB socket is used as a server to receive messages and PUB socket is used as a client to receive messages. The content of a message is the simply time when it is sent by the PUB client. The SUB server measures the latency by comparing between the time when the message is received time and the time (which is the message sent time) inside the message.
For 1,000 messages sent over ~1 second, I find two issues from the performance test:
-
The latency is very high between PUB and SUB sockets in couple of milliseconds. However, under such low load, I would expect the latency to be in scale of microseconds.
-
The latency performance improves if I make the PUB client sleep in a very short period of time. I understand that sleeping would reduce the load. But I would expect it makes not much difference given the low load. I am surprised that the latency can shoot up to 100ms if there is no sleep.
I am not sure if this issue relates to python GIL. Will my main PUB client thread compete against NNG internal I/O thread for GIL without a sleep?
Any ideas will be welcome and appreciated.
The testing programs are as below:
def run_nng_sub():
import numpy as np
import time, pickle, datetime, threading
from pynng import Sub0, Timeout
address = 'tcp://127.0.0.1:31313'
latencies = []
with Sub0(listen=address, recv_timeout=100) as sub:
sub.subscribe(b'')
time.sleep(0.05)
while True:
try:
b_datetime = sub.recv()
received_time = datetime.datetime.utcnow()
sent_time = pickle.loads(b_datetime)
latency = received_time - sent_time
latencies.append(latency)
print(f'latency: {latency}')
except Timeout:
if len(latencies) > 1000:
break
print(f'latency:\nmax {np.max(latencies)}\naverage {np.mean(latencies)}\nmedian {np.median(latencies)}')
def run_nng_pub():
import time, pickle, datetime
from pynng import Pub0
address = 'tcp://127.0.0.1:31313'
with Pub0(dial=address) as pub:
time.sleep(0.05)
for i in range(1000):
pub.send(pickle.dumps(datetime.datetime.utcnow()))
# time.sleep(1e-5)
And the performance results on my PC are below:
No sleep
latency:
max 0:00:00.165002
average 0:00:00.025910
median 0:00:00.007000
time.sleep(1e-3)
latency:
max 0:00:00.045002
average 0:00:00.002442
median 0:00:00
time.sleep(1e-4)
latency:
max 0:00:00.051001
average 0:00:00.006631
median 0:00:00.001998
time.sleep(1e-5)
latency:
max 0:00:00.046003
average 0:00:00.008652
median 0:00:00.004000
(I just edited your comment to get syntax highlighting, didn't change content at all)
Thanks for opening the issue! It's very nice to have the sample program, too.
I'm not sure exactly what's going on, but I think it's some unfortunate interaction between pub/sub semantics and the GIL. With no sleep, the sending thread never gives up the GIL, and that means the receiving thread never has a chance to wake up and receive it. The receiving socket will then have a delayed message by the time the sending thread yields the GIL, after dropping a bunch of messages. On my computer, without a sleep, about 93% of the messages are dropped.
Whenever there is no sleep at all, the latency on my computer is even worse than on yours: a mean of 0.5 seconds, and max of 1.17 seconds.