[aiohttp] - use lcg as random generator
We currently make a large number of `randint` calls, particularly within the update method: each call to `update` may trigger roughly 2 × number-of-queries random draws (per query, one for the row id and one for the new value).
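To illustrate where those calls come from, here is a hedged sketch of the update-handler hot path (not the exact aiohttp TFB code; `pick_updates` is a hypothetical helper name): each query draws two random ints, so `/updates/20` ends up making 40 `randint` calls.

```python
from random import randint

def pick_updates(num_queries: int) -> list:
    # Per query: one random row id and one new randomNumber value,
    # i.e. 2 * num_queries randint calls per request.
    return [(randint(1, 10000), randint(1, 10000)) for _ in range(num_queries)]
```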
The existing Python implementation uses the Mersenne Twister, which is a high-quality RNG with 53-bit precision and a very long period (2**19937−1). However, for our simplified use case, a classic LCG (Linear Congruential Generator) is more than sufficient and significantly faster, especially given the number of calls we make.
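For context, a pure-Python LCG-based generator of unique ids might look like the sketch below. This is an illustration only: the actual implementation lives in `app.random_utils`, and the constants here (Knuth's MMIX LCG parameters) are an assumption, not necessarily the ones used in the PR.

```python
# Illustrative pure-Python LCG producing unique ids in [1, 10000].
# Constants are Knuth's MMIX LCG parameters; not the exact PR code.
_MASK = (1 << 64) - 1
_state = 0x9E3779B97F4A7C15  # arbitrary nonzero seed

def _lcg_step() -> int:
    global _state
    # x_{n+1} = (a * x_n + c) mod 2**64
    _state = (6364136223846793005 * _state + 1442695040888963407) & _MASK
    return _state >> 33  # keep the high bits, which are statistically better

def random_unique_ids(num_queries: int) -> list:
    seen = set()
    while len(seen) < num_queries:
        seen.add(_lcg_step() % 10000 + 1)
    return list(seen)
```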
Benchmark code

```python
from random import sample
import timeit

# Define number of queries
num_queries = 20

# Benchmark the approaches
def method1():
    return sample(range(1, 10001), num_queries)

# random_unique_ids: pure-Python LCG implementation (defined elsewhere in the app)
def method2():
    return random_unique_ids(num_queries)

from app.random_utils import random_unique_ids as random_unique_ids_cython

def method3():
    return random_unique_ids_cython(num_queries)

# Run benchmark
iterations = 100000
time1 = timeit.timeit(method1, number=iterations)
time2 = timeit.timeit(method2, number=iterations)
time3 = timeit.timeit(method3, number=iterations)

print(f"sample: {time1:.6f} seconds")
print(f"lcg pure python: {time2:.6f} seconds")
print(f"lcg cython: {time3:.6f} seconds")
print(f"Difference: {abs(time1 - time3):.6f} seconds")
print(f"lcg cython is faster by {(max(time1, time3) / min(time1, time3) - 1) * 100:.2f}%")
```
Results

```
sample: 0.749489 seconds
lcg pure python: 0.379528 seconds
lcg cython: 0.039777 seconds
Difference: 0.709712 seconds
lcg cython is faster by 1784.24%
```
Updates

Before

Queries: 10 for update

```
wrk -H 'Host: tfb-server' -H 'Accept: application/json,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 30 -c 32 --timeout 8 -t 6 "http://tfb-server:8080/updates/10"
---------------------------------------------------------
Running 30s test @ http://tfb-server:8080/updates/10
  6 threads and 32 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.63ms    1.88ms  76.45ms   96.61%
    Req/Sec     3.41k   844.18     9.66k    72.68%
  Latency Distribution
     50%    1.40ms
     75%    1.90ms
     90%    2.48ms
     99%    6.93ms
  611203 requests in 30.10s, 270.92MB read
Requests/sec:  20305.71
Transfer/sec:      9.00MB
```
After

```
Running 30s test @ http://tfb-server:8080/updates/10
  6 threads and 32 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.38ms  848.87us  31.48ms   84.04%
    Req/Sec     3.74k   664.81    11.00k    69.42%
  Latency Distribution
     50%    1.17ms
     75%    1.78ms
     90%    2.33ms
     99%    3.77ms
  669708 requests in 30.10s, 296.85MB read
Requests/sec:  22249.46
Transfer/sec:      9.86MB
```
@Dreamsorcerer please take a look 🙏
Found an old but interesting article whose main conclusion is that the built-in `random.randint()` function should be avoided in performance-critical code.
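As a quick sanity check of that conclusion (my own micro-benchmark, not from the article): `randint()` routes through `Random.randrange()` with argument validation on every call, while scaling the raw float from `random()` is much cheaper.

```python
import timeit
from random import randint, random

# randint() goes through Random.randrange() with per-call argument
# checks; int(random() * n) + 1 skips that overhead entirely.
t_randint = timeit.timeit(lambda: randint(1, 10000), number=100_000)
t_float = timeit.timeit(lambda: int(random() * 10000) + 1, number=100_000)
print(f"randint: {t_randint:.4f}s, int(random()*n)+1: {t_float:.4f}s")
```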
Also tested the third-party libraries fastrand and numpy:

```
sample: 0.538499 seconds
fastrand pure python: 0.167712 seconds
LCG cython: 0.031578 seconds
numpy: 0.070478 seconds
```
NumPy does a good job using array shuffle, though it's still about twice as slow as the naive LCG approach. NumPy has less maintenance overhead and "cheats" less, since it's equally available to the other frameworks :)
@Dreamsorcerer please help me decide: should I just close this PR because it's out of the framework's scope, or can we go further with one of these approaches (fastrand, numpy, or a custom LCG)?
```python
from random import sample
from fastrand import pcg32bounded
import numpy as np
from app.random_utils import random_unique_ids as random_unique_ids_cython

num_queries = 20

def method1():
    return sample(range(1, 10001), num_queries)

def method2():
    # pcg32bounded(n) returns values in [0, n), so shift by 1 for ids in [1, 10000]
    generated_ids = {pcg32bounded(10000) + 1 for _ in range(num_queries)}
    while len(generated_ids) < num_queries:
        generated_ids.add(pcg32bounded(10000) + 1)
    return list(generated_ids)

def method3():
    return random_unique_ids_cython(num_queries)

p = np.random.permutation(np.arange(1, 10001, dtype=np.uint16))
i = 0

def method4():
    # Serve slices from a pre-shuffled pool, reshuffling when exhausted
    global i, p
    if i + num_queries > len(p):
        i = 0
        np.random.shuffle(p)
    result = p[i:i + num_queries].tolist()
    i += num_queries
    return result
```
Probably for @msmith-techempower to decide. I'd probably expect something like this to be part of the Python benchmark tooling and used by all implementations, thus bringing up Python globally.
I am going to close this, since the change is controversial and not related to the aiohttp framework itself.