[aiohttp] - use lcg as random generator
We currently make a large number of `randint` calls, particularly within the update method: each call to `update` may trigger roughly 2 × number-of-queries random draws (per query, one for the row id and one for the new value).
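To illustrate where those calls come from, here is a hedged sketch of the update-handler hot path (not the exact aiohttp TFB code; `pick_updates` is a hypothetical helper name): each query draws two random ints, so `/updates/20` ends up making 40 `randint` calls.

```python
from random import randint

def pick_updates(num_queries: int) -> list:
    # Per query: one random row id and one new randomNumber value,
    # i.e. 2 * num_queries randint calls per request.
    return [(randint(1, 10000), randint(1, 10000)) for _ in range(num_queries)]
```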
The existing Python implementation uses the Mersenne Twister, which is a high-quality RNG with 53-bit precision and a very long period (2**19937−1). However, for our simplified use case, a classic LCG (Linear Congruential Generator) is more than sufficient and significantly faster, especially given the number of calls we make.
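For context, a pure-Python LCG-based generator of unique ids might look like the sketch below. This is an illustration only: the actual implementation lives in `app.random_utils`, and the constants here (Knuth's MMIX LCG parameters) are an assumption, not necessarily the ones used in the PR.

```python
# Illustrative pure-Python LCG producing unique ids in [1, 10000].
# Constants are Knuth's MMIX LCG parameters; not the exact PR code.
_MASK = (1 << 64) - 1
_state = 0x9E3779B97F4A7C15  # arbitrary nonzero seed

def _lcg_step() -> int:
    global _state
    # x_{n+1} = (a * x_n + c) mod 2**64
    _state = (6364136223846793005 * _state + 1442695040888963407) & _MASK
    return _state >> 33  # keep the high bits, which are statistically better

def random_unique_ids(num_queries: int) -> list:
    seen = set()
    while len(seen) < num_queries:
        seen.add(_lcg_step() % 10000 + 1)
    return list(seen)
```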
Benchmark code

```python
from random import sample
import timeit

# Define number of queries
num_queries = 20

# Benchmark the approaches
def method1():
    return sample(range(1, 10001), num_queries)

# random_unique_ids: pure-Python LCG implementation (defined elsewhere in the app)
def method2():
    return random_unique_ids(num_queries)

from app.random_utils import random_unique_ids as random_unique_ids_cython

def method3():
    return random_unique_ids_cython(num_queries)

# Run benchmark
iterations = 100000
time1 = timeit.timeit(method1, number=iterations)
time2 = timeit.timeit(method2, number=iterations)
time3 = timeit.timeit(method3, number=iterations)

print(f"sample: {time1:.6f} seconds")
print(f"lcg pure python: {time2:.6f} seconds")
print(f"lcg cython: {time3:.6f} seconds")
print(f"Difference: {abs(time1 - time3):.6f} seconds")
print(f"lcg cython is faster by {(max(time1, time3) / min(time1, time3) - 1) * 100:.2f}%")
```
Results

```
sample: 0.749489 seconds
lcg pure python: 0.379528 seconds
lcg cython: 0.039777 seconds
Difference: 0.709712 seconds
lcg cython is faster by 1784.24%
```
Updates

Before

Queries: 10 for update

```
wrk -H 'Host: tfb-server' -H 'Accept: application/json,text/html;q=0.9,application/xhtml+xml;q=0.9,application/xml;q=0.8,*/*;q=0.7' -H 'Connection: keep-alive' --latency -d 30 -c 32 --timeout 8 -t 6 "http://tfb-server:8080/updates/10"
---------------------------------------------------------
Running 30s test @ http://tfb-server:8080/updates/10
  6 threads and 32 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.63ms    1.88ms  76.45ms   96.61%
    Req/Sec     3.41k   844.18     9.66k    72.68%
  Latency Distribution
     50%    1.40ms
     75%    1.90ms
     90%    2.48ms
     99%    6.93ms
  611203 requests in 30.10s, 270.92MB read
Requests/sec:  20305.71
Transfer/sec:      9.00MB
```
After

```
Running 30s test @ http://tfb-server:8080/updates/10
  6 threads and 32 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.38ms  848.87us  31.48ms   84.04%
    Req/Sec     3.74k   664.81    11.00k    69.42%
  Latency Distribution
     50%    1.17ms
     75%    1.78ms
     90%    2.33ms
     99%    3.77ms
  669708 requests in 30.10s, 296.85MB read
Requests/sec:  22249.46
Transfer/sec:      9.86MB
```
@Dreamsorcerer please take a look 🙏
Found an old but interesting article whose main conclusion is that the built-in `random.randint()` function should be avoided in performance-critical code.
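As a quick sanity check of that conclusion (my own micro-benchmark, not from the article): `randint()` routes through `Random.randrange()` with argument validation on every call, while scaling the raw float from `random()` is much cheaper.

```python
import timeit
from random import randint, random

# randint() goes through Random.randrange() with per-call argument
# checks; int(random() * n) + 1 skips that overhead entirely.
t_randint = timeit.timeit(lambda: randint(1, 10000), number=100_000)
t_float = timeit.timeit(lambda: int(random() * 10000) + 1, number=100_000)
print(f"randint: {t_randint:.4f}s, int(random()*n)+1: {t_float:.4f}s")
```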
Also tested the third-party libraries fastrand and numpy:

```
sample: 0.538499 seconds
fastrand pure python: 0.167712 seconds
LCG cython: 0.031578 seconds
numpy: 0.070478 seconds
```
NumPy does a good job using array shuffle, though it's still about twice as slow as the naive LCG approach. NumPy has less maintenance overhead and "cheats" less, since it's equally available to the other frameworks :)
@Dreamsorcerer please help me decide: should I just close this PR because it's out of the framework's scope, or can we go further with one of these approaches (fastrand, numpy, or a custom LCG)?
```python
from random import sample
from fastrand import pcg32bounded
import numpy as np
from app.random_utils import random_unique_ids as random_unique_ids_cython

num_queries = 20

def method1():
    return sample(range(1, 10001), num_queries)

def method2():
    # pcg32bounded(n) returns values in [0, n), so shift by 1 for ids in [1, 10000]
    generated_ids = {pcg32bounded(10000) + 1 for _ in range(num_queries)}
    while len(generated_ids) < num_queries:
        generated_ids.add(pcg32bounded(10000) + 1)
    return list(generated_ids)

def method3():
    return random_unique_ids_cython(num_queries)

p = np.random.permutation(np.arange(1, 10001, dtype=np.uint16))
i = 0

def method4():
    # Serve slices from a pre-shuffled pool, reshuffling when exhausted
    global i, p
    if i + num_queries > len(p):
        i = 0
        np.random.shuffle(p)
    result = p[i:i + num_queries].tolist()
    i += num_queries
    return result
```
Probably for @msmith-techempower to decide. I'd probably expect something like this to be part of the Python benchmark tooling and used by all implementations, thus bringing up Python globally.
I am going to close this, since the change is controversial and not related to the aiohttp framework itself.