llm icon indicating copy to clipboard operation
llm copied to clipboard

Monotonic ULIDs to fix the tests

Open simonw opened this issue 6 months ago • 4 comments

I keep running into tests that fail in weird ways because ULIDs in the same ms don't sort reliably.

It turns out the ULID spec covers this: https://github.com/ulid/spec/blob/master/README.md#monotonicity

But the Python library I am using does not (yet) have it:

  • https://github.com/mdomke/python-ulid/issues/1

simonw avatar May 27 '25 02:05 simonw

This other Python library does support this, but doesn't have as much development activity or usage (stars etc). https://github.com/itsrainingmani/py-ulid/blob/main/README.md#monotonic-ulids

simonw avatar May 27 '25 02:05 simonw

Trying this out...

llm \
  -f https://raw.githubusercontent.com/mdomke/python-ulid/refs/heads/main/ulid/__init__.py \
  -f https://raw.githubusercontent.com/ulid/spec/refs/heads/master/README.md \
  -s 'Implement a subclass of ULID that adds monotonic generation by persisting state as to the last one it created on the class itself' \
  -m o3

https://gist.github.com/simonw/5a20bf58a977d6c96747cbe13a38aa0f

That code might just work!

simonw avatar May 27 '25 02:05 simonw

OK, I knocked it into shape and I think this is good:

NANOSECS_IN_MILLISECS = 1000000
TIMESTAMP_LEN = 6
RANDOMNESS_LEN = 10

_lock: Final = threading.Lock()
_last: bytes | None = None  # 16-byte last produced ULID


def monotonic_ulid() -> ULID:
    """
    Return a ULID instance that is guaranteed to be *strictly larger* than every
    other ULID returned by this function inside the same process.

    It works the same way the reference JavaScript `monotonicFactory` does:
    * If the current call happens in the same millisecond as the previous
        one, the 80-bit randomness part is incremented by exactly one.
    * As soon as the system clock moves forward, a brand-new ULID with
        cryptographically secure randomness is generated.
    * If more than 2**80 ULIDs are requested within a single millisecond
        an `OverflowError` is raised (practically impossible).
    """
    global _last

    now_ms = time.time_ns() // NANOSECS_IN_MILLISECS

    with _lock:
        # First call
        if _last is None:
            _last = _fresh(now_ms)
            return ULID(_last)

        # Decode timestamp from the last ULID we handed out
        last_ms = int.from_bytes(_last[:TIMESTAMP_LEN], "big")

        # If the millisecond is the same, increment the randomness
        if now_ms == last_ms:
            rand_int = int.from_bytes(_last[TIMESTAMP_LEN:], "big") + 1
            if rand_int >= 1 << (RANDOMNESS_LEN * 8):
                raise OverflowError(
                    "Randomness overflow: > 2**80 ULIDs requested "
                    "in one millisecond!"
                )
            randomness = rand_int.to_bytes(RANDOMNESS_LEN, "big")
            _last = _last[:TIMESTAMP_LEN] + randomness
            return ULID(_last)

        # New millisecond, start fresh
        _last = _fresh(now_ms)
        return ULID(_last)


def _fresh(ms: int) -> bytes:
    """Build a brand-new 16-byte ULID for the given millisecond."""
    timestamp = int.to_bytes(ms, TIMESTAMP_LEN, "big")
    randomness = os.urandom(RANDOMNESS_LEN)
    return timestamp + randomness

simonw avatar May 27 '25 02:05 simonw

This test fails with ulid.ULID() but passes with monotonic_ulid():

def test_monotonic_ulids():
    ulids = [monotonic_ulid() for i in range(1000)]
    assert ulids == sorted(ulids)

simonw avatar May 27 '25 02:05 simonw