Question: regexp

Open vanodevium opened this issue 2 years ago • 3 comments

Please provide the correct regex for ULID. Thanks!

vanodevium, Nov 30 '23 16:11

Try this case-insensitive regex (in each pair below, the two lines are equivalent; the second just uses character ranges):

^[0-7][0123456789ABCDEFGHJKMNPQRSTVWXYZabcdefghjkmnpqrstvwxyz]{25}$
^[0-7][0-9A-HJKMNP-TV-Za-hjkmnp-tv-z]{25}$

or this all-uppercase regex:

^[0-7][0123456789ABCDEFGHJKMNPQRSTVWXYZ]{25}$
^[0-7][0-9A-HJKMNP-TV-Z]{25}$

or this all-lowercase regex:

^[0-7][0123456789abcdefghjkmnpqrstvwxyz]{25}$
^[0-7][0-9a-hjkmnp-tv-z]{25}$

Note: the first character can only be one of these digits: 0, 1, 2, 3, 4, 5, 6, 7. That's why the regex starts with [0-7]. The remaining 25 characters are from the Crockford base-32 alphabet.
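As a rough illustration of the note above, here is a minimal Python sketch. The names `ULID_STRICT`, `is_strict_ulid`, and `decode_ulid` are made up for this example. It uses the compact case-insensitive pattern and shows why the first character stops at 7: 26 characters at 5 bits each give 130 bits, but a ULID is 128 bits, so the leading character can carry only 3 bits.

```python
import re

# Crockford Base32 alphabet (no I, L, O, U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Compact case-insensitive pattern from this comment.
ULID_STRICT = re.compile(r"^[0-7][0-9A-HJKMNP-TV-Za-hjkmnp-tv-z]{25}$")


def is_strict_ulid(text: str) -> bool:
    """Return True if text matches the strict pattern (hypothetical helper)."""
    # fullmatch is used because in Python `$` also matches just before a
    # trailing newline, which would let "…\n" slip through with match().
    return ULID_STRICT.fullmatch(text) is not None


def decode_ulid(text: str) -> int:
    """Decode a strictly valid ULID string to its integer value."""
    if not is_strict_ulid(text):
        raise ValueError("not a strict ULID string")
    value = 0
    for ch in text.upper():
        value = value * 32 + ALPHABET.index(ch)
    return value


largest = "7" + "Z" * 25   # highest value the pattern accepts
too_big = "8" + "Z" * 25   # first character above 7

print(is_strict_ulid("01ARZ3NDEKTSV4RRFFQ69G5FAV"))  # True
print(is_strict_ulid(too_big))                       # False
print(decode_ulid(largest) == 2**128 - 1)            # True
```

The last line checks that the largest accepted string decodes to exactly the 128-bit maximum, which is the reason for `[0-7]` at the start.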

fabiolimace, Nov 30 '23 20:11

By the strict definition of the spec, I believe this is incorrect. 001oo is a valid Crockford Base32 value. The letter 'o' and the digit '0' decode to the same value, and by Crockford's spec both are valid as input. As output, 001oo should be represented as 00100, but as input, 001oo and 00100 are both valid and denote the same value.

This regex is still useful for testing output, but it will fail on valid-by-spec Crockford Base32 encoded ULIDs.

Also, as part of the spec, Crockford Base32 allows dashes in an input and ignores them, skipping ahead to the next valid character. They exist as visual formatting, to make values easier to read and compare. I cannot say whether this is relevant to ULIDs (I didn't check, and I don't use dashes).
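To make the input-versus-output distinction concrete, here is a small Python sketch of Crockford-style input normalization. The function name `normalize_crockford` is hypothetical, and whether a ULID parser should accept dashes at all is left open, as said above. The mapping follows Crockford's Base32 description: O decodes as 0, I and L decode as 1, and dashes are ignored.

```python
# Map the look-alike letters to the digits they decode to, and drop dashes.
_CROCKFORD_INPUT = str.maketrans({"O": "0", "I": "1", "L": "1", "-": None})


def normalize_crockford(text: str) -> str:
    """Return the canonical uppercase form of a Crockford Base32 input.

    This only normalizes; it does not check length or reject invalid
    characters such as U.
    """
    return text.upper().translate(_CROCKFORD_INPUT)


print(normalize_crockford("001oo"))  # 00100
print(normalize_crockford("00100"))  # 00100
```

A normalized string could then be checked with one of the strict output regexes earlier in the thread.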

EvanEdwards, May 01 '25 22:05

@EvanEdwards,

This regex accepts dashes between characters and permits the letters O I L o i l:

^[0-7OILoil](-?[0123456789ABCDEFGHJKMNPQRSTVWXYZabcdefghjkmnpqrstvwxyzOILoil]){25}$
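One way to sanity-check this pattern is to run it against a few inputs. The Python sketch below is only illustrative: the helper name `accepts` is made up, and the sample strings are variations on a single example ULID. It also shows a limit of the pattern: at most one dash is allowed between two characters, and none at the start or end.

```python
import re

# The permissive pattern from this comment, split over two lines for width.
PERMISSIVE = re.compile(
    r"^[0-7OILoil]"
    r"(-?[0123456789ABCDEFGHJKMNPQRSTVWXYZabcdefghjkmnpqrstvwxyzOILoil]){25}$"
)


def accepts(text: str) -> bool:
    """Return True if the whole string matches the permissive pattern."""
    return PERMISSIVE.fullmatch(text) is not None


print(accepts("01ARZ3NDEKTSV4RRFFQ69G5FAV"))    # True: plain ULID
print(accepts("01ARZ3ND-EKTSV4RR-FFQ69G5FAV"))  # True: dashes between characters
print(accepts("O1ARZ3NDEKTSV4RRFFQ69G5FAV"))    # True: letter O in place of 0
print(accepts("-01ARZ3NDEKTSV4RRFFQ69G5FAV"))   # False: leading dash
print(accepts("01ARZ3ND--EKTSV4RRFFQ69G5FAV"))  # False: two dashes in a row
print(accepts("01ARZ3NDEKTSV4RRFFQ69G5FAU"))    # False: U is not in the alphabet
```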

fabiolimace, May 02 '25 03:05