
Proposal: Suggest uppercase

Open mv-i22 opened this issue 6 years ago • 7 comments

I just saw that some implementations of ULID provide uppercase-only ULIDs, while others (like the PHP implementation by @robinvdvleuten) provide lowercase ULIDs. The specification does not yet impose uppercase or lowercase but states that ULID is "case-insensitive". This is a great feature.

Nevertheless, I'd like to propose suggesting Uppercase ULID as "the right way". Mainly for two reasons:

  • Most of the libraries provide uppercase ULIDs. The existing libraries will probably become the "standard solution" for their language (given that they are well designed, tested and documented).
  • Having different solutions in different languages could lead to problems in multi-language environments. Consider a database that is accessed by several projects written in different languages. Consider also regex validation, which is often written to match the output of one existing library and may break when other libraries or languages produce a different case.
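
To illustrate the regex concern, here is a minimal sketch in Python. The pattern names and the sample value are assumptions made for this example and are not taken from any particular library. It contrasts a validator written only for uppercase output with one that honours the spec's case-insensitivity.

```python
import re

# Crockford's base32 alphabet excludes I, L, O and U. A ULID is 26
# characters long, and its first character cannot exceed 7 because the
# value is only 128 bits wide.
# Both names below are hypothetical, chosen for this illustration.
STRICT_UPPER = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")
CASE_INSENSITIVE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}", re.IGNORECASE)

upper = "01ARZ3NDEKTSV4RRFFQ69G5FAV"
lower = upper.lower()

# A validator written against an uppercase-only library rejects the
# lowercase form of the very same ULID.
print(bool(STRICT_UPPER.fullmatch(upper)))      # True
print(bool(STRICT_UPPER.fullmatch(lower)))      # False
print(bool(CASE_INSENSITIVE.fullmatch(lower)))  # True
```

The strict pattern is the kind of validator that would break in a mixed-language setup.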

I understand that this is debatable, as being flexible in your setup is a strength. But I also think that having either uppercase or lowercase as the proposed (or imposed) way to implement ULIDs will help the spec to spread, because there is less potential for conflicts.

What do you think of this?

mv-i22 avatar Feb 19 '18 10:02 mv-i22

Bullet point 5 of ULID spec reads, in part: "Uses Crockford's base32". Crockford Spec reads, in part: "When decoding, upper and lower case letters are accepted, and i and l will be treated as 1 and o will be treated as 0. When encoding, only upper case letters are used."

openthc avatar Jul 18 '18 21:07 openthc

I think the spec should explicitly note that.

nelsonjchen avatar Dec 27 '18 18:12 nelsonjchen

When decoding [...] i and l will be treated as 1 and o will be treated as 0

I'm also pretty sure very few implementations will respect this. And although the spec explicitly mentions Crockford's Base32 (which allows for hyphens (-) anywhere in the string), AFAIK most implementations don't allow for these. So either we're not (100%) using Crockford's Base32 but some 'derivative' OR these things should be called out more explicitly in the spec.

I have implemented both (allowing i, l, I, L, o and O and allowing hyphens) in my .Net implementation.
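
For readers who want to see what that leniency amounts to, here is a rough sketch in Python (not the .Net code mentioned above). The helper name `canonicalize` is hypothetical. It only normalises the text form as Crockford describes: uppercase, i/l read as 1, o read as 0, hyphens dropped. It does not validate length or the remaining alphabet.

```python
# Hypothetical helper, not part of the ULID spec or of any library.
# Maps Crockford's decode aliases after uppercasing and removes hyphens.
_DECODE_ALIASES = str.maketrans({"I": "1", "L": "1", "O": "0", "-": None})


def canonicalize(text: str) -> str:
    """Return the uppercase, alias-free, hyphen-free form of a ULID string."""
    return text.upper().translate(_DECODE_ALIASES)


print(canonicalize("o1arz3nd-ektsv4rr-ffq69g5fav"))
# 01ARZ3NDEKTSV4RRFFQ69G5FAV
```

Uppercasing first means the translation table only needs the uppercase aliases.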

RobThree avatar May 27 '19 16:05 RobThree

Case-insensitivity applies to the codec only, not to every other use (such as a DB primary key or a Redis key). Please specify the case (or at least state which case is preferred); otherwise the DB may not find the "same" ULID.

BTW, I prefer lowercase: in web environments most things are case-insensitive, so lowercase makes more sense, like PostgreSQL's gen_random_uuid, which outputs lowercase.

wenerme avatar Oct 15 '21 16:10 wenerme

I don't see why the spec would have to define / enforce something as simple as upper/lowercase. If you have a specific use case where you require either one, then call a .ToUpper() or strtolower() or whatever your language provides on it before inserting it or searching for a ULID. As you say, most use cases will be case-insensitive; for the cases where case matters, enforce it.
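
As a small sketch of what "enforce it" could look like, assuming Python with an in-memory SQLite table (the table, column and function names are invented for this example), the same normalisation is applied on both the insert path and the lookup path:

```python
import sqlite3


def normalize(ulid: str) -> str:
    # Assumption for this sketch: the application picks uppercase.
    return ulid.upper()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT)")
conn.execute(
    "INSERT INTO items VALUES (?, ?)",
    (normalize("01arz3ndektsv4rrffq69g5fav"), "example"),
)


def find(ulid: str):
    """Look up a row by ULID, normalising the key first."""
    row = conn.execute(
        "SELECT name FROM items WHERE id = ?", (normalize(ulid),)
    ).fetchone()
    return row[0] if row else None


def find_raw(ulid: str):
    """Same lookup without normalisation, to show the mismatch."""
    row = conn.execute(
        "SELECT name FROM items WHERE id = ?", (ulid,)
    ).fetchone()
    return row[0] if row else None
```

SQLite compares TEXT values case-sensitively by default, so `find_raw` with a lowercase key misses the row while `find` locates it.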

RobThree avatar Oct 15 '21 17:10 RobThree

From https://datatracker.ietf.org/doc/html/rfc4122#section-3:

The hexadecimal values "a" through "f" are output as lower case characters and are case insensitive on input.

UUID specified the output case here, the underlying codec is not ulid, the output is.

Without consistency on case, we cannot just call gen_now_uuid(); we always have to write to_lower(gen_now_uuid()) or to_upper(gen_now_uuid()).

wenerme avatar Oct 15 '21 17:10 wenerme

UUID specified the output case here

What they do is up to them, isn't it?

the underlying codec is not ulid, the output is.

I'm not sure I understand what you mean here. You mean the underlying encoding, I guess? GUIDs are case-sensitive in most languages AFAIK too. To me, I don't see why we would enforce either lower or upper case; it's trivial in most cases where it matters to make the ULID upper- or lowercase. I can see that agreeing on a canonical notation would be beneficial, but the benefits are minor, next to none IMHO. So my reasoning then is to leave it up to whoever uses it and their use case. There's no real technical reason to enforce either notation IMHO.

Without consistency on case, we cannot just call gen_now_uuid(); we always have to write to_lower(gen_now_uuid()) or to_upper(gen_now_uuid()).

If it really matters then why not create a wrapper/proxy/adapter/derived class that handles the upper- or lowercasing for your specific use case? Shouldn't be more than a few lines of code in most languages.
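
Something along these lines, as a rough Python sketch. `with_case` is a made-up name, and `generate` stands in for whatever ULID generator your library exposes; no real library API is assumed here.

```python
from typing import Callable


def with_case(generate: Callable[[], str], lower: bool = True) -> Callable[[], str]:
    """Wrap a ULID generator so its output always has one fixed case."""

    def wrapped() -> str:
        value = generate()
        return value.lower() if lower else value.upper()

    return wrapped


# Stand-in generator for the demo; a real one would come from a ULID library.
def fake_generator() -> str:
    return "01ARZ3NDEKTSV4RRFFQ69G5FAV"


new_ulid = with_case(fake_generator)  # lowercase everywhere in this app
print(new_ulid())  # 01arz3ndektsv4rrffq69g5fav
```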

I know, you could argue that it costs extra CPU cycles to uppercase an entire lowercase string or vice versa and is wasteful if you can just output the correct case directly. So then let's argue we choose lowercase as 'canonical form' and then still, from the cases where casing does matter, 50% will have to run it through uppercasing methods; and if we choose uppercase then the other 50% will have to do the same...

RobThree avatar Oct 15 '21 17:10 RobThree