
Biased output from generateRandomString()

Open neilmadden-teya opened this issue 5 months ago • 4 comments

The generateRandomString() implementation produces biased output. The characters A-H are more likely to be produced than any others (each appears with probability 5/256 rather than 4/256, i.e. about 25% more often). This is due to the use of % characters.length to reduce each byte to an index into the alphabet, which introduces modulo bias because 256 is not a multiple of the alphabet length. You should either use rejection sampling or extra random bits (at least 64 extra random bits) to reduce the bias.
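
For illustration, this is roughly the pattern being described; the function body below is a hedged reconstruction, not the library's verbatim code:

```ts
// Hedged reconstruction of the biased pattern, not the library's exact code.
// With a 62-character alphabet, 256 % 62 === 8, so byte values 248..255 wrap
// around to indices 0..7 (A..H), giving those eight characters probability
// 5/256 each instead of 4/256.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

function biasedRandomString(length: number): string {
  const bytes = new Uint8Array(length);
  crypto.getRandomValues(bytes);
  let result = "";
  for (const b of bytes) {
    result += ALPHABET[b % ALPHABET.length]; // modulo reduction: the source of the bias
  }
  return result;
}
```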

This bug reduces the effective entropy of all of the tokens generated by this library, though likely not to a level that poses a real security threat (hence disclosing it here rather than via the bug bounty).

neilmadden-teya • Jun 06 '25 10:06

It is probably easier to just base64url-encode the random bytes directly, which is what a lot of other OAuth libraries do.
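
A minimal sketch of that approach, assuming a Workers-style environment where btoa is available (the function name is illustrative):

```ts
// Illustrative sketch: encode raw random bytes with base64url instead of
// mapping bytes onto an alphabet, so no modulo reduction (and no bias) occurs.
function randomTokenBase64Url(byteLength: number = 32): string {
  const bytes = new Uint8Array(byteLength);
  crypto.getRandomValues(bytes);
  // btoa takes a binary string; convert, then swap to the URL-safe alphabet.
  const b64 = btoa(String.fromCharCode(...bytes));
  return b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}
```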

neilmadden-teya • Jun 06 '25 10:06

> It is probably easier to just base64url-encode

or add -_ to the alphabet to make its length a power of 2.
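
A sketch of that variant (names are illustrative): with 64 characters, the reduction is unbiased because 256 is a multiple of 64.

```ts
// Illustrative sketch: a 64-character alphabet makes the reduction unbiased,
// since 256 % 64 === 0 (each character corresponds to exactly 4 byte values).
const ALPHABET64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

function unbiasedRandomString(length: number): string {
  const bytes = new Uint8Array(length);
  crypto.getRandomValues(bytes);
  let result = "";
  for (const b of bytes) {
    result += ALPHABET64[b & 63]; // b & 63 === b % 64, uniform over the alphabet
  }
  return result;
}
```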

AlexanderYastrebov • Jun 07 '25 09:06

For reference, the line in question:

https://github.com/cloudflare/workers-oauth-provider/blob/a6e3e06c2642e0fd4c185374753201cffc21ce8a/src/oauth-provider.ts#L1953

Arguably, this is a consequence of the Crypto interface being too minimal, which makes it too easy for developers (or AIs, in this case) to shoot themselves in the foot.

Same bug in other libraries:

  • https://github.com/remotestorage/remotestorage.js/blob/3135cdfe5a3599a2bffd35deb7a90ca1dfee2dff/src/util.ts#L309
  • https://github.com/toeverything/AFFiNE/blob/115496aa8ed42eedeaac87bcbdd9f04d9ea97b6c/packages/common/infra/src/utils/fractional-indexing.ts#L27
  • https://github.com/parse-community/parse-server/blob/4626d35aa6fadd6a66386ce19e7a96e5ad165aeb/src/cryptoUtils.js#L30
  • https://github.com/mattermost/mattermost/blob/c0f1cbf7271ff59795316c62006cfe6ad39c4b73/webapp/channels/src/components/admin_console/secure_connections/modals/modal_utils.tsx#L31
  • https://github.com/expo/expo/blob/be9033b60c2a603e4de0de16bcd60d2c321b8af3/packages/expo-web-browser/build/ExpoWebBrowser.web.js#L232
  • https://github.com/amplication/amplication/blob/95775f94b7a77d59052e0467d1d3d640f15bb5d2/packages/amplication-server/src/core/auth/auth-utils.ts#L35
  • https://github.com/github/codeql/blob/ef5e605cc479c03bcdd68cd470f63864b14918ad/ruby/ql/src/experimental/insecure-randomness/examples/InsecureRandomnessGood.rb#L8

There are probably many more cases. I used a very simple regex and stopped after a few pages.

99991 • Jun 08 '25 13:06

I did actually notice this issue when reviewing the code.

However, as noted, the amount of entropy is still plenty sufficient for security. The token is not used directly as any kind of crypto key, where bias could impact the cryptanalysis of the algorithm. It is only used as input to a hash function, which should adequately take all the entropy in the token and redistribute it.

Yes, we could get very, very slightly more entropy per character by fixing this. That said, simply adding two characters to the characters string would require thinking about which two characters would be safe in all contexts where these tokens are used. Alternatively, we could have discarded and regenerated any bytes that happened to be in the range [248, 256), but that's annoying to implement.
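
For concreteness, a minimal sketch of that rejection approach (illustrative names, not the library's code): any byte at or above 248, the largest multiple of 62 that fits in a byte, is discarded and regenerated.

```ts
// Illustrative sketch of rejection sampling: bytes in [248, 256) are thrown
// away, so the remaining values 0..247 map evenly onto the 62 characters.
const CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

function rejectionSampledString(length: number): string {
  let result = "";
  while (result.length < length) {
    const bytes = new Uint8Array(length - result.length);
    crypto.getRandomValues(bytes);
    for (const b of bytes) {
      if (b < 248) result += CHARS[b % CHARS.length]; // 248 === 4 * CHARS.length
    }
  }
  return result;
}
```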

So my thought process was: "Eh this is fine for now and we can always change it later if we want." So I moved on. In retrospect, I obviously wish I hadn't, given the attention this is now getting...

kentonv • Jun 08 '25 16:06