Unexpectedly treated base64 value as a typo
After https://github.com/crate-ci/typos/issues/411 our project was successfully scanned (thank you!), though typos now reports the following:
error: `BA` should be `BY`, `BE`
  --> ./content/appendices.md:1197:110
     |
1197 | "ed25519:1": "Wm+VzmOUOz08Ds+0NTWb1d4CZrVsJSikkeRxh6aCcUwu6pNC78FunoD7KNWzqFn241eYHYMGCA5McEiVPdhzBA"
     |                                                                                                   ^^
     |
This is a base64-encoded string contained in a wider example, so it shouldn't be flagged as a typo.
We have base64 detection, but to avoid mistaking identifiers for base64 we only apply it to strings of 90+ characters (trying to be considerate to Java users).
We could possibly make the check smarter by looking for any non-identifier characters in the value.
I've released v1.3.9 with some improvement here (detecting some base64 values shorter than 90 characters if they have a + or / in them).
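To make the heuristic discussed above concrete, here is a minimal Python sketch. It is not the actual typos implementation (typos is written in Rust, and its real rules may differ). The function name `looks_like_base64` and the regex are hypothetical; only the 90-character threshold and the `+` / `/` rule come from this thread.

```python
import re

# Assumed threshold, taken from the "90+ characters" figure in this thread.
MIN_LEN = 90

# Standard base64 alphabet with optional trailing padding (hypothetical pattern).
_B64_SHAPE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64(token: str, min_len: int = MIN_LEN) -> bool:
    """Guess whether a token is base64 rather than an identifier."""
    if not _B64_SHAPE.fullmatch(token):
        return False
    # Long tokens are assumed to be base64 outright.
    if len(token) >= min_len:
        return True
    # Shorter tokens only count if they contain a character that
    # cannot appear in an identifier.
    return "+" in token or "/" in token


signature = "Wm+VzmOUOz08Ds+0NTWb1d4CZrVsJSikkeRxh6aCcUwu6pNC78FunoD7KNWzqFn241eYHYMGCA5McEiVPdhzBA"
print(len(signature), looks_like_base64(signature))
```

With this sketch the 86-character signature from the report is accepted because it contains `+`, while a short purely alphanumeric identifier is not. A very short token such as `a+b` would also match, so a real check would likely need further guards.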
The other problem is this base64 value doesn't have padding. Relying on padding is my other way to avoid ignoring values we shouldn't. Granted, when I implemented it, I didn't know how common padding is or isn't.
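For illustration, a padding-based check could look like the following Python sketch. The helper names `is_canonically_padded` and `missing_padding` are hypothetical and are not taken from typos.

```python
def is_canonically_padded(token: str) -> bool:
    """True if the length is a multiple of 4 and '=' only appears as 0-2 trailing characters."""
    body = token.rstrip("=")
    pad = len(token) - len(body)
    return len(token) % 4 == 0 and pad <= 2 and "=" not in body


def missing_padding(token: str) -> int:
    """Number of '=' characters needed to reach a multiple of 4."""
    return -len(token) % 4


signature = "Wm+VzmOUOz08Ds+0NTWb1d4CZrVsJSikkeRxh6aCcUwu6pNC78FunoD7KNWzqFn241eYHYMGCA5McEiVPdhzBA"
print(is_canonically_padded(signature), missing_padding(signature))
```

The reported value is 86 characters long, so it would need two `=` characters to be padded, which is why a check that relies on padding skips it.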
As someone dealing with base64 values, any thoughts?
We're mostly working in our own spec of unpadded base64: https://spec.matrix.org/v1.1/appendices/#unpadded-base64
It's close enough to most padded and unpadded base64 implementations that existing parsers handle it just fine, though that obviously doesn't really help with the spellcheck side of things.
My world is almost entirely unpadded, but I'm not sure that's a representative sample. We also work in URL-safe unpadded base64, for added fun: https://spec.matrix.org/v1.1/rooms/v4/#event-ids
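As a rough sketch of why existing parsers cope with these values, the following Python snippet restores the padding before decoding, for both the standard and the URL-safe alphabet. The helper name `decode_unpadded` is hypothetical; `base64.b64decode` and `base64.urlsafe_b64decode` are from the Python standard library.

```python
import base64


def decode_unpadded(token: str, urlsafe: bool = False) -> bytes:
    """Decode base64 that was written without trailing '=' padding."""
    # Restore padding so the length is a multiple of 4 again.
    padded = token + "=" * (-len(token) % 4)
    if urlsafe:
        # URL-safe alphabet uses '-' and '_' instead of '+' and '/'.
        return base64.urlsafe_b64decode(padded)
    return base64.b64decode(padded, validate=True)


signature = "Wm+VzmOUOz08Ds+0NTWb1d4CZrVsJSikkeRxh6aCcUwu6pNC78FunoD7KNWzqFn241eYHYMGCA5McEiVPdhzBA"
print(len(decode_unpadded(signature)))
```

Decoding is straightforward once the padding is restored. The hard part for a spellchecker is the opposite direction: deciding from the text alone that an unpadded or URL-safe value is base64 at all.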
FYI #695 provides a new workaround for false positives