base62.js icon indicating copy to clipboard operation
base62.js copied to clipboard

ISSUE: If the input string starts with 0 then that 0 is lost during encoding.

Open tejzpr opened this issue 6 years ago • 12 comments

If the input string starts with 0 then that 0 is lost during encoding. for e.g. if the input is 0zebra then b62.encode(b62.decode("0zebra")) outputs zebra b62.encode(b62.decode("0000zebra")) //zebra

tejzpr avatar Jan 25 '18 20:01 tejzpr

I don't think that's an issue, since all type of encoder/decoder works by formatting binary(base2) then transforming that binary to user preferred base number in this case 62 so '0x00000A' is written '10' instead of '0000010'

Sure can add a feature for supporting this kind of cases by providing a flag with count of zeroes 3$13659 which interpreted as additional zero in front of 13659 or by adding one safe char that transform zeroes in front to that special char

ghost avatar Jan 29 '18 22:01 ghost

@gunawanwijaya That's absolutely crazy. Such an encoder/decoder should work with arbitrary binary data, regardless of what it starts with. The length of the data matters, and must be preserved. I appreciate @tejzpr for flagging this issue. If the bug is as-described, this library should be considered completely unreliable until it is resolved.

bradisbell avatar Feb 25 '18 03:02 bradisbell

FWIW, my understanding is also that Base62 encoding means converting between numbers and strings (indeed, the README says as much - though I had a hand in tweaking docs recently). That is, we're always talking about numbers (positive integers), merely offering a different representation.

To me this means that leading zeros are not significant and thus cannot be preserved: Both "0z" and "z" are equivalent to 35, 35 is encoded as "z".

While I understand that how this might be confusing if your expectation is for a string (rather than a number) to be encoded, I don't see what could be done about that without fundamentally altering both implementation and interpretation of what Base62 means.

Thus I'm afraid I wouldn't consider this a bug.

FND avatar Apr 13 '18 09:04 FND

I agree with FND - this is merely a consequence of the default charset mapping the character 0 to the number 0, purely a coincidence. So the leading zeroes in tejzpr's example are dropped during decoding, which is exactly what I want to happen.

dcorking avatar Jun 06 '18 17:06 dcorking

If you can't handle strings, you should consider throwing an error when a string is passed in.

bradisbell avatar Jun 07 '18 03:06 bradisbell

That's a good point, actually. I'd considered input validation in v2.0, but ultimately decided against it - in part because it would result in breaking changes, which we were adamant to avoid.

One could imagine a strict mode to opt into (e.g. encode(int, { validate: true }), but that seems a bit awkward?

FND avatar Jun 07 '18 06:06 FND

If it were me, I'd increment the version to v3 and make the change. If you're breaking anyone with this, there's a reasonable chance what they're doing is already broken and they don't know it.

bradisbell avatar Jun 07 '18 15:06 bradisbell

Well, I'd still be wary of breaking changes, even with a[nother] major version: We made sure that v2.0 maintains backwards compatibility (because there's a large number of existing users), so calling it v3.0 doesn't necessarily invalidate that requirement. @andrew, what do you think?

For the record, in #70 I'd originally set out to revamp the API entirely, but ultimately concluded it wasn't worth sacrificing backwards compatibility. So starting with a blank slate is certainly appealing, I'm just not sure how it would impact existing users (e.g. because they're not locked to a particular version).

FND avatar Jun 08 '18 08:06 FND

Looking at https://libraries.io/npm/base62/usage, very few people would automatically pick up a new major version, in fact doesn't look like anyone in open source is using v2.0 yet, so a complete rewrite in 3.0 wouldn't cause any issues as far as I can see

andrew avatar Jun 08 '18 14:06 andrew

FWIW: I'd started porting #70's pure-ES6 approach while adding input validation on a branch, but it's very much incomplete and I'm currently too busy to turn it into a proper PR; will revisit when I get to it.

FND avatar Jun 21 '18 09:06 FND

I have to agree with FND and wouldn't consider this a bug either.

If I understand correctly, this library only handles number positional notation. As such, only numbers can be encoded, and as with any number representation, leading zeroes aren't preserved.

It's not like a full binary-to-text encoding process like Base64, and I think that's where the confusion is coming from.

pauldps avatar Jul 27 '18 02:07 pauldps

I agree, this should not be tagged as bug, it's a lib number representation and not for "encoding", maybe the naming of the methods was not ideal.

iskrid avatar Jan 09 '19 16:01 iskrid