base58
Performance is non-linear, and thus very bad as input gets large
% dd if=/dev/random of=abcd10 bs=1 count=10000
10000+0 records in
10000+0 records out
10000 bytes transferred in 0.051649 secs (193615 bytes/sec)
% dd if=/dev/random of=abcd100 bs=1 count=100000
100000+0 records in
100000+0 records out
100000 bytes transferred in 0.423879 secs (235916 bytes/sec)
% dd if=/dev/random of=abcd1000 bs=1 count=1000000
1000000+0 records in
1000000+0 records out
1000000 bytes transferred in 4.197053 secs (238262 bytes/sec)
% time ./base58 -i abcd10 | wc -c
13658
./base58 -i abcd10 0.12s user 0.01s system 93% cpu 0.132 total
% time ./base58 -i abcd100 | wc -c
136567
./base58 -i abcd100 11.44s user 0.17s system 95% cpu 12.184 total
% time ./base58 -i abcd1000 | wc -c
1365659
./base58 -i abcd1000 1298.50s user 20.66s system 93% cpu 23:33.18 total
Note that this isn't just you; "github.com/btcsuite/btcutil/base58" gives:
% time ./base58-2 -i abcd10 | wc -c
13658
./base58-2 -i abcd10 0.27s user 0.01s system 96% cpu 0.288 total
% time ./base58-2 -i abcd100 | wc -c
136567
./base58-2 -i abcd100 26.02s user 0.37s system 95% cpu 27.780 total
It's just very unintuitive, especially given that the performance of base64 is roughly the same as base16.
Base64/32/16/8 use a radically different algorithm: they map every fixed-size chunk of input to a corresponding chunk of output, thus "streaming" the result one chunk at a time.
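For example (a minimal Go sketch, not code from any of the libraries mentioned here): the standard encoding/base64 encoder can wrap a writer and emit output as each 3-byte chunk arrives, so time and memory stay linear in the input size.

package main

import (
	"encoding/base64"
	"io"
	"os"
)

func main() {
	// base64 maps every 3 input bytes to 4 output characters, so it can
	// encode and write each chunk independently, i.e. stream the result.
	enc := base64.NewEncoder(base64.StdEncoding, os.Stdout)
	if _, err := io.Copy(enc, os.Stdin); err != nil {
		panic(err)
	}
	enc.Close() // flush the final partial chunk (adds padding if needed)
}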
Base58, base36, and any other leading-zero, integer-based algorithm reads the entire input, treats it as a single very large integer, converts that integer to the new base, and then spits it out.
While there are a number of ways to speed this conversion up, it is bound to remain non-linear: the larger the input, the more work it takes to convert it to an internal integer representation, and the longer it takes to swap its base out.
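To make the shape of the cost concrete, here is a rough Go sketch of the big-integer approach (using math/big; an illustration, not the exact code of this package or of btcsuite): the whole input becomes one big.Int, and every output digit requires a division of that full-width number by 58, which is why the work grows faster than linearly.

package main

import (
	"fmt"
	"math/big"
)

// Standard Bitcoin-style base58 alphabet.
const alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

func encodeBase58(input []byte) string {
	// Treat the entire input as one huge integer -- already O(n) just to build.
	num := new(big.Int).SetBytes(input)

	// Repeatedly divide by 58; each division walks the full big integer, and
	// the number of output digits also grows with n, hence roughly O(n^2).
	base := big.NewInt(58)
	mod := new(big.Int)
	var out []byte
	for num.Sign() > 0 {
		num.DivMod(num, base, mod)
		out = append(out, alphabet[mod.Int64()])
	}

	// Leading zero bytes are preserved as leading '1' characters.
	for _, b := range input {
		if b != 0 {
			break
		}
		out = append(out, '1')
	}

	// Digits were produced least-significant first; reverse into place.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

func main() {
	fmt.Println(encodeBase58([]byte("hello world")))
}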
Yeah, it's probably worth pointing that out somewhere obvious. I ran across it when I heard about multihash/IPFS, and the information about it mostly suggested it as a better replacement for base64 for humans ... Wikipedia does mention it converts to large numbers, but I assumed that's because the other implementations used bignum, and that it could be implemented using groups of 29 bytes.
Nope, this really converts to "bignum". In IPFS all the hashes are extremely short (512 bits at most, plus some padding for multiformats), so this doesn't become a factor.
For actual binary encoding into ASCII with a minimal amount of "bloat", look into https://en.wikipedia.org/wiki/Ascii85#Example_for_Ascii85
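If you want to try that from Go, the standard library already ships a chunk-based implementation (a small sketch, assuming encoding/ascii85 from the stdlib): it maps every 4 input bytes to 5 output characters, so it stays linear like base64 while adding roughly 25% overhead instead of ~33%.

package main

import (
	"encoding/ascii85"
	"fmt"
)

func main() {
	src := []byte("hello world")

	// Ascii85 encodes each 4-byte group into 5 ASCII characters.
	dst := make([]byte, ascii85.MaxEncodedLen(len(src)))
	n := ascii85.Encode(dst, src)
	fmt.Println(string(dst[:n]))
}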
I ended up writing a base50 API, which had the features I wanted from base58 ... probably overkill. Feel free to close.