
Performance is non-linear, thus very bad as input gets large

Open james-antill opened this issue 4 years ago • 5 comments

% dd if=/dev/random of=abcd10 bs=1 count=10000
10000+0 records in
10000+0 records out
10000 bytes transferred in 0.051649 secs (193615 bytes/sec)
% dd if=/dev/random of=abcd100 bs=1 count=100000
100000+0 records in
100000+0 records out
100000 bytes transferred in 0.423879 secs (235916 bytes/sec)
% dd if=/dev/random of=abcd1000 bs=1 count=1000000
1000000+0 records in
1000000+0 records out
1000000 bytes transferred in 4.197053 secs (238262 bytes/sec)
% time ./base58 -i abcd10 | wc -c
   13658
./base58 -i abcd10  0.12s user 0.01s system 93% cpu 0.132 total
% time ./base58 -i abcd100 | wc -c
  136567
./base58 -i abcd100  11.44s user 0.17s system 95% cpu 12.184 total
% time ./base58 -i abcd1000 | wc -c
 1365659
./base58 -i abcd1000  1298.50s user 20.66s system 93% cpu 23:33.18 total

james-antill avatar Jun 20 '20 22:06 james-antill

Note that this isn't just you, "github.com/btcsuite/btcutil/base58" gives:

% time ./base58-2 -i abcd10 | wc -c
   13658
./base58-2 -i abcd10  0.27s user 0.01s system 96% cpu 0.288 total
% time ./base58-2 -i abcd100 | wc -c
  136567
./base58-2 -i abcd100  26.02s user 0.37s system 95% cpu 27.780 total

It's just very unintuitive, especially given that the performance of base64 is roughly the same as base16.

james-antill avatar Jun 20 '20 22:06 james-antill

It's just very unintuitive, especially given that the performance of base64 is roughly the same as base16.

Base64/32/16/8 use a radically different algorithm: they map each fixed-size chunk of input to a corresponding chunk of output, thus "streaming" the result a chunk at a time.
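To illustrate the shape of that (this is just a sketch, not any library's actual code, and `hexChunk` is a made-up name), here is base16 done chunk-at-a-time in Go, where each input byte maps to exactly two output characters and the total cost stays linear:

```go
package main

import "fmt"

// hexChunk maps each input byte to two output characters independently,
// so the work per byte is constant and the total cost is linear in len(src).
// base64/32/8 encoders have the same overall structure, just with
// different chunk sizes.
func hexChunk(src []byte) string {
	const digits = "0123456789abcdef"
	dst := make([]byte, 0, len(src)*2)
	for _, b := range src {
		dst = append(dst, digits[b>>4], digits[b&0x0f])
	}
	return string(dst)
}

func main() {
	fmt.Println(hexChunk([]byte("hi"))) // prints: 6869
}
```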

Base58, base36, and any other leading-zero-preserving, integer-based encoding reads the entire input, treats it as a single very large integer, converts that integer to the new base, and then spits it out.

While there are a number of ways to speed this conversion up, it is bound to remain non-linear: the larger the input, the more work it takes to build the internal integer representation, and the longer it takes to change its base.
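A minimal sketch of that approach in Go (again, not this package's or btcsuite's actual implementation, just the textbook big-integer method) makes the cost visible: every division touches the whole remaining integer, and the number of divisions grows with the input length, so the overall cost is roughly quadratic:

```go
package main

import (
	"fmt"
	"math/big"
)

const alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// encodeBig treats the whole input as one big integer and repeatedly
// divides by 58. Each DivMod walks the full remaining integer, and the
// number of digits to produce grows with the input length, hence the
// roughly O(n^2) behaviour seen in the timings above.
func encodeBig(src []byte) string {
	n := new(big.Int).SetBytes(src)
	radix := big.NewInt(58)
	mod := new(big.Int)
	var out []byte
	for n.Sign() > 0 {
		n.DivMod(n, radix, mod)
		out = append(out, alphabet[mod.Int64()])
	}
	// Leading zero bytes are preserved as leading '1' characters.
	for _, b := range src {
		if b != 0 {
			break
		}
		out = append(out, alphabet[0])
	}
	// Digits were produced least-significant first; reverse them.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

func main() {
	fmt.Println(encodeBig([]byte("hello"))) // prints: Cn8eVZg
}
```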

ribasushi avatar Jun 20 '20 22:06 ribasushi

Yeah, it's probably worth pointing that out somewhere obvious. I ran across it when I heard about multihash/IPFS, and the information about it mostly suggested it as a more human-friendly replacement for base64 ... Wikipedia does mention that it converts to large numbers, but I assumed that was because the other implementations used bignum, and that it could be implemented using groups of 29 bytes.

james-antill avatar Jun 20 '20 22:06 james-antill

Nope, this really converts to "bignum". In IPFS all the hashes are extremely short (512 bits at most, plus some padding for multiformats), so this doesn't become a factor.

For actual binary encoding into ASCII with a minimal amount of "bloat", look into https://en.wikipedia.org/wiki/Ascii85#Example_for_Ascii85
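Ascii85 is chunked like base64 (each 4-byte group maps to 5 output characters), so it stays linear. Go's standard library ships an encoder for it, `encoding/ascii85`; a quick sketch of using it:

```go
package main

import (
	"encoding/ascii85"
	"fmt"
)

func main() {
	src := []byte("hello world")
	// Each 4-byte group becomes 5 output characters, so encoding
	// time is linear in the input size.
	dst := make([]byte, ascii85.MaxEncodedLen(len(src)))
	n := ascii85.Encode(dst, src)
	fmt.Println(string(dst[:n]))
}
```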

ribasushi avatar Jun 20 '20 22:06 ribasushi

I ended up writing a base50 API, which had the features I wanted from base58 ... probably overkill. Feel free to close.

james-antill avatar Jun 28 '20 22:06 james-antill