pycoin icon indicating copy to clipboard operation
pycoin copied to clipboard

[Bounty] Faster address generation

Open blockonomics opened this issue 3 years ago • 14 comments

Currently we are getting average 0.000709s . Our code is on master branch commit 6a8b7c. Is there anyway to further speed this up?

We need to generate thousand addresses in around 0.001-0.01s. Happy to sponsor the work required for this in BTC

import os
from pycoin.symbols.btc import network as BTC
os.environ["PYCOIN_NATIVE"]="openssl"
import time
key = BTC.parse("xpub661MyMwAqRbcFVF9ULcqLdsEa5WnCCugQAcgNd9iEMQ31tgH6u4"
                "DLQWoQayvtSVYFvXz2vPPpbXE1qpjoUFidhjFj82pVShWu9curWmb2zy")
node1=key.subkey(0)
start_time = time.time()
count = 1000
for i in range(0, count):
  node1 = key.subkey(i)
print("Average time for an address {}".format((time.time()-start_time)/count))  

blockonomics avatar Mar 01 '21 11:03 blockonomics

Also try PYCOIN_NATIVE=secp256k1 and install the secp256k1 library. You can use PYCOIN_LIBSECP256K1_PATH to give a hint to the path. It's likely this will be faster than OpenSSL. Give it a try and see how that goes.

richardkiss avatar Mar 07 '21 06:03 richardkiss

Yes it is 50% faster Average time for an address 0.000350296974182 . Thanks ! Anyway to achieve further speed. Not sure what are the mathematical limits


import os
os.environ["PYCOIN_LIBSECP256K1_PATH"] = "/usr/local/lib/libsecp256k1.so"
os.environ["PYCOIN_NATIVE"]="secp256k1"
from pycoin.symbols.btc import network as BTC
import time
start_time = time.time()
key = BTC.parse("xpub661MyMwAqRbcFVF9ULcqLdsEa5WnCCugQAcgNd9iEMQ31tgH6u4"
                "DLQWoQayvtSVYFvXz2vPPpbXE1qpjoUFidhjFj82pVShWu9curWmb2zy")
node1=key.subkey(0)
count = 1000
for i in range(0, count):
  node1 = key.subkey(i).address()
print("Average time for an address {}".format((time.time()-start_time)/count))  

blockonomics avatar Mar 07 '21 17:03 blockonomics

The main operations in subkey are hashing and ECC point operations, which are fast native code already in this configuration. Converting to base58 is python, so I suspect that's the slow part here. Try removing .address() to see how this affects the generation rate. I tried, and it's about 25% faster.

Not sure what your application is here, but it might suffice for you to operate & on or store public key hashes, which are 20 bytes blobs, and only convert to an address when it's time to show them to a human. If you're doing vanity key generation, you could convert the vanity key limits from base58 to hashes and simply ensure that the hash is in that range.

richardkiss avatar Mar 08 '21 20:03 richardkiss

The main operations in subkey are hashing and ECC point operations, which are fast native code already in this configuration. Converting to base58 is python, so I suspect that's the slow part here. Try removing .address() to see how this affects the generation rate. I tried, and it's about 25% faster.

Not sure what your application is here, but it might suffice for you to operate & on or store public key hashes, which are 20 bytes blobs, and only convert to an address when it's time to show them to a human. If you're doing vanity key generation, you could convert the vanity key limits from base58 to hashes and simply ensure that the hash is in that range.

Thanks our main function is to display balance / history of xpub like this so we need the bitcoin address. I was wondering if there is any c extension of python/hardware accelerated library that does faster base58 encoding

shivaenigma avatar Apr 03 '21 07:04 shivaenigma

@shivaenigma are you still interested in this?

richardkiss avatar Nov 27 '22 22:11 richardkiss

Yes of course ! It is possible to achieve a further speed this up ? Feel free to also mail me

shivaenigma avatar Nov 28 '22 11:11 shivaenigma

I hacked a bit on b58 encoding and decoding here https://github.com/richardkiss/pycoin/tree/b58-optimization and this fairly small change is a speed-up of about 40%.

In the past couple years, I've become expert at writing python extensions in rust thanks to pyo3 so I have a small prototype that speeds up b58 encoding up by 5-10x here https://github.com/richardkiss/pycoin_rs

It looks like you are doing key derivation as well, so there are quite a few steps in the pipe that could be improved. Have you done any profiling to see what the slowest part is?

richardkiss avatar Dec 07 '22 06:12 richardkiss

Take a look at

https://github.com/richardkiss/pycoin_rs/blob/main/tests/test_bip32_derivation.py

for an example on how to use the new wheel pycoin_rs to derive lots of addresses quickly. It even uses multiple cores (releasing the GIL) if more than 32 paths are included.

Lots of rough edges still, including no error handling (just panics, so you can't really deploy in production without doing careful pre-screening of parameters), but I'd love to hear feedback. I can get binary wheels on pypi fairly easily.

Richard

On Mon, Nov 28, 2022 at 3:07 AM shivaenigma @.***> wrote:

Yes of course ! It is possible to achieve a further speed this up ? Feel free to also mail me

— Reply to this email directly, view it on GitHub https://github.com/richardkiss/pycoin/issues/377#issuecomment-1328894376, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABJWHD7MWFTLGJWCEJIRLWKSG5DANCNFSM4YMMQWSQ . You are receiving this because you commented.Message ID: @.***>

richardkiss avatar Jan 02 '23 03:01 richardkiss

Hi, richard.

I was following this thread, and try to derivate keys and addresses. I forked your repo, and then merged master into b58-optimization to solve bip_as_sting issue. then run b58 branch.

I got the same amount of time compare to master branch, am I missing something? master branch: Average time for an address 0.0011854101181 seconds b58-optimization: Average time for an address 0.00117835948467 seconds

and using secp256k1 as pycoin native: master branch: Average time for an address 0.000384283018112 seconds b58-optimization: Average time for an address 0.000387255096436 seconds

my test script was the same with shivaenigma as below:

import os
os.environ["PYCOIN_LIBSECP256K1_PATH"] = "/usr/local/lib/libsecp256k1.so"
os.environ["PYCOIN_NATIVE"]="secp256k1"
from pycoin.symbols.btc import network as BTC
import time
start_time = time.time()
key = BTC.parse("xpub661MyMwAqRbcFVF9ULcqLdsEa5WnCCugQAcgNd9iEMQ31tgH6u4"
                "DLQWoQayvtSVYFvXz2vPPpbXE1qpjoUFidhjFj82pVShWu9curWmb2zy")
node1=key.subkey(0)
count = 1000
for i in range(0, count):
  node1 = key.subkey(i).address()
print("Average time for an address {}".format((time.time()-start_time)/count))

ErfanMN avatar Jan 17 '23 16:01 ErfanMN

The change I made to the b58-optimization branch is pretty minimal compared to the rust-based speed-ups. In particular, the python optimizations only speeds up the b58 encoding step, which isn't particularly significant.

Try the api in the new clvm_rs wheel. It uses rust to generate a set of addresses. Do the following in a fresh virtualenv:

# need rust tools installed, see https://www.rust-lang.org/tools/install
$ pip install git+https://github.com/richardkiss/pycoin_rs.git

# pip takes a long time as it's building from source

$ wget https://raw.githubusercontent.com/richardkiss/pycoin_rs/main/tests/test_bip32_derivation.py
$ python -c 'import test_bip32_derivation; test_bip32_derivation.test_bip32_derivation()'

On my MacBook Air M2, it generates 100000 children in 0.85 s, and does the b58-encoding in another 0.05 s. Being able to use multithreading when python can't is a huge win.

richardkiss avatar Jan 18 '23 03:01 richardkiss

well done, I checked and compared pycoin_rs with pycoin master branch and it boosts the speed 15x faster. I modified the script as below:

import time
from pycoin_rs import public_keys_for_xpub_str_and_paths, MAINNET
import os

os.environ["PYCOIN_LIBSECP256K1_PATH"] = "/usr/local/lib/libsecp256k1.so"
os.environ["PYCOIN_NATIVE"]="secp256k1"
from pycoin.symbols.btc import network as BTC

def test_bip32_derivation():
    XPUB = (
        "xpub661MyMwAqRbcFLDp2XRhusibkNn3gGtTS85mTCLMFzeuSxvsTKe"
        "vyS8c4SgHUCfoAwSWc6zY4TgSVAG9NGpfwb78yNZkcUxmavdRYCJqhLH"
    )
    start = time.time()
    t = public_keys_for_xpub_str_and_paths(XPUB, [f"m/0/{n}" for n in range(100000)])
    addresses = [_.p2pkh_address(MAINNET) for _ in t]
    end = time.time()
    elapsed = end - start
    print(f"took {elapsed}")


    start = time.time()
    key = BTC.parse(XPUB)
    addresses_old = []
    for i in range(100000):
        addresses_old.append(key.subkey(0).subkey(i).address())
    end = time.time()
    elapsed = end - start
    print(f"took {elapsed}")

    # `ku P:1 -s 0 -a`
    assert str(addresses[0]) == "1JuGYKYuegNQaWVwTbSwJxqiUaineBKxeh"
    assert str(addresses_old[0]) == "1JuGYKYuegNQaWVwTbSwJxqiUaineBKxeh"

    # `ku P:1 -s 99999 -a`
    assert str(addresses[99999]) == "1Mrq4Dztrshzrck6fDmWfNWW44YmzNeQV7"
    assert str(addresses_old[99999]) == "1Mrq4Dztrshzrck6fDmWfNWW44YmzNeQV7"

The Result: pycoin_rs --> took 1.3610923290252686 seconds pycoin master branch --> 21.53228998184204 seconds

But, it seems doesn't support python 2.7 as maturin package requires python > 3.7

ErfanMN avatar Jan 20 '23 10:01 ErfanMN

Yes, it will never support 2.7. In fact, I'm considering dropping py2 support for pycoin too.

richardkiss avatar Jan 27 '23 20:01 richardkiss

Thank you Richard. I know rust is a fast programming language in performance, but I am just curious isn't C++ faster than rust? and what was your motivation choosing rust over C++?

ErfanMN avatar Feb 01 '23 12:02 ErfanMN

I've disliked C++ for quite some time now, and had fallen behind in following the language. I was not able to read modern idiomatic C++ very easily. A couple years ago, I started learning rust to port clvm from python to rust here https://github.com/Chia-Network/clvm_rs and my admiration for the language has only grown since then. It includes a build and module system, which are two things that prevent the C++ ecosystem from being usable.

richardkiss avatar Feb 04 '23 19:02 richardkiss