[500$ Bounty] Slow disk writes
We’re using Electrum to generate addresses on demand for thousands of customers. We’re a SaaS eCommerce platform (https://sellix.io) and provide our own infrastructure for cryptocurrencies.
We already have an address-reusability system (the same address is re-used multiple times when possible); however, our Electrum wallet currently holds over 35,000 addresses. Generating a new one takes as much as 20 seconds, whilst on electrum-ltc and electron-bch it takes less than a second, with the same number of addresses in the wallet.
We’d like a hand figuring it out and solving it ASAP. Thank you.
Generating a new one takes as much as 20 seconds
How specifically are you generating a new address? What is it that you have timed to take that long, is it the createnewaddress
RPC command?
yes, exactly. We have an Electrum daemon, and both JSON-RPC (HTTP) and the Electrum client (I guess it uses JSON-RPC too) take 20s.
electrum client (i guess it uses jsonrpc too)
yes, the CLI uses jsonrpc too.
Have you increased the gap limit for this wallet? If so, what value is it set to?
Never increased the gap limit.
Generating a new one takes as much as 20 seconds, whilst on electrum-ltc and electron-bch it takes less than a second, with the same number of addresses in the wallet.
electrum-ltc follows us pretty closely. What version of it are you using?
Please enable debug logging, and grep for lines starting with D | util.profiler | WalletDB._write. The number that follows is the time taken in seconds by the call. How long does that take?
20220609T180804.215979Z | DEBUG | util.profiler | WalletDB._write 24.0569
Right... so the root cause seems to be the db writes being slow. This is unfortunately an architectural problem that is hard to fix. The wallet db is backed by a (potentially encrypted) json file. As it is json, if you want any change persisted, the whole file has to be rewritten to disk. For large wallets, this is unsurprisingly very slow. See https://github.com/spesmilo/electrum/issues/4823
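To make the cost concrete, here is a minimal, self-contained sketch (not Electrum code; the entry count and schema are made up) of why a JSON-file-backed store is slow to persist: flipping a single flag still means serializing and rewriting the entire file.

```python
import json
import os
import tempfile
import time

# build a dict big enough that serialization cost is visible
db = {f"address_{i}": {"used": False} for i in range(200_000)}

path = os.path.join(tempfile.mkdtemp(), "wallet_demo.json")

def persist_one_change(db, key):
    """Flip one flag, then persist -- which rewrites the entire JSON file."""
    db[key]["used"] = True
    t0 = time.perf_counter()
    with open(path, "w") as f:
        json.dump(db, f)  # the whole structure is re-serialized every time
    return time.perf_counter() - t0

elapsed = persist_one_change(db, "address_0")
size = os.path.getsize(path)
```

The write time grows with the total file size, not with the size of the change, which matches the linear-scaling expectation discussed below.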
As to why the BCH and LTC forks don't exhibit the behaviour... they should in theory suffer from the same fundamental issue, so I am unsure. One thing that comes to mind is that we added an extra write-to-disk call to wallet.set_up_to_date() around version 4.0. This gets called every time the wallet finishes syncing, which in your case gets triggered soon after every createnewaddress call.
https://github.com/spesmilo/electrum/blob/839db6ee9c696a9cc5157bf225e750a124c4cdbb/electrum/wallet.py#L382-L384
Previously we used to not do this, and would only persist the wallet file when the wallet is closed gracefully - this could mean losing state, although if it's only HD addresses they would likely be regenerated next time (+gap limit shenanigans).
So with that in mind, I guess you could experiment with removing that line (i.e. the call to self.save_db() inside wallet.set_up_to_date).
However, if you are using a recent version of Electrum-LTC, they should have the same code, in which case I don't know. So again, please state exact version.
The largest wallet I have to test with has 250k addresses, with a file size of ~531 MB. WalletDB._write takes ~26 seconds with that.
>>> len(wallet.get_addresses())
250657
>>> len(wallet.db.transactions)
140985
You said yours has 35k addresses and takes ~24 seconds, which is weird as I would expect linear scaling. How large is your wallet file on disk?
btw, do you have wallet file encryption enabled? If you use the CLI, this is the encrypt_file option for the password command.
That has a huge effect on the wallet file size -- although maybe not so much on the db write time.
My numbers are for an encrypted wallet file.
So with that in mind, I guess you could experiment with removing that line (i.e. the call to self.save_db() inside wallet.set_up_to_date).
Another trick you could do is open the wallet with --offline, generate a few thousand addresses, and then close it and reopen it normally. When you are offline, set_up_to_date is not used, as it is meaningless, so this process would only result in a single db write, when the wallet is closed.
What about making the wallet save function async? I think that without saving the wallet file we would lose incoming transactions etc... Moreover, we can't generate addresses offline because Electrum's daemon starts online automatically to receive txs etc.
I think that without saving the wallet file we would lose incoming transactions etc...
Barring gap limit issues, on-chain state cannot really be lost. (as you would just resync the same state from the electrum server the next time)
Moreover, we can't generate addresses offline because Electrum's daemon starts online automatically to receive txs etc.
You can start the daemon in offline mode (which is not well supported, the flag mainly exists for the GUI and for no-daemon CLI commands), as follows:
$ ./run_electrum --testnet daemon --offline -v
$ ./run_electrum --testnet load_wallet -w ~/.electrum/testnet/wallets/9dk
$ ./run_electrum --testnet createnewaddress -w ~/.electrum/testnet/wallets/9dk
I've just tested and this works. You can batch address-generation this way.
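The same batched pre-generation could be scripted over the daemon's JSON-RPC interface instead of the CLI. A sketch under stated assumptions: the URL, port, credentials and parameter names below are placeholders (take rpcuser/rpcpassword/rpcport from your Electrum config and check the daemon's help output for exact parameter names); only the method names (load_wallet, createnewaddress) come from this thread.

```python
import base64
import json
from urllib import request

RPC_URL = "http://127.0.0.1:7777"  # placeholder: use your configured rpcport

def rpc_payload(method, params, req_id=0):
    """Build a standard JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

def rpc_call(method, params, user="user", password="pass"):
    """POST one JSON-RPC request with HTTP basic auth and return its result."""
    body = json.dumps(rpc_payload(method, params)).encode()
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = request.Request(
        RPC_URL,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {auth}"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]

if __name__ == "__main__":
    # batch-generate against an --offline daemon, then restart it online;
    # the "wallet" parameter name here is an assumption, not confirmed API
    wallet_path = "~/.electrum/testnet/wallets/9dk"
    rpc_call("load_wallet", {"wallet_path": wallet_path})
    for _ in range(1000):
        rpc_call("createnewaddress", {"wallet": wallet_path})
```

Since set_up_to_date is skipped offline, the loop only pays the big db write once, at wallet close.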
What about making the wallet save function async?
hmm.. I think that might make some things much harder to reason about. :/
We use createnewaddress as a JSON-RPC call... Can we use two daemons that use the same wallet (online/offline) without running into any issues?
We use createnewaddress as a JSON-RPC call...
Just because my example is not using jsonrpc, do not presume it would not work like that :P It should.
Can we use two daemons that use the same wallet (online/offline) without running into any issues?
Do not open the same wallet file in multiple processes simultaneously. However, it is safe to have the same logical wallet (seed/xpub/etc.) open in multiple processes simultaneously, with each process handling a separate wallet file. So: same HD keys OK, same file NOT OK.
Having two daemons, one offline, one online, with two wallet files (same seed), and using the offline one to generate addresses is very similar to what I've suggested with batched pre-generation of addresses. It should work.
We use extended private key importing (it allows us to generate 3 address versions)... So would importing the same private key into another Electrum daemon work as a fix? Would the online Electrum daemon recognize the incoming transactions/inputs? Moreover, the latest Electrum tar.gz release (from the Electrum downloads website) is missing the protobuf requirement, which was fixed in a recent commit, and this causes the application to fail.
Would the online Electrum daemon recognize the incoming transactions/inputs?
Barring gap limit issues, yes. That is, if the offline daemon is generating addresses faster than they are getting used, the online daemon will fall behind and if a new tx arrives beyond the gap limit of the online daemon that tx will not be seen. It will get discovered once the gap is rolled forward (assuming the preceding addresses become used). Not sure how much I need to explain this -- are you familiar with the gap limit concept?
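Since the gap limit keeps coming up, here is a toy simulation (not Electrum code) of the discovery behaviour described above: the scanner stops after `gap` consecutive unused addresses, so a tx beyond that window stays invisible until earlier addresses become used and the window rolls forward.

```python
def discovered(used_indices, gap=20):
    """Return the set of address indices a gap-limit scan would see.

    The scan walks derivation indices in order and stops once it has seen
    `gap` consecutive unused addresses.
    """
    seen, unused_run, i = set(), 0, 0
    while unused_run < gap:
        seen.add(i)
        unused_run = 0 if i in used_indices else unused_run + 1
        i += 1
    return seen

# addresses 0..4 are used, then a payment arrives at index 30 (beyond 4 + 20)
used = {0, 1, 2, 3, 4}
assert 30 not in discovered(used)  # tx at index 30 is not seen yet

used.add(10)                       # an address inside the window becomes used
assert 30 in discovered(used)      # the window rolled forward; 30 is now seen
```

This is exactly the failure mode above: if the offline daemon hands out addresses faster than they get used, the online daemon's scan window lags behind them.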
Moreover, the latest Electrum tar.gz release (from the Electrum downloads website) is missing the protobuf requirement, which was fixed in a recent commit, and this causes the application to fail.
Indeed the latest release does not have that commit. Anyway, that's a separate issue. (https://github.com/spesmilo/electrum/issues/7833)
Can we use two daemons that use the same wallet (online/offline) without running into any issues?
Having two daemons, one offline, one online, with two wallet files (same seed), and using the offline one to generate addresses is very similar to what I've suggested with batched pre-generation of addresses. It should work.
So would importing the same private key into another Electrum daemon work as a fix? Would the online Electrum daemon recognize the incoming transactions/inputs?
Ah wait, I am wrong actually. I mean, the two-daemon approach works as a mode of operation, but it does not solve the performance issue. The online daemon would still end up generating the addresses for its own wallet file, except it would do that automatically as new transactions are discovered. Every time it did it, you would see the same slowness.
The offline address pre-generation into the same wallet file would work though.
But in fact even if you pre-generate the addresses, when a new tx arrives, momentarily the wallet sync status can become not up_to_date, in which case after the sync is done, set_up_to_date gets called, and the db write executes...
Basically, the issue is not address generation being slow.
How do I generate a sufficiently large wallet file? Generating 100,000 addresses with createnewaddress yields a ~9 MB file, which is handled quite fast by the client.
In other words, how to reproduce this bug?
We have a large wallet file with tons of txs apart from the generated addresses (currently it's 3 GB).
How do I generate sufficiently large wallet file?
I have a testnet wallet with master pubkey:
vpub5VfkVzoT7qgd5gUKjxgGE2oMJU4zKSktusfLx2NaQCTfSeeSY3S723qXKUZZaJzaF6YaF8nwQgbMTWx54Ugkf4NZvSxdzicENHoLJh96EKg
though this wallet is not that large (Qt Console):
>>> len(wallet.get_addresses())
10536
>>> len(wallet.db.transactions)
11012
>>> import os
>>> os.path.getsize(wallet.storage.path) / 1024**2
32.91964912414551
but you can e.g. set long labels for each tx to make it large:
>>> import os
>>> prng = electrum.coinchooser.PRNG(os.urandom(32))
>>> [wallet.set_label(txid, prng.get_bytes(50000).hex()) for txid in wallet.db.transactions.keys()]
@coval3nte, not addressing your issue directly, but out of curiosity, may I ask why you are using so many addresses? Is it for a single address per order? If so, have you considered any alternative approaches? If you have, which ones, and what were their pros/cons?
No, we rotate addresses across shops. The issue dates from a bunch of months ago; since then we have had time to see things from a different perspective.
- Rather than address generation, it is the loading/"scraping" of UTXOs that is slow [payto, also when specifying the UTXOs].
- We have noticed that, as the daemon runs for hours or days, the endpoints [such as payto, listunspent] become slower [>360s for a JSON-RPC request] as time passes. After a restart the walletdb read takes 7s max, whilst ElectrumX (or its Rust alternative) takes 5s max.
What do you mean by "the walletdb read"? Are you comparing the time to run listunspent in both cases?
in other words, is it faster after a restart?
yes, it's significantly faster after a restart. both listunspent and payto suffer from this.
By "walletdb read" I'm referring to WalletDB._load_transactions (or some other function inside this class), which is ± constant time.
So the issue lies somewhere between this and the RPC call to ElectrumX.
listunspent and payto are not RPC calls to electrumx
@ecdsa then there's something else slowing down the function; -vDEBUG doesn't provide more information apart from the walletdb line and, after that, the resulting JSON...
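When the debug log doesn't show where the extra time goes, a deterministic profiler can. A minimal, standard-library-only sketch: slow() is a made-up stand-in, not an Electrum API; in practice you would wrap the code path behind listunspent/payto in the same way, or attach an external sampler such as py-spy to the running daemon.

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Profile a single call; return (result, top-10 cumulative-time report)."""
    pr = cProfile.Profile()
    pr.enable()
    result = fn(*args, **kwargs)
    pr.disable()
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()

def slow():
    # stand-in for a slow wallet call (quadratic on purpose)
    return sum(i * j for i in range(300) for j in range(300))

value, report = profile_call(slow)
```

Comparing such a report right after a restart versus after a day of uptime would show which internal function accounts for the growing latency.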
Maybe a cluttered cache state (or no cache) of the parsed JSON wallet file? Or something that causes a full re-parse on every listunspent?
I don't know Electrum internals in depth, but I can imagine the problem isn't the cache itself, given that things get considerably better after a restart. Maybe for some reason it is invalidated and never saved again, and a restart resets this mechanism?
Do you maybe have rough instructions how to reproduce? Or just a description of what you are doing to the wallet where it happens, e.g. how long it is open, which commands you are calling and how, and how many times, etc.
the endpoint [such as payto, listunspent] becomes slower [360s> for a jsonRPC request] as time passes
Do you mean to say that e.g. the payto command takes 6 minutes to complete? How long does it take right after a restart?
Brief description:
- the wallet runs for approx. 1 day
- both listunspent and payto are affected (no info about broadcast); the wallet load-transactions step takes 8 sec, something else takes more
- the commands that I / my cronjobs call are listunspent, createnewaddress, broadcast, payto and estimatefee. I don't know precisely how many times per day
- a restart decreases the RPC execution time a lot, from minutes to seconds
- the issue happens when I want to send funds from the wallet
I don't know precisely what Electrum [4.3.4] does under the hood when handling RPCs, but it's something shared by both listunspent and payto. Moreover, I've looked at both ElectrumX and Fulcrum metrics; request times are good enough not to be the cause of the issue.