electrumx
litecoind 0.21.2 mweb transactions not deserialized correctly (utxo not found in h table)
electrumx 1.16 running against litecoind 0.21.2, which introduces the mweb upgrade, breaks with the following traceback:
ERROR:electrumx:ElectrumX server terminated abnormally
Traceback (most recent call last):
File "/electrumx/electrumx_server", line 35, in main
asyncio.run(controller.run())
File "/usr/local/lib/python3.7/asyncio/runners.py", line 43, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
return future.result()
File "/electrumx/electrumx/lib/server_base.py", line 129, in run
await server_task
File "/electrumx/electrumx/lib/server_base.py", line 102, in serve
await self.serve(shutdown_event)
File "/electrumx/electrumx/server/controller.py", line 134, in serve
await group.spawn(wait_for_catchup())
File "/usr/local/lib/python3.7/site-packages/aiorpcX-0.18.7-py3.7.egg/aiorpcx/curio.py", line 255, in __aexit__
await self.join()
File "/usr/local/lib/python3.7/site-packages/aiorpcX-0.18.7-py3.7.egg/aiorpcx/curio.py", line 213, in join
raise task.exception()
File "/electrumx/electrumx/server/block_processor.py", line 702, in fetch_and_process_blocks
await group.spawn(self._process_prefetched_blocks())
File "/usr/local/lib/python3.7/site-packages/aiorpcX-0.18.7-py3.7.egg/aiorpcx/curio.py", line 255, in __aexit__
await self.join()
File "/usr/local/lib/python3.7/site-packages/aiorpcX-0.18.7-py3.7.egg/aiorpcx/curio.py", line 213, in join
raise task.exception()
File "/electrumx/electrumx/server/block_processor.py", line 663, in _process_prefetched_blocks
await self.check_and_advance_blocks(blocks)
File "/electrumx/electrumx/server/block_processor.py", line 229, in check_and_advance_blocks
await self.run_in_thread_with_lock(self.advance_blocks, blocks)
File "/electrumx/electrumx/server/block_processor.py", line 212, in run_in_thread_with_lock
return await asyncio.shield(run_in_thread_locked())
File "/electrumx/electrumx/server/block_processor.py", line 211, in run_in_thread_locked
return await run_in_thread(func, *args)
File "/usr/local/lib/python3.7/site-packages/aiorpcX-0.18.7-py3.7.egg/aiorpcx/curio.py", line 68, in run_in_thread
return await get_event_loop().run_in_executor(None, func, *args)
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/electrumx/electrumx/server/block_processor.py", line 409, in advance_blocks
undo_info = self.advance_txs(block.transactions, is_unspendable)
File "/electrumx/electrumx/server/block_processor.py", line 450, in advance_txs
cache_value = spend_utxo(txin.prev_hash, txin.prev_idx)
File "/electrumx/electrumx/server/block_processor.py", line 646, in spend_utxo
raise ChainError(f'UTXO {hash_to_hex_str(tx_hash)} / {tx_idx:,d} not '
electrumx.server.block_processor.ChainError: UTXO b0a867c72f417dc0a904f708ba37fa2d94c3b25a288ea6e8fbe8cd14d8ca7119 / 0 not found in "h" table
Observed the same in a local test with testnet. On testnet the error ends with electrumx.server.block_processor.ChainError: UTXO 3a7299f5e6ee9975bdcc2d754ff5de3312d92db177b55c68753a1cdf9ce63a7c / 0 not found in "h" table. That is the HogEx/integration transaction in testnet block 3000cc2076a568a8eb5f56a06112a57264446e2c7d2cca28cdc85d91820dfa17 (2215586), spent in the next block.
Was there some change of consensus rules for litecoin? Would be great if someone who actually uses litecoin contributed a fix.
I'm not a litecoin developer, just trying to use it, but here's what I understand about the MWEB activation: https://litecointalk.io/t/how-to-decipher-mweb-blocks-and-transactions/53449/3
- There is now extra data after the normal "canonical" block called the MWEB. This could be messing up block deserialization.
- The last transaction in the canonical block transactions can be a special "integration" transaction where the outputs' scripts are prefixed with OP_8 for the witness version, so non-standard by previous rules, if that matters.
- This integration txn also has an unusual segwit flag after marker. Instead of the usual 1 with segwit txns, it is 8, or at least has bit 3 set.
- The integration txn has a placeholder byte where the "mweb tx" would go, but there's nothing there in this tx because it's in the canonical block, not the EB.
- The block Version indicates if the above are present, and I believe the mask is 0x20000000.
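The flag bits described in the list above can be sketched in a few lines of Python. This is only an illustration: `tx_flag`, `describe_flag` and the two constant names are hypothetical helpers, not ElectrumX code, and the sketch assumes the usual extended layout of a 4-byte version followed by a 0x00 marker and a flag byte.

```python
# Bit meanings as described in this thread: bit 0 is the usual segwit
# flag, bit 3 (value 8) marks MWEB. Constant names are made up here.
SEGWIT_FLAG_BIT = 0x01
MWEB_FLAG_BIT = 0x08


def tx_flag(raw_tx: bytes):
    """Return the flag byte of an extended-format tx, or None if absent.

    Assumes the layout: 4-byte version, 0x00 marker, flag byte.
    A legacy tx has its input count where the marker would be.
    """
    if len(raw_tx) < 6 or raw_tx[4] != 0x00:
        return None
    return raw_tx[5]


def describe_flag(flag):
    """Report which optional sections the flag byte announces."""
    if flag is None:
        return {"segwit": False, "mweb": False}
    return {
        "segwit": bool(flag & SEGWIT_FLAG_BIT),
        "mweb": bool(flag & MWEB_FLAG_BIT),
    }
```

A flag of 8 would then report MWEB without segwit witness data, and a flag of 9 would report both.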
I'm not familiar with electrumx code (or python for that matter), but I don't see any obvious reasons why these things would prevent this last txn from being added to the DB (and not be found on the next block).
I am trying to add litecoin to Fulcrum and I got hit by this. Basically if the 'flags' field for segWit has bit 3 set (that is, flags bitwise-ANDed with 8 is nonzero), then you have to read all the mWeb stuff (which you can ignore).. and at the end of the tx finally you get to nLockTime. I think the issue now is that ElectrumX has the wrong idea of what the txhash is because it's reading into the mWeb stuff (incorrectly) and it gets the wrong nLockTime. Probably a trivial fix is if you see that 'flags & 8', just skip everything after the witness data and read the last 4 bytes in the byte array as little endian to grab the correct nLockTime... (easier said than done given the bitcoin serialization format is variable-length.. LOL).
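A minimal sketch of the "last 4 bytes" idea, assuming the buffer holds exactly one serialized transaction and nothing else. `locktime_from_single_tx` is a hypothetical helper. As noted further down the thread, this shortcut does not work on a whole serialized block, because more data follows the last transaction there.

```python
import struct


def locktime_from_single_tx(raw_tx: bytes) -> int:
    """Read nLockTime as the trailing little-endian uint32.

    Only meaningful when raw_tx is one complete transaction with no
    trailing bytes (assumption of this sketch).
    """
    if len(raw_tx) < 4:
        raise ValueError("buffer too short to contain a locktime")
    return struct.unpack("<I", raw_tx[-4:])[0]
```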
Other than the tx flag being 8 for the integration transaction (last tx of the canonical block), the transaction serialization seems to be unchanged. I can reproduce the tx hashes of these transactions without modifying the serialization. Everything in the tx is in the usual place afaict. I just have to NOT error if the flag is 8 instead of 1 as usual and it deserializes a tx with the correct hash. EDIT: I take that back, it fails to get the right hash for some of these txns
Could electrum x be failing to accept or understand the pkScript with OP_8 for witness version? Do these utxo tables need a decoded address to work or should they be fine with any script?
Note that I discovered that if you start litecoind with -rpcserialversion=1, it will completely omit the mweb stuff from serialization and the serialization format will be 100% compatible with what we had before!!
@chappjc yeah I think you don't get the right hash for some of the txns because nLocktime is not being read correctly. nLocktime definitely lives at the end of txn, as the last 4 bytes, right after all the mweb stuff for that txn: https://github.com/litecoin-project/litecoin/blob/948e6257aec15b52ef68b4e1ee9d73f7c740fae3/src/primitives/transaction.h#L329
Could electrum x be failing to accept or understand the pkScript with OP_8 for witness version? Do these utxo tables need a decoded address to work or should they be fine with any script?
Hmm.. no. The witness data is as before in the same place. I don't think ElectrumX even really uses the witness data as far as I know. It just indexes on the scriptPubKey of the output. Any witness info is 100% client-side.
As far as I can tell electrum x doesn't even check whether the flag has bit 3 set or not... if you look in the lib/tx.py file in the function that deserializes segwit (which Litecoin uses), it just proceeds to ignore what the flag is set to, and deserializes segwit data (which is ok), but then it reads the locktime right after that (not ok for the last mweb txn since locktime is not at that position but is later on...).
Yeah, you're right about reading the wrong lock time. I had concluded that for this hogex txn that mw stuff that comes first there would be empty (hitting tx.mweb_tx.IsNull() and then tx.m_hogEx = true; because that's what this transaction is). That's clearly wrong.
To get to the lock time though you'd have to know the length of this mw data I'd assume. I thought the buffer was the entire serialized block, not just the transaction.
Good find with -rpcserialversion=1!
I don't think ElectrumX even really uses the witness data as far as I know. It just indexes on the scriptPubKey of the output. Any witness info is 100% client-side.
I'm referring to the OP_8 in the output script (witness prog version), not the witness data of the inputs. This is what's making it a new address type, if that matters to electrum. https://github.com/litecoin-project/litecoin/blob/948e6257aec15b52ef68b4e1ee9d73f7c740fae3/src/script/standard.cpp#L145-L147
"vout": [
  {
    "ismweb": false,
    "value": 69.76394951,
    "n": 0,
    "scriptPubKey": {
      "asm": "8 2a75ca3f750c1ebfc60cdfd31e20e3f2555258cedefba6d345f66949394b96b7",
      "hex": "58202a75ca3f750c1ebfc60cdfd31e20e3f2555258cedefba6d345f66949394b96b7",
      "type": "witness_mweb_hogaddr"
    }
  }
To get to the lock time though you'd have to know the length of this mw data I'd assume.
Yeah, and that data in the txn is somewhat complex. It's variable-length data.. and then there could be even MORE data after that! So it's not like you can fast-forward to the last 4 bytes in the block data. This is because at the end of the block there may be more mweb-related data as well, after the last transaction. (So if your deserializer throws errors if there is still data left in the buffer that is unread, you will have to also catch that when deserializing the block and ignore that data at the end after the last txn..).
I'm referring to the OP_8 in the output script (witness prog version), not the witness data of the inputs. This is what's making it a new address type, if that matters to electrum
Yeah as far as electrumx is concerned, for now, that would just be a new unique "address". It doesn't really inspect the scriptPubKeys of anything.. it just hashes the binary data in a scriptPubKey, verbatim, and based on that hash, it considers that an "address".
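To illustrate the "hash the scriptPubKey verbatim" point: the Electrum protocol identifies a script by the SHA-256 of its raw bytes, shown as byte-reversed hex. The sketch below applies that to the hogaddr script from the JSON excerpt in this thread. `electrum_scripthash` is a hypothetical helper name, and ElectrumX's internal DB key is a truncated form of the same digest, which is not shown here.

```python
import hashlib


def electrum_scripthash(script_pubkey: bytes) -> str:
    """SHA-256 of the raw script, byte-reversed, as lowercase hex.

    No script parsing is involved, so an OP_8 witness program is just
    another opaque byte string.
    """
    return hashlib.sha256(script_pubkey).digest()[::-1].hex()


# scriptPubKey hex of the HogEx output quoted in this thread:
# 0x58 (OP_8), 0x20 (push 32 bytes), then the 32-byte program.
hogaddr_script = bytes.fromhex(
    "58202a75ca3f750c1ebfc60cdfd31e20e3f2555258cedefba6d345f66949394b96b7"
)
hogaddr_scripthash = electrum_scripthash(hogaddr_script)
```

Nothing in this computation depends on the witness version, which is consistent with the output script not being the cause of the failure.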
Turns out that the tx.mweb_tx is always "null" there for a hogex, but it still eats a single byte. I think there is only a tx with a non-null tx.mweb_tx section for the "peg-out" transactions that are contained to the MWEB itself after the main block (which can just be discarded), not in the regular block. Note the comment /* If the MWEB flag is set, but there are no MWEB txs, assume HogEx txn. */, which does seem to be correct.
What I think that means in practice is that if bit 3 of flag is set, we can just gobble up a single byte to simulate s >> tx.mweb_tx;, then move on to locktime. Then ignore the MWEB data that follows.
I can deserialize all transactions in testnet blocks from ~1000 blocks prior to mweb up to the chain tip (2214584 to 2349311) and I get all the correct tx ids. EDIT: and all mainnet txns from 2263985 to 2268685.
Well, this seems to work, although I have basically no idea what I'm doing in python or electrumx: https://github.com/spesmilo/electrumx/pull/180/commits/ee72f0bf90cca16785f48729bc248d3e82258962
EDIT: hmm, perhaps not if that is used for mempool txns which may have the mw data not stripped.
Yeah you're right.. I can't find a single tx since activation on either chain that has "real" mweb data in it, they all have the single 0 (which indicates "null" mweb for the last txn). So my theory that nLockTime was corrupted actually doesn't hold. But I do think ElectrumX is getting confused on txid's somehow (most likely)...
Anyway if that single-zero pattern happens, it's an indicator to the CBlock deserializer in the C++ code to continue and to deserialize the trailing data after the last txn as mweb extension data. And indeed all blocks since activation have this data. So yes, what you observed is correct, sir!
I'll have a look at your Python later. I'm still trying to get my C++ Fulcrum app to behave sanely.
The truly sad thing is for ElectrumX and Fulcrum -- even if you patch it -- until Electrum-LTC is patched, it's possible for clients to get served up funny data (in that last mweb txn), and they throw deserialization errors when that happens. (This is if they ever request the last mweb txn in the block).
For now, sadly, a workaround would still be to use -rpcserialversion=1, as uncomfortable as that feels.. until Electrum-LTC is patched..
God, what chaos this MW stuff caused. :) I hope it's worth it!
I'm surprised the Electrum-LTC devs haven't released a client that can handle this. I'm also surprised that there is a MAJOR fork on testnet for LTC. It seems there are tons of un-upgraded nodes. The "un-upgraded" chain is LONGER than the "upgraded" mweb chain on LTC testnet!!
Sadly that simplification only applies to transactions in blocks, not mempool. While we can process all blocks like this, the assertion that the data is just a 0 fails for mempool transactions since they seem to include the ones that end up in the extension block. So doing it the hard way and decoding whatever this mw tx data is faithfully would seem to be the only option for mw txns in mempool.
So my theory that nLockTime was corrupted actually doesn't hold. But I do think ElectrumX is getting confused on txid's somehow (most likely)...
It was because with a flag of 8 (not 9 with the segwit bit also set) you're not supposed to check for input witness data despite it still being a segwit transaction as per the existence of the marker+flag. But the inference was based purely on the marker, so for txns with 2 inputs, it popped off 2 bytes thinking those were the sizes of the witness data stacks for the two inputs (they were zeros), but really this was eating into the locktime bytes, leaving locktime to read from the big MWEB block data that follows this last txn.
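Putting the pieces from this thread together, here is a rough Python sketch of parsing a transaction as it appears in a block: witness stacks are read only when flag bit 0 is set, and when bit 3 is set a single null byte is consumed for the empty mweb_tx before the locktime. `Reader` and `parse_block_tx` are hypothetical names, not the ElectrumX deserializer. The sketch assumes the txid is the double SHA-256 of the serialization without marker, flag, witness and MWEB byte, and it deliberately refuses non-null MWEB data such as mempool transactions can carry.

```python
import hashlib


class Reader:
    """Tiny cursor over a byte buffer (hypothetical helper)."""

    def __init__(self, data: bytes, pos: int = 0):
        self.data = data
        self.pos = pos

    def take(self, n: int) -> bytes:
        end = self.pos + n
        if end > len(self.data):
            raise ValueError("read past end of buffer")
        chunk = self.data[self.pos:end]
        self.pos = end
        return chunk

    def varint(self) -> int:
        first = self.take(1)[0]
        if first < 0xFD:
            return first
        size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
        return int.from_bytes(self.take(size), "little")


def parse_block_tx(data: bytes, pos: int = 0):
    """Return (txid_hex, end_pos) for the tx starting at pos."""
    r = Reader(data, pos)
    version = r.take(4)
    flag = 0
    if r.data[r.pos:r.pos + 1] == b"\x00":  # marker byte
        r.take(1)
        flag = r.take(1)[0]
    body_start = r.pos
    n_inputs = r.varint()
    for _ in range(n_inputs):
        r.take(36)            # prev hash + prev index
        r.take(r.varint())    # scriptSig
        r.take(4)             # sequence
    for _ in range(r.varint()):
        r.take(8)             # value
        r.take(r.varint())    # scriptPubKey
    body_end = r.pos
    if flag & 0x01:           # witness stacks only if the segwit bit is set
        for _ in range(n_inputs):
            for _ in range(r.varint()):
                r.take(r.varint())
    if flag & 0x08:           # MWEB bit: expect the null placeholder byte
        if r.take(1) != b"\x00":
            raise ValueError("non-null MWEB tx data is not handled here")
    locktime = r.take(4)
    stripped = version + data[body_start:body_end] + locktime
    digest = hashlib.sha256(hashlib.sha256(stripped).digest()).digest()
    return digest[::-1].hex(), r.pos
```

With a flag of 8 and two inputs, this reads no witness stacks, so the two bytes that the marker-only inference consumed stay part of the MWEB byte and locktime. Anything in the buffer after `end_pos` of the last transaction would be the block's MWEB extension data, which a block-level caller has to tolerate rather than treat as an error.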
God, what chaos this MW stuff caused. :) I hope it's worth it!
It's certainly holding up my work. The testnet chaos is a real problem. I requested some action a few days ago, at which time I decided to see if I could get electrumx on 0.21.2... nope.
Sadly that simplification only applies to transactions in blocks, not mempool. While we can process all blocks like this, the assertion that the data is just a 0 fails for mempool transactions since they seem to include the ones that end up in the extension block. So doing it the hard way and decoding whatever this mw tx data is faithfully would seem to be the only option for mw txns in mempool.
Ruh-Roh. Does this mean that if I do the naive thing in Fulcrum and deserialize but "ignore" the mweb data -- that mempool txns will be referring to spends that don't really "Exist" (since they are spending coins from mweb txns?). HALP!
Or.. what? I don't get it. LOL.
Currently I hacked Fulcrum to just read-but-ignore the mweb data. It stores it in a byte blob. I had to hand-code the rules for deserializing mweb data correctly but I ignore it since it's not clear to me what to do with this data.
Can these mweb txns lead to "coins" (UTXOs) being created that are otherwise "invisible" to the main chain? Or does some hand-wavy thing happen where the coins all somehow magically appear in the (normal places) in that last HogEx/mweb txn in the block (so that unupgraded software sort of sees a sane picture of UTXO creation/destruction)? HALP! I am so confused and I don't have the time to read all of the sourcecode to understand this MW stuff .. any thoughts on this are appreciated!!
The testnet chaos is a real problem. I requested some action a few days ago, at which time I decided to see if I could get electrumx on 0.21.2... nope.
LOL. Hilarious. So weird.
Or does some hand-wavy thing happen where the coins all somehow magically appear in the (normal places) in that last HogEx/mweb txn in the block (so that unupgraded software sort of sees a sane picture of UTXO creation/destruction)? HALP! I am so confused and I don't have the time to read all of the sourcecode to understand this MW stuff
Hah! I don't know either, but that's basically what I gather the integration transaction is doing - integrating coins from "peg-in" transactions in the regular blocks with outputs specified in the MW transactions in the EBs, using the "hogaddr" to aggregate the coins.
I'm gonna have to move on from this, but I hope it gets solved soon.
Update for those interested: I am ignoring the mweb-only txns appearing in mempool for now in Fulcrum in local branch. They have no canonical txins and no canonical txouts (they do their magical spends inside the mweb blob, I guess).
Everything appears peachy for now.. no non-existent/illogical UTXOs being spent or created ex-nihilo as far as the base non-mweb layer is concerned. So far, that seems to be the case. So maybe it does appear at first glance you may be right @chappjc that the integration txn at the end of the block when confirmed sort of "bridges the gap" between canonical UTXOs and these magical mweb coins... I am guessing? LOL..
I am ignoring the mweb-only txns appearing in mempool for now in Fulcrum in local branch. They have no canonical txins and no canonical txouts (they do their magical spends inside the mweb blob, I guess).
The mempool txns with some data in the tx.mweb_tx section prior to locktime sometimes have no regular ins/outs like you noticed, but sometimes they do have regular inputs. I caught one of these failing deserialization: https://ltc.bitaps.com/e26094df094b9143cf5ac845e7a75a81cc5d440cd4aaa14372ac473626633179/input/0 (recognized as it was trying to spend that prevout). This would seem to be one of the "peg-in" txns (note that OP_9 output script) that do end up in the canonical block, only with the mweb_tx data apparently stripped (with some of this data including peg-in kernel and EB output going into the EB when mined).
So, if you key on the absence of canonical inputs and outputs, you might still trip over these while they're in mempool.
but sometimes they do have regular inputs... So, if you key on the absence of canonical inputs and outputs, you might still trip over these while they're in mempool.
Yes, I know.. Those I am keeping (and just ignoring the mweb blob mostly). I patched Fulcrum to deal with this (hopefully) resiliently. So yeah -- if canonical ins/outs -- keep. If not, safe to discard txn since electrum protocol can't "see" the pegins or whatever they are called anyway...
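The keep/discard rule described here can be sketched as a tiny predicate. This is not Fulcrum's actual code: `ParsedTx` and `visible_to_electrum_clients` are hypothetical names, and the sketch assumes the mempool transaction has already been deserialized into canonical inputs and outputs with the MWEB blob set aside.

```python
from typing import NamedTuple, Sequence


class ParsedTx(NamedTuple):
    """Hypothetical result of deserializing a mempool transaction."""
    inputs: Sequence    # canonical txins
    outputs: Sequence   # canonical txouts


def visible_to_electrum_clients(tx: ParsedTx) -> bool:
    """Keep a tx if it has any canonical input or output.

    Pure MWEB txns (no canonical ins and no canonical outs) are
    dropped; peg-ins with regular inputs are kept.
    """
    return bool(tx.inputs) or bool(tx.outputs)


def filter_mempool(txs):
    """Return only the transactions an Electrum client could use."""
    return [tx for tx in txs if visible_to_electrum_clients(tx)]
```

A peg-in that still has regular inputs passes the filter, so the server must be able to deserialize its MWEB section even though the contents are ignored.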
But yes thanks for pointing that out to anyone playing along at home...
For those using -rpcserialversion=1 it appears ElectrumX still crashes after some time.
The logs suggest that it's an issue with parsing transactions out of the mempool. The server then crashes on restart until that transaction clears the mempool.
Here's the relevant bit from the logs:
File "/electrumx/server/mempool.py", line 331, in deserialize_txs
tx, tx_size = deserializer(raw_tx).read_tx_and_vsize()
File "/electrumx/lib/tx.py", line 305, in read_tx_and_vsize
tx, _tx_hash, vsize = self._read_tx_parts()
File "/electrumx/lib/tx.py", line 282, in _read_tx_parts
inputs = self._read_inputs()
File "/electrumx/lib/tx.py", line 156, in _read_inputs
return [read_input() for i in range(self._read_varint())]
File "/electrumx/lib/tx.py", line 156, in <listcomp>
return [read_input() for i in range(self._read_varint())]
File "/electrumx/lib/tx.py", line 160, in _read_input
self._read_nbytes(32), # prev_hash
File "/electrumx/lib/tx.py", line 184, in _read_nbytes
assert self.binary_length >= end
AssertionError
I am using litecoind v0.21.2 and electrumx v1.16.0.
Yes. You are right. The code that dumps mempool via RPC in litecoind v0.21.2 via the getrawmempool call ignores the flag, and it sends out any mweb txn that may be in the mempool anyway, serialized as mweb.
So yeah.. there is no easy fix other than to modify electrumx deserializer to understand this new txn format modification.
Sort of. getrawmempool returns txids and getrawtransaction actually gets them. But I don't think it's that the rpcserialversion flag is being ignored. I think when getrawtransaction requests a pure MW txn (no canonical inputs or outputs), that txn is simply not valid in the canonical block (notice the stack trace on _read_inputs) even with the mw bits removed. Arguably getrawmempool should omit such txns if the flag is specified, but it's debatable.
You're right though that it boils down to needing to actually teach electrumx the new format.
Oh right, yeah. Sorry. I misremembered. I JUST worked on this yesterday and fixed it in Fulcrum. Yes indeed getrawmempool vomits out all txids, some of which may be "pure" mweb txns. The serialization flags are indeed respected in getrawtransaction and to ElectrumX it would look like a txn with 0 inputs and 0 outputs.
I would argue that getrawmempool definitely should be OMITTING these txid's from the list. The funny thing is these txns that have 0 canonical ins and 0 canonical outs have their txid calculated in a different way, using blake3, and looking at data inside their "Kernel" fields. So the fact that you are given a txn with a txid that is illogical to you, a humble client that speaks serialization version = 1, is a bug IMHO.
As to why electrumx is crashing: No idea. From my reading of the code it should not be crashing on these txns.
shrug
From the stack trace it appears to be crashing on a txn that has some segwit inputs in it actually.. so not a "pure" mweb txn.. but a heterogeneous one... maybe.
Who knows. Meh.
I know it may be poor taste to advertise my software here -- but given that there may not be a rush for ElectrumX to add support for litecoin's new breaking changes anytime soon, I want to announce here to anybody desperate to get their litecoin wallet server working that Fulcrum, as of latest commit to master, supports latest litecoind v0.21.2. I plan on doing a release as well sometime soon (which will include pre-built binaries with LTC support baked-in) but for now one can compile Fulcrum from source and get LTC support that doesn't fail on latest litecoind.
Fair enough, but as you've mentioned, the client is still an issue, and until Electrum-LTC is updated there's little point in running electrumx with litecoind 0.21.2 anyway.
I will note that with the quick hack I posted in here, electrumx won't fail with the new serializations, but for the MW mempool txns the hashes can be incorrect. No crashes however for any mempool txns, and block parsing works correctly.
I ran this hack for a couple days just to scrape test vectors for these mempool transactions, and rolled up complete deserialization code (my Go project) for any variety of transaction I've observed. For anyone interested in a blueprint and test case, perhaps for fixing up electrumx for LTC MW, see https://github.com/decred/dcrdex/pull/1536/commits/1c0e3aa7185abe123a96bb5f0a2d3c610ec3b283 There are a number of block and transaction test vectors there to validate any python solution, and it's annotated w.r.t. the litecoin C++ code.
One of the strangest things I noted was that peg-in transactions can be very different from mempool to when they are included in a block. See the two test cases for mainnet tx 8b8343978dbef95d54da796977e9a254565c0dc9ce54917d9111267547fcde03 in dex/networks/ltc/tx_test.go, one for the mempool serialization and one for when it's in a block. Almost everything is stripped when it gets included in the canonical block, and even the flag changes.
Then there are the pure (mweb-only) txns that only ever appear in mempool and then vanish into the MWEB block. Last is the HogEx, which is never in mempool, but like the peg-ins cannot be ignored because they can have standard outputs (example). Quite the melange of bizarreness.
there's little point in running electrumx with litecoind 0.21.2 anyway.
Huh? No. Why? MimbleWimble is opt-in. If you don't want to use it, you can ignore it completely. Pretend it doesn't exist. What's more, 99.99% of txns right now are not even mimble txns. So your assertion that there is "little point" is incorrect.
If somehow you fear receiving a mixed txn that has both mimble data and regular utxo data just run litecoind with -rpcserialversion=1 and Fulcrum as the server...
MimbleWimble is opt-in. If you don't want to use it, you can ignore it completely. Pretend it doesn't exist.
That's the effect of continuing to run 0.18. :)
If somehow you fear receiving a mixed txn that has both mimble data and regular utxo data just run litecoind with -rpcserialversion=1 and Fulcrum as the server...
Well because of the "bug" you've pointed out of getrawmempool returning these MW txns with -rpcserialversion=1 ...
That's the effect of continuing to run 0.18. :)
But some people don't just use litecoind with an Electrum server.. they do more with it. Those people now have an option.
Well because of the "bug" you've pointed out of getrawmempool returning these MW txns with -rpcserialversion=1 ...
Huh? I'm talking about Fulcrum working 100% correctly. This is handled correctly by Fulcrum. Those txns are filtered correctly and never presented to clients (since those pure-mweb txns are useless to clients anyway).
All good! I'm not saying don't use Fulcrum. Just pointing out that it sucks that Electrum-LTC isn't ready for the new serialization.
Out of curiosity, what does litecoind give when using -rpcserialversion=1 and you request a hogex? Does it remove the mweb_tx placeholder?
Also does it modify the flag? Can Electrum-LTC even make sense of the hogex (which can have standard output addresses)? I really have no idea.
Out of curiosity, what does litecoind give when using -rpcserialversion=1 and you request a hogex? Does it remove the mweb_tx placeholder?
Yes, correct.
Well there are two cases:
1. If the txn has regular CTxIns and CTxOuts you get those, of course. And TxId is calculated using the normal mechanism.
2. If the txn has empty CTxIn and empty CTxOut, the mweb tx blob area is not there (because the serialization version doesn't permit it).. BUT here's the broken & odd thing: the txid for tx's in this case was calculated using stuff from inside the mweb tx! So as a client you get this empty tx that has nothing useful in it, with a txid that appears "Wrong" to you. So bizarre. Only happens in this case though. If there are some regular CTxIns and Outs this doesn't happen and txid is calculated without the mweb data taken into account.
In case (2) Fulcrum filters them out. Until clients start to exist that can parse mweb you just have to discard those txns completely.
Also does it modify the flag?
Yes. Flag has just segwit (0x1).
Can Electrum-LTC even make sense of the hogex (which can have standard output addresses)?
Yes. The txn is serialized normally and is just a normal tx without the mweb stuff. No flag, nothing. It's stripped of all that data so yes, Electrum-LTC can read it just fine.