Consider removing parent block hash and parent coinbase Merkle index from AuxPoW serialization
The serialization of AuxPoW headers contains two fields that are unnecessary: the hashBlock and nIndex members of the coinbaseTx field. hashBlock contains the hash of the parent block; this is redundant since its preimage is also present (as the parentBlock field). nIndex is redundant since the coinbase transaction always has a Merkle index of 0.
Currently, hashBlock isn't processed in block validation in any way (and apparently the Namecoin blockchain contains some blocks where it's big-endian and other where it's little-endian). My understanding is that nIndex is checked to verify that it's 0, but besides that it's not used anywhere in validation either.
Removing those fields from serialization would save some space (36 bytes per header, or around 145 kB per Electrum-NMC chunk, which corresponds to saving around 4.3% of the download time for Electrum-NMC to sync given that the most recent 26 chunks are 87.3 MB). Note that this savings applies to SPV clients, not just full nodes.
For the RPC API that returns block headers, an optional argument could be added which removes those fields from the serialization; this argument could later be set to enabled by default, and then subsequently remove the ability to disable it. For the P2P wire protocol, a new protocol version could be added which removes those fields. I'm not certain what the best approach would be for dealing with on-disk storage.
For the existing RPC mode and wire protocol version where those fields are included, it would be useful for newer clients to overwrite hashBlock with zeros; this shouldn't have any negative effects (since it's not checked by any existing software), and replacing it with zeros will make the data significantly more compressible. (Electrum's protocol is planning to start compressing headers with DEFLATE soon, and I wouldn't be surprised if mining relay infrastructure is also using compression.)
Good catch, this is indeed something we can optimise. The plan sounds solid (as already discussed offline). These redundant fields are presumably a legacy of the original implementation and using CMerkleTx as basis for CAuxPow to simplify the code.
For the RPC API that returns block headers, an optional argument could be added which removes those fields from the serialization; this argument could later be set to enabled by default, and then subsequently remove the ability to disable it.
This sounds like a good first step. Is that something that benefits Electrum directly (e.g. because the Electrum server calls those RPCs)? I think the cleanest solution for turning it on would be adding a new CLI argument / configuration option. That way, we do not have to mess with the RPC interface itself. That option could be turned off initially (so the feature is opt-in), then turned on by default in the next release, and finally removed completely.
For the P2P wire protocol, a new protocol version could be added which removes those fields. I'm not certain what the best approach would be for dealing with on-disk storage.
For the on-disk blocks, we could introduce some cut-off time (in the future): Blocks with a timestamp before that time are written and read in the current format, while blocks after that time are written and expected to be in the new format. That way, the existing data on-disk can be kept just fine for everyone who updates to the new version before the cut-off. Everyone who updates later would need to re-download the chain.
However, I'm not really convinced this is worth either the extra code complication and the reduced space saving (since old blocks wouldn't benefit). It seems fine to require resyncing the chain when updating - that's quite fast with Namecoin anyway.
For the existing RPC mode and wire protocol version where those fields are included, it would be useful for newer clients to overwrite hashBlock with zeros.
That's definitely something we can do, yes. When we do that, however, I also want to refactor the code a bit to make it more explicit that CAuxPow is not a valid Merkle tx.
All in all, thanks for suggesting this optimisation - I'll work on it. I'm not sure if we should include that already in 0.18 (which is already branched), but definitely for 0.19.
Is that something that benefits Electrum directly (e.g. because the Electrum server calls those RPCs)?
According to the ElectrumX source code, it looks like ElectrumX is using the getblock RPC method to talk to Bitcoin Core (and forks of Bitcoin Core such as Namecoin Core).
I think the cleanest solution for turning it on would be adding a new CLI argument / configuration option. That way, we do not have to mess with the RPC interface itself.
Initially when I filed this issue I assumed that the relevant methods would have an options argument that we could add a field to, but upon checking more closely it looks like this is not the case. So yeah, adding a new ordered argument to the RPC interface is probably a bad idea since it might conflict with upstream Bitcoin Core in the future. Concept ACK on making it a namecoin.conf option.
That option could be turned off initially (so the feature is opt-in), then turned on by default in the next release, and finally removed completely.
Concept ACK, although we might want to wait a while before removing the option completely to make sure that everyone who's depending on the old behavior has updated their client software.
For the on-disk blocks, we could introduce some cut-off time (in the future): Blocks with a timestamp before that time are written and read in the current format, while blocks after that time are written and expected to be in the new format. That way, the existing data on-disk can be kept just fine for everyone who updates to the new version before the cut-off. Everyone who updates later would need to re-download the chain.
However, I'm not really convinced this is worth either the extra code complication and the reduced space saving (since old blocks wouldn't benefit). It seems fine to require resyncing the chain when updating - that's quite fast with Namecoin anyway.
Concept ACK on requiring resyncing the chain when updating.
All in all, thanks for suggesting this optimisation - I'll work on it. I'm not sure if we should include that already in 0.18 (which is already branched), but definitely for 0.19.
I'd lean toward keeping it in 0.19 and higher; making changes like this after 0.18 has branched is probably too risky given that the 0.18 branch is intended to stay stable.
As an aside, we may want to reach out to the developers of other AuxPoW chains (e.g. Dogecoin) to check whether they've already considered this kind of change, and/or whether this kind of change would cause any issues for them. @domob1812 if you happen to have contacts at those chains (I seem to recall that you have contacts via MeMiCA?), feel free to contact them (and mention in this thread which chains you've contacted); otherwise I can look into reaching out to them.
@domob1812 I'm looking at the code more closely, and I suspect there are a couple of other fields that we can remove from the serialization. I think the length field of the chain Merkle branch can be inferred from the coinbase transaction (it's present in the AuxPoW coinbase commitment), and the chain Merkle index can be inferred from the coinbase transaction, chain ID, and chain Merkle length. Is that accurate?
@JeremyRand: Yes, I think that is correct. However, the main saving of space is still the block hash - since that is 32 bytes, compared to 12 bytes for all the integer fields (nIndex plus the two you noted above) combined.
Also, once we just change the block hash to all zeros (which can be easily done without the need to implement new formats / versions and is thus my next step), at least with compression most savings should already be utilised.
@JeremyRand: Yes, I think that is correct. However, the main saving of space is still the block hash - since that is 32 bytes, compared to 12 bytes for all the integer fields (nIndex plus the two you noted above) combined.
Also, once we just change the block hash to all zeros (which can be easily done without the need to implement new formats / versions and is thus my next step), at least with compression most savings should already be utilised.
@domob1812 Agreed that the parent block hash is the biggest savings, my point was just that if we're going to break compatibility in a new protocol version, we might as well remove those other two integers at the same time.
Yes indeed. When changing the format in the step after zeroing the hash, we should remove those as well. Thanks for pointing that out!
@domob1812 I'm looking at the code more closely, and I suspect there are a couple of other fields that we can remove from the serialization. I think the length field of the chain Merkle branch can be inferred from the coinbase transaction (it's present in the AuxPoW coinbase commitment), and the chain Merkle index can be inferred from the coinbase transaction, chain ID, and chain Merkle length. Is that accurate?
While working on #294, I thought about these a bit more. While they are in theory redundant, it would be tricky to remove those fields from the serialised data. The reason is this: These values are needed to compute the merged-mining Merkle root. But that root hash is needed to actually find the position in the parent coinbase where these values are stored, at least in the case where no merged-mining header marker is present.
Of course, one could try to work around that by trying out different positions / guessing where the data likely is, but that is very messy and error prone for, IMHO, little saving. Thus I think we should just remove the hashBlock and nIndex fields, while keeping the branch length and nChainIndex fields.
(In addition to this difficulty, to get rid of the branch lenght, we would have to write our own serialisation logic not reusing the one for vectors already present in Bitcoin Core. That is certainly possible, but would also lead to lots of complications for little savings (one byte).)
@domob1812 Interesting, thanks for pointing out the issue with computing the Merkle root. Agreed that removing the chain Merkle length and chain Merkle index is probably more trouble than it's worth. That said, I'd like to revisit removing those when we do AAA, since I'm guessing it's possible to design an AuxPoW format that allows removing the redundancy if we allow ourselves to do a hardfork.
@JeremyRand: I think it strongly depends on how exactly we do the redesign. I agree that thinking about a more efficient (in general) auxpow structure could make sense. If just the auxpow format itself changes but not how it is anchored in the parent block, then I think we can even do it without a scheduled hardfork (as described in #294). But if it involves changing the parent block commitment, then I think it is unlikely we can ever change it, due to miners also including other merged-mining coins and the like.
But feel free to think about an improved structure, and we can then think about the details once we have the plan for it.
if it involves changing the parent block commitment, then I think it is unlikely we can ever change it, due to miners also including other merged-mining coins and the like.
There's a strong argument for changing the parent block commitment to use an output in the coinbase transaction rather than the coinbase input, and to always use the last output that has such a commitment (rather than the status quo where a coinbase transaction is invalid for Namecoin if it has multiple commitments). This is because mining improvements such as Matt Corallo's decentralized pool protocol require such a structure in order to interoperate optimally with AuxPoW.
This means that miners would need to have 2 commitments: one for newer output-based AuxPoW, and one for the older input-based AuxPoW. It wouldn't hurt the other chains that don't upgrade, since the two commitments can co-exist.
Pretty sure the closing of this issue was due to a Bitcoin/Namecoin issue number collision; re-opening.