zips icon indicating copy to clipboard operation
zips copied to clipboard

[ZIP ???] Multipart Memos

Open str4d opened this issue 5 years ago • 8 comments

Depends on #105.

str4d avatar Jun 27 '19 14:06 str4d

@str4d this is an interesting ZIP in relation to Hush, who is exclusively focused on the memo field and using it to the full potential. It's expected and inevitable that Hush will diverge from how Zcash will use the memo field, but I hope the following constructive criticism is useful to you in relation to how Zcash will use memos.

You say It is possible to send multiple outputs to the same recipient address within a single transaction but that is a contentious feature that has been decided against by Daira in the past. That decision needs to be undone to move forward with this ZIP. Hush allows "duplicate receivers" as I call them.

You say total length of an encoded multipart memo is restricted to 64kiB, but will that be a consensus rule? I have created 1MB "multipart" memos on Zcash mainnet, it's possible right now. This means that this ZIP requires new consensus rules, which is not clear from the ZIP.

You say the maximum number of shielded outputs that a transaction could contain while still fitting into a block is around 2100 but that is only theoretical, perhaps achievable on testnet. Real Zcash mining pools do not allow a transaction to fill a block. From my research, all the big pools limit a transaction to half a block. They do this via custom patches, which is dangerous, because it could lead to a weird chain fork. I describe this in more detail in the Sapling Woodchipper CVE (which has still not been addressed): https://saplingwoodchipper.github.io/

Additionally, why do you not have a checksum for the full data that will be reconstructed by the multi-part memo? HushList protocol uses checksums/hashes of the full data to make sure that no data has been corrupted.

Lastly, the word "dummy" has negative connotations and might be misunderstood by non-native English speakers. A transaction with amount=0 is just as valid as any other transaction, and it has full use of the 512 byte memo field. I don't think referring to them as "dummy" outputs is useful nomenclature. Something like "amount=0 outputs" or "zero value outputs" is more clear.

leto avatar Jul 25 '19 15:07 leto

It's expected and inevitable that Hush will diverge from how Zcash will use the memo field

That may be the case, but there is still benefit in reducing the complexity for implementations handling any divergence. Does Hush use ZIP 302 or something similar for handling its binary encodings?

You say It is possible to send multiple outputs to the same recipient address within a single transaction but that is a contentious feature that has been decided against by Daira in the past. That decision needs to be undone to move forward with this ZIP. Hush allows "duplicate receivers" as I call them.

At a consensus level, there's nothing we can enforce here (by design), so I agree it is important to be clear about the defaults. My own position is that a transaction should not contain more than one value-carrying output per recipient address, but that additional zero-value outputs are fine (as they can be easily pruned by nodes so there is no ongoing witnessing cost). I would appreciate @daira's feedback on this ZIP draft.

You say total length of an encoded multipart memo is restricted to 64kiB, but will that be a consensus rule? I have created 1MB "multipart" memos on Zcash mainnet, it's possible right now. This means that this ZIP requires new consensus rules, which is not clear from the ZIP.

Yep, I'm fully aware that 1MiB worth of memo to a single recipient can be created in current transactions, as I alluded to in the rationale:

The total length of an encoded multipart memo is restricted to 64kiB in order to bound the privacy leakage associated with generating transactions containing many shielded outputs. A 64kiB multipart memo would require 130 shielded outputs, whereas the maximum number of shielded outputs that a transaction could contain while still fitting into a block is around 2100.

The privacy leakage rationale above is based on the hypothesis that a Schelling point around multipart memo usage in the low kiB range means that such transactions are less likely to stand out compared to e.g. shielded mining pool payouts or batched exchange payouts. As for the suggestion of a 64kiB limit, it's a reasonable MTU to work within and it fits nicely in a 2-byte length field. A sender following this ZIP could send full-length multipart memos to at most 16 recipients within a single transaction; again this seems like a reasonable Schelling point. It would be very useful to have input from potential users of this specification, to see where their use-cases would sit.

There's no need IMHO for this ZIP to specify any change to the consensus rules, as the contents of the memo fields are inherently not part of the core consensus rules (only standard rules). Any proposals to alter the consensus rules regarding the maximum number of shielded outputs would be better served by a separate ZIP.

The 64kiB length limit is however a strong standard rule, in that anyone working with this ZIP's format cannot create compatible longer multipart memos, because of the fixed-width 2-byte length field in the header. Lower limits could potentially be surpassed by implementations that don't place appropriate bounds checks on the values within the length fields.

You say the maximum number of shielded outputs that a transaction could contain while still fitting into a block is around 2100 but that is only theoretical, perhaps achievable on testnet.

The policies that individual mining pools choose to enforce have no bearing on the validity of that statement, which is derived from the consensus rules.

Additionally, why do you not have a checksum for the full data that will be reconstructed by the multi-part memo? HushList protocol uses checksums/hashes of the full data to make sure that no data has been corrupted.

Checksums are used to detect errors during transmission or storage of data. Zcash uses AEAD ciphertexts which natively provide this functionality, and additionally all ciphertext fields are covered by the transaction signatures. This makes corruption of the encrypted memos in-flight from sender to recipient impossible without invalidating the entire transaction. The only other source of corruption would be bugs in the implementation of the sender or recipient, and I'm not (yet) convinced that a checksum baked into the protocol (which would also be subject to implementation bugs) is more beneficial than e.g. test vectors. But it's a relatively small overhead, so I wouldn't be opposed to including one.

I don't think referring to them as "dummy" outputs is useful nomenclature.

I refer to them as "dummy outputs" in the ZIP because that is the nomenclature used in the protocol specification. If the specification terminology changes, I will update this ZIP to match. Please file an issue for tracking this.

str4d avatar Aug 03 '19 18:08 str4d

@str4d thank you, I really appreciate your detailed feedback.

In an attack scenario, where an attacker sends a 1MB memo data with "faked" 2 byte length field, what are wallets supposed to do? Without consensus rules to protect against this situation, many weird bugs could potentially be tickled.

Does Hush use ZIP 302 or something similar for handling its binary encodings? No, Hush does not plan to use ZIP 302. Since we will allow storing more data, a 2 byte length field is not sufficient and we plan to store various other kinds of metadata as well.

leto avatar Aug 04 '19 16:08 leto

In an attack scenario, where an attacker sends a 1MB memo data with "faked" 2 byte length field, what are wallets supposed to do? Without consensus rules to protect against this situation, many weird bugs could potentially be tickled.

I can't quite tell if this is what you mean by "faked" length field, but I assume you are asking what happens if the length field is set to a shorter value than the actual amount of multipart data that the adversary is trying to convey.

Getting the trivial case out of the way: A well-written recipient implementation would reject this, as the length of the reconstructed data does not match the length field. But let's assume the recipient has a bug and is not enforcing this.

In the ZIP draft as-written, the non-header multipart chunks are identified by memo part number, which is a fixed 1-byte integer. This means that at most 255 multipart chunks could be used in a multipart memo (recalling that the header itself is identified by memo part number zero), for a total maximum reconstructed length of around 126kiB. This is just under double the maximum specified in the ZIP, and 8x smaller than the 1MiB maximum data that could be fitted into shielded outputs.

If an adversary attempted to actually send 1MiB of data to the same recipient address using the multipart format in this ZIP, they would have on average 8 chunks with the same memo part number. Recipient implementations would reject the entire multipart memo in this case (under the no-duplicate-part-numbers rule); maybe they should also reject the entire transaction?

If the implementation had another bug where it was not enforcing unique memo part numbers, then the most likely result is that later-parsed chunks would overwrite earlier-parsed chunks (if pending chunks were stored in slots), or subsequent duplicate chunks would simply be ignored (if the recipient implemented a scanning reconstruction). Whether or not this introduces an exploitable bug (due to potentially-differing interpretations of the memo by correct and buggy recipients) would then depend on how the reconstructed multipart memo is used by the recipient, and thus a different-layer concern (for the ZIP defining the protocol inside the multipart memo).

This ZIP already points out that invalid memos MUST NOT be acted on in the Security and Privacy Considerations, but I'll have a think about how I can make it clearer that ZIPs defining a multipart memo type need to take into account the possibility of bugs in the recipient implementation of this ZIP. Note that a checksum in this ZIP's format would not help at all here, because in this thought experiment the entire transaction is adversarial, so the adversary can ensure that the checksum passes for their target recipient (even if it wouldn't necessarily pass on all recipient implementations). However, checksums in the higher-layer protocol may indeed make sense, where this multipart memo format is itself the transport layer.

Note also that all of the above discussion applies to adversarial multipart memos that are smaller than 64kiB, which can just as easily be created with duplicate part numbers. So a consensus rule affecting the number of shielded outputs doesn't prevent recipient implementation bugs from being a potential problem.

No, Hush does not plan to use ZIP 302. Since we will allow storing more data, a 2 byte length field is not sufficient and we plan to store various other kinds of metadata as well.

The ZIP in this PR is not ZIP 302 (it hasn't had a number allocated by the editors yet). ZIP 302 (PR) standardises how to parse a single memo field when it contains non-UTF-8 data. I'm unaware of any deployed applications using it yet within the Zcash ecosystem; if there were any, I'd have expected to see a ZIP requesting to reserve a type byte, like this ZIP does.

str4d avatar Aug 04 '19 22:08 str4d

@leto wrote:

You say It is possible to send multiple outputs to the same recipient address within a single transaction but that is a contentious feature that has been decided against by Daira in the past.

I don't remember having decided that. As @str4d says, it's clearly not enforceable at the protocol level.

daira avatar Feb 13 '20 13:02 daira

@daira requested that this proposal instead be combined with ZIP 302, and that we use ZIP 302 as the canonical memo field specification with future updates as necessary. Once #105 is merged I'll rebase this PR to instead update ZIP 302.

str4d avatar Feb 13 '20 14:02 str4d

Zecwallet Lite is implementing multi part memos, but with a simple text-only scheme. It splits the memos as (1/3) part1, (2/3) part2, (3/3) part 3 etc...

The text-only encoding is motivated in part by backwards compatibility with zecwallet lite itself and other wallets that haven't implemented this zip.

I'm happy to add support for this ZIP as well, if we can agree with the format (The current format looks good to me)

adityapk00 avatar Apr 27 '20 21:04 adityapk00

Standardizing around this proposal is a great idea. In addition to responding to @dconnolly's great suggestions, it seems like the next step may be to move the underlying proposal forward? ZIP 302

Are there any blockers on these two zips?

gmale avatar Jan 28 '21 22:01 gmale

This proposal will probably be subsumed by a proposal to decouple memos from outputs in NU6.

nuttycom avatar Apr 09 '24 17:04 nuttycom

Closing because we're going to pursue #627 instead.

str4d avatar Apr 23 '24 18:04 str4d