celestia-app
celestia-app copied to clipboard
Adjust share size to match the spec and add namespaces to each Message
Namespaces are now kept in the actual message to ensure that they are recoverable, see the spec PR. This should also be accounted for in the app, specifically SHARE_SIZE - NAMESPACE_ID_BYTES - SHARE_RESERVED_BYTES
To clarify:
- the share size that is erasure coded is still
SHARE_SIZE
-
SHARE_SIZE - NAMESPACE_ID_BYTES - SHARE_RESERVED_BYTES
is how requests with a reserved namespace ID are chunked (txs, intermediate state roots, evidence) -
SHARE_SIZE - NAMESPACE_ID_BYTES
is how messages are chunked
Yeah, that should definitely be clarified.
When generating the commitments for SignedTransactionDataPayForMessage
, namespaces are not added to each share after chunking the message. This is what specifically needs to be fixed.
The (WIP) ll-core PR #235 is using a const AdjustedMessageSize, and I think the app should import this.
There is also something else to consider here: the spec currently assumes SHARE_RESERVED_BYTES to be one byte. That almost works because it is an index pointing to some starting point in a 256 bytes long slice. "almost" because if the index is > 127, its encoding currently will actually be 2 bytes long: https://play.golang.org/p/C_wN1E-lKRY
That is bc varint encoding encodes 7 bits at a time. See https://golang.org/pkg/encoding/binary/ for details on varint encoding. The implementation uses this encoding for unsigned ints (lengths/indexes), as almost all protobuf implementations will support this: https://developers.google.com/protocol-buffers/docs/techniques#streaming
Also, note that independent of the above issue, the current implementation does not use this index but rather length prefixed all messages (see https://github.com/lazyledger/lazyledger-core/issues/234)
So...in my head canon I assumed it would just be 1 byte, but you're correct that canonical serialization is prescribed
(the
*
in the example layout figure below) is the starting byte of the length of the canonically serialized first request that starts in the share, or0
if there is none, as a canonically serialized big-endian unsigned integer
Now we have a problem: if we use canonical protobuf, this could be 1 or 2 bytes.
- We can't always use 2 bytes because that's invalid canonical protobuf.
- We don't really want to use a variable number of bytes because it makes the chunking logic more complex...but we could.
- We could just not use canonical protobuf and instead just use the
uint8
byte directly.
We could just not use canonical protobuf and instead just use the uint8 byte directly.
I think this will be our best choice. It would work for the index that points to the start of a message. So it could always fit in one byte. IMO, the other thing, the length prefixing, should stay varint encoded. Simply because we don not really want to cap how long messages can get, right?
Yes, the length can still be varint encoded because it doesn't affect chunking logic (there's only a single length).
Can this be closed?
I'm not sure, I asked about this during the off-site and didn't get a conclusive answer on whether it was implemented. cc @evan-forbes
Sorry, I should have been clearer. The commitments that are generated do include the namespace of the message, but they do not add the length delimiter.
We also need more testing to compare the generated commitments with the subtree roots of an actual square, to make sure that we are doing this important step correctly in the future. This would also help test #49
https://github.com/celestiaorg/celestia-app/blob/1fcccc414127b1dd8399ad35051c8f669a98aa56/x/payment/types/payformessage.go#L140-L147
Was thinking about his more, and while we could use code from core that already exists (done on the evan/user-core-code-to-split-shares
branch)
https://github.com/celestiaorg/celestia-app/blob/ad49814f62879319d60d722beb237bbdc21ba9fa/x/payment/types/payformessage.go#L111-L127
we could also move the commitment creation code to celestia-core, that way it would easier to roundtrip test
I don't think we need this any longer, and am unsure of the context