New binary encoding/decoding algorithms
Abstract
With the expansion of cross-chain protocols such as Axelar, Wormhole, and LayerZero, cross-chain dApps are emerging, and we expect their contracts to communicate across blockchains. Communicating, however, requires both sides to adopt a common serialization/deserialization format.
The issue
Sui's smart contracts use BCS, but this binary format is not recognized outside of Move blockchains. The reverse is also true: the Sui Move Framework doesn't provide serialization/deserialization functions for other formats.
Implementing new binary formats in Move is possible but runs into several limitations:
- std::vector lacks some functions, such as reading a slice of a vector, which makes reading chunks of bytes harder and less efficient.
- Metaprogramming on structs is not possible. Move's macros are similar to Rust's declarative macros, so there is no way to operate on a type's fields at compile time.
- No trait system exists.
Given these limitations, the current approach to implementing a new encoding format is to write helpers that encode/decode primitive Move types (u8, u64, vector<u8>, etc.) and let developers manually write the encoding and decoding logic for the structs they define. This is not only time-consuming but also introduces security risks and performance issues.
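To make the manual approach concrete, here is a minimal sketch of such helpers plus a hand-written struct codec for Ethereum ABI static types. It is written in Rust for illustration; the Move version would manipulate vector<u8> the same way. The helper names and the Transfer struct are hypothetical; only the rule that integers and booleans occupy one 32-byte, big-endian, left-padded word comes from the ABI specification.

```rust
/// Encode a u64 as an ABI `uint256` word: 32 bytes, big-endian, left-padded with zeros.
fn encode_u64(value: u64) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[24..].copy_from_slice(&value.to_be_bytes());
    word
}

/// Encode a bool as an ABI word holding 0 or 1.
fn encode_bool(value: bool) -> [u8; 32] {
    encode_u64(value as u64)
}

/// Read the 32-byte word starting at `pos` as a u64,
/// rejecting words whose upper 24 bytes are not zero.
fn decode_u64(data: &[u8], pos: usize) -> Result<u64, String> {
    let end = pos.checked_add(32).ok_or("position overflow")?;
    let word = data.get(pos..end).ok_or("word out of bounds")?;
    if word[..24].iter().any(|&b| b != 0) {
        return Err("value does not fit in u64".to_string());
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&word[24..]);
    Ok(u64::from_be_bytes(buf))
}

/// Decode an ABI bool, rejecting any value other than 0 or 1.
fn decode_bool(data: &[u8], pos: usize) -> Result<bool, String> {
    match decode_u64(data, pos)? {
        0 => Ok(false),
        1 => Ok(true),
        _ => Err("invalid bool".to_string()),
    }
}

/// Hypothetical application struct: its codec must be written by hand,
/// field by field, because nothing can derive it from the struct definition.
#[derive(Debug, PartialEq)]
struct Transfer {
    amount: u64,
    is_urgent: bool,
}

impl Transfer {
    fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(64);
        out.extend_from_slice(&encode_u64(self.amount));
        out.extend_from_slice(&encode_bool(self.is_urgent));
        out
    }

    fn decode(data: &[u8]) -> Result<Transfer, String> {
        // Field order and offsets must be kept in sync with `encode` manually.
        Ok(Transfer {
            amount: decode_u64(data, 0)?,
            is_urgent: decode_bool(data, 32)?,
        })
    }
}
```

Every new struct needs another pair of functions like Transfer::encode and Transfer::decode, and each one is a fresh chance to get an offset or a validity check wrong.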
Solutions
There are multiple solutions that I have in mind:
- Integrating the most popular formats into the Sui Move Framework. We could have a sui::abi module working the same way as sui::bcs. This has the huge advantage of allowing serialization/deserialization to be implemented natively in the Rust runtime, but the drawback of adding niche functions to the Sui Move Framework for each binary format.
- Creating a system similar to Serde, built into the Sui Rust runtime. This could be done by adding native Move functions to the Sui Move Framework that return a layout for a TypeName (when decoding bytes) or for a value (when encoding), or anything else that would allow building generic encoders/decoders in Move.
- Adding procedural macros to Move that can be attached to structs. We could imagine annotations such as #[derive(ABIEncodable(encoder_package = ...), ABIDecodable(decoder_package = ...))]. These macros would generate Move code using the helper functions mentioned above. However, this may require a lot of effort to build.
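The Serde-like option hinges on the runtime exposing a type's layout. As a rough illustration of why that helps, the Rust sketch below uses hypothetical Layout and Value enums, standing in for whatever a native layout function might return, and writes a single encoder and decoder that work for any type the layout describes. It only covers static ABI types (integers, booleans, and structs made of them), whose encodings are simply concatenated; dynamic types such as bytes would need additional offset handling.

```rust
/// Hypothetical runtime description of a type, as a native
/// "layout for TypeName" function might return it.
#[derive(Debug, Clone)]
enum Layout {
    U64,
    Bool,
    Struct(Vec<Layout>),
}

/// Hypothetical runtime representation of a value of some type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    U64(u64),
    Bool(bool),
    Struct(Vec<Value>),
}

/// Generic ABI encoder for static types: walks the layout and the value together.
fn encode(layout: &Layout, value: &Value, out: &mut Vec<u8>) -> Result<(), String> {
    match (layout, value) {
        (Layout::U64, Value::U64(v)) => {
            out.extend_from_slice(&[0u8; 24]);
            out.extend_from_slice(&v.to_be_bytes());
        }
        (Layout::Bool, Value::Bool(b)) => {
            out.extend_from_slice(&[0u8; 31]);
            out.push(*b as u8);
        }
        (Layout::Struct(fields), Value::Struct(values)) if fields.len() == values.len() => {
            // Static struct fields are laid out one 32-byte word after another.
            for (field, v) in fields.iter().zip(values) {
                encode(field, v, out)?;
            }
        }
        _ => return Err("value does not match layout".to_string()),
    }
    Ok(())
}

/// Generic ABI decoder for static types; `pos` advances by one word per leaf field.
fn decode(layout: &Layout, data: &[u8], pos: &mut usize) -> Result<Value, String> {
    match layout {
        Layout::Struct(fields) => {
            let mut values = Vec::with_capacity(fields.len());
            for field in fields {
                values.push(decode(field, data, pos)?);
            }
            Ok(Value::Struct(values))
        }
        leaf => {
            let end = (*pos).checked_add(32).ok_or("position overflow")?;
            let word = data.get(*pos..end).ok_or("word out of bounds")?;
            if word[..24].iter().any(|&b| b != 0) {
                return Err("value does not fit in u64".to_string());
            }
            let mut buf = [0u8; 8];
            buf.copy_from_slice(&word[24..]);
            let n = u64::from_be_bytes(buf);
            *pos = end;
            match (leaf, n) {
                (Layout::Bool, 0) => Ok(Value::Bool(false)),
                (Layout::Bool, 1) => Ok(Value::Bool(true)),
                (Layout::Bool, _) => Err("invalid bool".to_string()),
                _ => Ok(Value::U64(n)),
            }
        }
    }
}
```

The point of the design is that encode and decode are written and audited once, while each new struct only contributes a layout.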
I would love to hear about other ideas and solutions!
Hey @big14way, to give you more context: before creating this issue, I wanted to create a PR for the first solution. As explained, implementing ABI directly in Sui's Framework as a sui::abi module is a very niche feature that the Sui team might not want.
Before doing anything, I asked the Sui team whether they would accept this kind of feature or would rather keep Sui's Framework general, with functions used by most Move developers. They confirmed that they prefer keeping the Framework general.
I think that before implementing anything, we should discuss and design it, especially since any of the solutions I proposed might require new sensitive functions, which would then need to be audited.
If you have any feedback on these solutions, or have other ones in mind, I'd love to discuss them!
(Note: I'm not part of the Sui team)
Hi @gfusee, thanks for sharing this issue. Could you give some insight into the specific encoding formats you would want to see supported and the difficulties you've had writing encoders and decoders for them in Move? As much detail as possible would be helpful (not just the format, but the chain you're communicating with and, if that communication goes over a bridge, the limitations of that bridge, etc.). That will help us fully explore the solution space -- maybe there is a limited set of primitives that we can expose to make these formats easier to implement.
To add some context on the other side, one thing that makes simplifying serialization/deserialization tricky is that it can interfere with Move's strong encapsulation -- if I can take a binary payload from another chain and deserialize it as some type T, I may be able to bypass invariants involved in creating Ts. We can address this by making it so that only the module that defines T can create a T in this way, but then deserializing complex compound types becomes tricky. These kinds of constraints are what shaped the existing BCS deserialization support.
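Move only lets the module that declares a struct pack it, and a module-restricted deserializer would rely on exactly that. Rust's field privacy is a close analogy, so the sketch below (hypothetical names: module coin, type Balance, constant MAX_SUPPLY, function from_bytes) shows the pattern: code outside the module cannot build a Balance directly, and the only byte-level entry point lives inside the module, where it can re-check the invariant. The 8-byte little-endian payload mirrors how BCS encodes a u64 and is an illustrative assumption, not a proposed format.

```rust
mod coin {
    /// Hypothetical cap that every Balance must respect.
    pub const MAX_SUPPLY: u64 = 1_000_000;

    /// The field is private: code outside `coin` cannot write `Balance { value: ... }`.
    #[derive(Debug, PartialEq)]
    pub struct Balance {
        value: u64,
    }

    impl Balance {
        pub fn value(&self) -> u64 {
            self.value
        }

        /// The only way to turn raw bytes into a Balance. Because it is defined in the
        /// same module as the type, it can enforce `value <= MAX_SUPPLY` instead of
        /// trusting whatever a foreign chain sent.
        pub fn from_bytes(bytes: &[u8]) -> Result<Balance, String> {
            if bytes.len() != 8 {
                return Err(format!("expected 8 bytes, got {}", bytes.len()));
            }
            let mut buf = [0u8; 8];
            buf.copy_from_slice(bytes);
            let value = u64::from_le_bytes(buf);
            if value > MAX_SUPPLY {
                return Err("value exceeds MAX_SUPPLY".to_string());
            }
            Ok(Balance { value })
        }
    }
}
```

The trade-off mentioned above shows up here too: a type in another module that contains a Balance cannot decode it generically; it has to call into coin for that field.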
Hi @amnn! I discovered this Move limitation while integrating Axelar to make a Sui package communicate with EVM smart contracts. Communication happens through message passing as raw bytes, so both sides need a common encoding.
It is unlikely that EVM ABI contracts will adapt to Sui's BCS encoding. A Sui package adapting to EVM smart contracts is the more probable possibility.
The difficulties in implementing ABI encoders and decoders are:
- It is impossible to do this generically for a type T. Even with an ABI encoder/decoder, a developer still has to implement encoding/decoding manually for each type.
- Move functions manipulate vector<u8>, a.k.a. bytes, with limited tooling: for example, there is no way to read a slice. This matters because ABI is not a binary format where fields are simply laid out one after the other.
Note that these limitations don't make it impossible to encode/decode complex types, but they require great care to avoid mistakes when writing the logic.
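Decoding shows why slice access matters. In ABI, the head slot of a dynamic bytes field holds only an offset; the decoder has to jump to that offset, read a 32-byte length word, and then take a slice of that length. The Rust sketch below (hypothetical helpers read_word_as_usize and decode_bytes_at) does this with bounds-checked slicing, an operation that, per the limitation above, has to be emulated in Move. It does not check that padding bytes are zero, which a strict decoder might also do.

```rust
/// Read the 32-byte big-endian word at `pos` and convert it to usize,
/// rejecting values that do not fit (e.g. a malicious huge offset).
fn read_word_as_usize(data: &[u8], pos: usize) -> Result<usize, String> {
    let end = pos.checked_add(32).ok_or("position overflow")?;
    let word = data.get(pos..end).ok_or("word out of bounds")?;
    if word[..24].iter().any(|&b| b != 0) {
        return Err("value too large".to_string());
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&word[24..]);
    let n = u64::from_be_bytes(buf);
    if n > usize::MAX as u64 {
        return Err("value too large".to_string());
    }
    Ok(n as usize)
}

/// Decode the dynamic `bytes` field whose head slot starts at `head_pos`.
/// `data` is the encoding of the enclosing tuple, since ABI offsets are
/// measured from the start of that tuple.
fn decode_bytes_at(data: &[u8], head_pos: usize) -> Result<Vec<u8>, String> {
    // 1. The head slot only stores where the data lives.
    let offset = read_word_as_usize(data, head_pos)?;
    // 2. The tail starts with the byte length.
    let len = read_word_as_usize(data, offset)?;
    // 3. The bytes follow the length word; take them as one slice.
    //    (offset + 32 cannot overflow: the length word was read successfully.)
    let start = offset + 32;
    let end = start.checked_add(len).ok_or("length overflow")?;
    let bytes = data.get(start..end).ok_or("bytes out of bounds")?;
    Ok(bytes.to_vec())
}
```

Each of the three steps is a place where an unchecked offset or length could read the wrong bytes, which is why the bounds checks are explicit.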
In my opinion, this issue highlights a more general limitation of Move: its lack of features for manipulating generics. There are several possible solutions to improve this: meta-programming, having a way to get a struct's layout, more helpers and native functions, and so on.
Regarding encapsulation, I believe we can address this. As you said, it is possible to prevent a type T from being deserialized outside its declaring module, with a recursive check to ensure nested types are also declared in that module. I don't see a use case for a developer wanting to decode another third-party module's structs; anyone who wants to decode such data can simply declare a struct with the same layout in their own module.
I agree, expecting everyone to adopt BCS is unrealistic -- could you share what the binary format is that you are interested in though? Is it RLP? If we have something concrete to work with, then we can think about how to address that specifically.
It is ABI! I managed to make everything work for my use case, but the experience was not great. It wasn't hard for me since I had already implemented binary encoders on other projects, but I suspect it would be a real pain for developers without low-level knowledge.
I opened this issue because I think there is room for improvement and wanted to start a discussion 🙂 I'm sure other developers have similar use cases, whether with ABI, other binary formats, or general byte handling, especially with the rapid expansion of cross-chain interoperability!
Understood, and TIL about the details of the ABI format. It is indeed a bit complicated, with the variable length parts moved out of band. Let me raise it in discussion with the team next week.
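The out-of-band layout can be made concrete with a small encoder. Per the ABI specification, a tuple is encoded as all field heads followed by all tails; a static field such as uint256 is its own head, while a dynamic bytes field puts an offset (counted from the start of the tuple) in its head and its length-prefixed, zero-padded data in the tail. The Rust sketch below supports only those two field kinds; the Field enum, word, and encode_tuple are hypothetical names.

```rust
/// Hypothetical field kinds: one static ABI type and one dynamic ABI type.
enum Field<'a> {
    Uint(u64),
    Bytes(&'a [u8]),
}

/// A u64 as a 32-byte big-endian word, left-padded with zeros.
fn word(value: u64) -> [u8; 32] {
    let mut w = [0u8; 32];
    w[24..].copy_from_slice(&value.to_be_bytes());
    w
}

/// Encode a tuple as head(X1)..head(Xk) followed by tail(X1)..tail(Xk).
fn encode_tuple(fields: &[Field<'_>]) -> Vec<u8> {
    // Every head slot is exactly one word, so the head size is known up front.
    let head_len = 32 * fields.len();
    let mut head = Vec::with_capacity(head_len);
    let mut tail = Vec::new();
    for field in fields {
        match field {
            // Static: the value itself is the head; nothing goes in the tail.
            Field::Uint(v) => head.extend_from_slice(&word(*v)),
            // Dynamic: the head holds the offset where this field's tail begins.
            Field::Bytes(data) => {
                head.extend_from_slice(&word((head_len + tail.len()) as u64));
                tail.extend_from_slice(&word(data.len() as u64));
                tail.extend_from_slice(data);
                // Right-pad the data to a multiple of 32 bytes.
                let pad = (32 - data.len() % 32) % 32;
                tail.resize(tail.len() + pad, 0);
            }
        }
    }
    head.extend_from_slice(&tail);
    head
}
```

For example, encoding (7, "hello", "") yields a 96-byte head whose two offset slots point to bytes 96 and 160, followed by the two tails.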
Thanks for raising this issue @gfusee, we were able to discuss this topic yesterday. The conclusion was that, for now, support for binary formats is best handled at the library level (similar to the support provided for BCS), and that initially this could be handled by user libraries rather than the framework.
The main motivation for this decision is that tackling this problem at the language level requires quite a few additional features targeted to this task (not very general). This is something that may change with time as we see increased demand for binary formats that need special support, or the addition of other general features makes this more tractable -- we'll keep this in mind.
One observation was that there are some existing implementations of ABI encoders/decoders, for example this one from Axelar. If we make it easier to discover existing packages on-chain and give people signals to judge whether they want to take a dependency on that code or not, that might help amortize the cost of supporting each new binary format.