
Pallas Evolutions

Open KtorZ opened this issue 10 months ago • 10 comments

Hello! It's been a while now that I've been using Pallas for a variety of projects and got a good grasp of it. I'd like to propose a few changes (which I am volunteering to implement as well, but would like to open a discussion first). Some of them are potential breaking changes, though they feel necessary to me to bring Pallas to the next level.

[!NOTE] I am making a single issue now to discuss those points, but I am open to splitting it into different issues if that makes it easier.

Owned memoization ?

pallas-primitives comes with various Minted wrappers used to keep a reference to original bytes from which values are deserialized. For example:

  • https://github.com/txpipe/pallas/blob/adab71fc90499258732f321e79c27b7392075ccf/pallas-primitives/src/babbage/model.rs#L265

  • https://github.com/txpipe/pallas/blob/adab71fc90499258732f321e79c27b7392075ccf/pallas-primitives/src/babbage/model.rs#L362-L363

  • https://github.com/txpipe/pallas/blob/adab71fc90499258732f321e79c27b7392075ccf/pallas-primitives/src/babbage/model.rs#L716-L720

Fundamentally, those are mostly generic structures that leverage the KeepRaw type:

https://github.com/txpipe/pallas/blob/adab71fc90499258732f321e79c27b7392075ccf/pallas-codec/src/utils.rs#L1107-L1111

The main downside of this structure is that it comes with a lifetime and doesn't own the bytes. This has proven challenging in numerous places when drilling through the structure and passing data around. However, it does solve an important problem: it avoids re-serialisation altogether, which can be quite error-prone in the context of Cardano since the CBOR encoders/decoders are non-canonical and because the slightest byte divergence will invalidate all hashes and signatures based on the original data.

Given that these are also prevalent types, likely used by many downstream applications, simply removing them is probably not a good idea. One immediate, naive suggestion would be to introduce a new Memoize type similar to KeepRaw, but one that takes ownership of the bytes. The problem with such a type, though, is that it can lead to quadratic growth in memory size (since the bytes of each sub-element and of its children would now be stored twice). I am not sure it's a big deal, though, because we do not have many such types needing to own their serialization bytes.
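To make the idea concrete, here is a minimal sketch of what an owned wrapper could look like. Everything in it is hypothetical: the field layout, the `decode_with` constructor and the accessor names are illustrative assumptions, not existing Pallas APIs, and the decoder is passed in as a closure so the sketch does not depend on any particular CBOR library.

```rust
use std::ops::Deref;

/// Hypothetical owned counterpart to `KeepRaw`: the decoded value together
/// with its own copy of the exact bytes it was decoded from.
#[derive(Debug, Clone, PartialEq)]
pub struct Memoize<T> {
    value: T,
    original_bytes: Vec<u8>,
}

impl<T> Memoize<T> {
    /// Decode `bytes` with the supplied decoder and keep a copy of the input,
    /// so hashes can later be computed without re-serialising `value`.
    pub fn decode_with<E>(
        bytes: &[u8],
        decode: impl FnOnce(&[u8]) -> Result<T, E>,
    ) -> Result<Self, E> {
        let value = decode(bytes)?;
        Ok(Self {
            value,
            original_bytes: bytes.to_vec(),
        })
    }

    /// The bytes exactly as they appeared on the wire.
    pub fn original_bytes(&self) -> &[u8] {
        &self.original_bytes
    }

    /// Discard the memoized bytes and return the decoded value.
    pub fn into_inner(self) -> T {
        self.value
    }
}

/// Read-only access to the inner value; no `DerefMut`, so the value cannot
/// drift away from the bytes it was decoded from.
impl<T> Deref for Memoize<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.value
    }
}
```

Because the wrapper has no lifetime parameter, it can be stored in structs or moved across threads freely; the cost is the extra copy made in `decode_with`.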

In addition, for many of them, we can also store hashes and sizes instead of bytes since it is often what we actually want (e.g. for blocks or transactions). So there are perhaps subtle variations of KeepRaw that would be more practical than keeping full bytes.
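A variation that keeps only a digest and a size could be sketched as follows. The `Digested` name and its methods are hypothetical, and the hash function is deliberately left to the caller (an assumption made to keep the sketch free of external crates); in practice it would be whichever digest applies to the type in question.

```rust
/// Hypothetical: what is retained from a value's original bytes when only
/// the digest and the length are needed afterwards.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Digested<const N: usize> {
    hash: [u8; N],
    size: usize,
}

impl<const N: usize> Digested<N> {
    /// `hasher` is supplied by the caller; this sketch does not assume any
    /// particular hashing crate or algorithm.
    pub fn from_bytes(bytes: &[u8], hasher: impl FnOnce(&[u8]) -> [u8; N]) -> Self {
        Self {
            hash: hasher(bytes),
            size: bytes.len(),
        }
    }

    /// Digest of the original bytes.
    pub fn hash(&self) -> &[u8; N] {
        &self.hash
    }

    /// Length of the original bytes.
    pub fn size(&self) -> usize {
        self.size
    }
}
```

Such a value has a fixed footprint regardless of how large the original serialization was, which sidesteps the memory growth concern for the types where the bytes themselves are never read again.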

De-couple Rust from CDDL/CBOR ?

pallas-primitives currently keeps a Rust representation of all types that mimics the on-the-wire format. In a sense, pallas-primitives is a Rust eDSL for writing CBOR specific to Cardano.

I understand very much that pallas-traverse is meant to be used as an API layer on top of pallas-primitives that ensures many of the annoying CBOR specifics are hidden away from users. Yet, pallas-traverse forces one to think in terms of multi-era objects, whereas on many occasions, one truly only cares about a single era.

So I see two complementary options here:

  1. Have pallas-primitives come with some extra quality-of-life additions, for example, more dogmatic iterators on all iterable structures, or accessors for the most common fields (e.g. pulling the Ada value out of an output). Of course, I totally get the desire to keep pallas-primitives as "raw" and as close to the wire format as possible. So it boils down to finding the right balance, and months of usage have convinced me that it isn't there -- so I am willing to have a go at it.

  2. Introduce perhaps a new pallas-latest crate that provides this layer on top of pallas-primitives, with a focus on a single era (the latest) and with a more opinionated interface for Pallas' internals.

I see both as complementary, and designing both at the same time can perhaps help iterate more quickly, as we can move out of pallas-primitives and into pallas-latest what is deemed too high level for pallas-primitives.
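As a rough illustration of the kind of accessor meant in the first option, consider the sketch below. `Value` and `Output` are simplified stand-ins defined here for the example, not the actual pallas-primitives types, and `lovelace` is a hypothetical method name.

```rust
/// Simplified stand-in for an output value: either plain Ada, or Ada plus
/// other assets (represented here as opaque asset-id / quantity pairs).
pub enum Value {
    Coin(u64),
    Multiasset(u64, Vec<(Vec<u8>, u64)>),
}

/// Simplified stand-in for an output that exists in two wire shapes.
pub enum Output {
    Legacy { address: Vec<u8>, amount: Value },
    PostAlonzo { address: Vec<u8>, value: Value },
}

impl Output {
    /// Hypothetical accessor: the Ada quantity, whatever the wire shape.
    pub fn lovelace(&self) -> u64 {
        let value = match self {
            Output::Legacy { amount, .. } => amount,
            Output::PostAlonzo { value, .. } => value,
        };
        match value {
            Value::Coin(coin) => *coin,
            Value::Multiasset(coin, _) => *coin,
        }
    }
}
```

The point is that callers write `output.lovelace()` once, instead of repeating the two nested matches at every call site.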

Full-deserialisation

Some data structures in pallas-primitives do not deserialize their internals fully. This is at least the case for types holding addresses and reward accounts. Those are deserialised to Bytes instead of structured objects, which has three main issues:

  1. It prevents failing early when one of these nested types is in fact ill-formed. One needs a second round of deserialisation to fully validate a deserialised object.

  2. It comes with a performance hit since we need to traverse the structures an additional time to deserialise those leftovers.

  3. It's impractical overall: since those elements are not generic, we can't simply replace them in the original object (e.g. a transaction output) once parsed. And so, every access requires re-deserialising the said fields (and potentially failing).

So my proposal is to ensure that those types are properly deserialized in full, such that a successful deserialisation yields a complete type.
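A minimal sketch of what eager parsing could look like for a reward account is shown below. The type and error names are hypothetical; the header layout (high nibble `0b1110` for a key hash or `0b1111` for a script hash, low nibble `0` for testnet or `1` for mainnet, followed by a 28-byte hash) follows the reward-address format as described in CIP-19. Anything else is rejected at decoding time rather than on first access.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Network {
    Testnet,
    Mainnet,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StakeCredential {
    KeyHash([u8; 28]),
    ScriptHash([u8; 28]),
}

/// Hypothetical structured replacement for a reward account kept as raw bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RewardAccount {
    pub network: Network,
    pub credential: StakeCredential,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RewardAccountError {
    WrongLength(usize),
    UnknownHeader(u8),
}

impl TryFrom<&[u8]> for RewardAccount {
    type Error = RewardAccountError;

    fn try_from(bytes: &[u8]) -> Result<Self, Self::Error> {
        // One header byte followed by a 28-byte credential hash.
        if bytes.len() != 29 {
            return Err(RewardAccountError::WrongLength(bytes.len()));
        }
        let header = bytes[0];
        let mut hash = [0u8; 28];
        hash.copy_from_slice(&bytes[1..]);

        // High nibble: credential kind. Low nibble: network tag.
        let credential = match header >> 4 {
            0b1110 => StakeCredential::KeyHash(hash),
            0b1111 => StakeCredential::ScriptHash(hash),
            _ => return Err(RewardAccountError::UnknownHeader(header)),
        };
        let network = match header & 0x0f {
            0 => Network::Testnet,
            1 => Network::Mainnet,
            _ => return Err(RewardAccountError::UnknownHeader(header)),
        };
        Ok(RewardAccount { network, credential })
    }
}
```

If a field of this type replaced the raw bytes in the enclosing structure, a successfully decoded object would need no second validation pass, and accessors on it could be infallible.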

Stronger type safety

pallas-codec introduces many helpers to capture invariants from deserialised objects, which is good. What is more problematic is that many of those invariants can actually be bypassed. For example, a NonEmptyKeyValuePairs doesn't enforce non-emptiness whatsoever, and provides public access to its constructors:

https://github.com/txpipe/pallas/blob/adab71fc90499258732f321e79c27b7392075ccf/pallas-codec/src/utils.rs#L194-L209

Another example is Set, which doesn't expose its internal constructors, but still allows conversion from a plain vector without any maintained guarantees:

https://github.com/txpipe/pallas/blob/adab71fc90499258732f321e79c27b7392075ccf/pallas-codec/src/utils.rs#L705-L725

From what I see on some other structures, such as NonEmptySet, I believe there's a desire to have those invariants enforced, and I attribute the fact that they aren't enforced across the board to the rapid and organic changes that happened in the crate due to hard fork constraints. In particular, many structures distinguish between definite and indefinite CBOR, which introduces quite a lot of duplication and inconsistency.

So I see room to perhaps unify this aspect with a more holistic and generic Def/Indef wrapper while avoiding repetition for each sub-type internals. This is, however, yet another breaking change.

At the very least, I would like to be more thorough about which constructors are exposed as well as the From/Into trait instances so that invariants are actually always enforced, ensuring that if one sees a given type with constraints, it comes with stronger guarantees.
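For illustration, enforcing the non-empty invariant at the type boundary could look like the sketch below. `NonEmptyPairs` and `EmptyError` are hypothetical names, not the current pallas-codec definitions: the inner vector is private, and the only way in from a plain vector is a fallible `TryFrom`.

```rust
use std::convert::TryFrom;

/// Hypothetical wrapper whose inner vector is private, so code outside this
/// module can only obtain a value through the checked conversion below.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NonEmptyPairs<K, V>(Vec<(K, V)>);

#[derive(Debug, PartialEq, Eq)]
pub struct EmptyError;

impl<K, V> TryFrom<Vec<(K, V)>> for NonEmptyPairs<K, V> {
    type Error = EmptyError;

    fn try_from(pairs: Vec<(K, V)>) -> Result<Self, Self::Error> {
        if pairs.is_empty() {
            Err(EmptyError)
        } else {
            Ok(Self(pairs))
        }
    }
}

impl<K, V> NonEmptyPairs<K, V> {
    /// Infallible thanks to the invariant checked at construction.
    pub fn first(&self) -> &(K, V) {
        &self.0[0]
    }

    /// Read-only view; no mutable access that could empty the vector.
    pub fn as_slice(&self) -> &[(K, V)] {
        &self.0
    }
}
```

With no infallible `From<Vec<_>>` instance and no public field, holding a `NonEmptyPairs` is itself proof that at least one pair exists, which is what lets `first` return a reference rather than an `Option`.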

pallas-primitives extensions ?

This is already partially discussed in #592, but there are various types from within pallas-network, pallas-traverse and pallas-addresses that would highly benefit from being moved down to pallas-primitives. Having them in pallas-network forces consuming crates to introduce undesired network-specific dependencies, and has led to inconsistency on several occasions (with similar types being defined in multiple places, though in slightly different ways).

Off the top of my head, I can think of at least Point and PoolParams, but this is generally true of any type that doesn't appear in a block but is typically found elsewhere (e.g. in a state-query protocol).

KtorZ • Feb 13 '25 14:02