nmt icon indicating copy to clipboard operation
nmt copied to clipboard

namespace: PrefixedData/PrefixedData8 are easy to misuse

Open elias-orijtech opened this issue 2 years ago • 1 comments

The PrefixedData and PrefixedData8 are both byte slices:

// PrefixedData simply represents a slice of bytes which consists of
// a namespace.ID and raw data.
// The user has to guarantee that the bytes are valid namespace prefixed data.
// Go's type system does not allow enforcing the structure we want:
// [namespaceID, rawData ...], especially as this type does not expect any
// particular size for the namespace.
type PrefixedData []byte

// PrefixedData8 like PrefixedData is just a slice of bytes.
// It assumes that the slice it represents is at least 8 bytes.
// This assumption is not enforced by the type system though.
type PrefixedData8 []byte

The definitions are problematic, because (1) casting byte slices may violate the type invariants (as pointed out in the comments) (2) they're mutable (3) PrefixedData has no way to indicate the namespace identifier length. For an example where mutability is a problem, see

https://github.com/celestiaorg/nmt/blob/62742433f65a37daf889ee563b0ee117c83e9664/nmt.go#L253-L257

where it is not clear to the caller that the parameter namespacedData is retained by the Push method.

The missing namespace length is problematic for validateAndExtractNamespace,

https://github.com/celestiaorg/nmt/blob/62742433f65a37daf889ee563b0ee117c83e9664/nmt.go#L336-L340

because it can only validate that the namespace+data total length is at least as long as the expected namespace length.

Perhaps the definitions of PrefixedData and PrefixedData8 are caused by wanting to support zero-copy and zero-allocation use, where the user generates PrefixedData values by slicing existing byte slices. If so, the behaviour of Push should document that argument values are retained.

Otherwise, I recommend changing the definition to hide the internal structure, and add validating constructors. There are several possibilites:

type PrefixedData struct {
    // namespace and data can be sliced from the same
    // underlying []byte allocation.
    namespace []byte
    data []byte
}

func NewPrefixedData(namespace, data []byte) PrefixedData {
    bytes := make([]byte, len(namespace)+len(data))
    d := PrefixedData{
        namespace: bytes[:len(namespace)],
        data: bytes[len(namespace)],
    }
    copy(d.namespace, namespace)
    copy(d.data, data)
    return d
}

or equivalently

type PrefixedData struct {
    namespaceAndData []byte
    namespaceLen int
}

@odeke-em

elias-orijtech avatar Oct 07 '22 20:10 elias-orijtech

I think both options existed in the past. See: https://github.com/celestiaorg/nmt/blob/dd6191cefbd34c84615afdab3859c6739c8c6e93/namespace/data.go#L3-L6 (I think I've never pushed the other version). I am not sure what the rationale was to make it just byte aliases. Probably because of the way it is consumed (so we do not have to convert back and forth between types maybe). Whoever tackles this should also check in core/app/node to see how this is used proper.

liamsi avatar Dec 16 '22 16:12 liamsi