amaranth icon indicating copy to clipboard operation
amaranth copied to clipboard

[RFC] Aggregate data structure library

Open whitequark opened this issue 3 years ago • 15 comments

Aggregate data structure library RFC

Summary

Add a rich set of standard library classes for accessing hierarchical aggregate data an idiomatic way, to fill one of the two major use cases of Record while avoiding its downsides.

See also #???.

Motivation

Amaranth has a single data storage primitive: Signal. A Signal can be referenced on the left hand and the right hand side of an assignment, and so can be its slices. Although a signal serves the role of a numeric and bit container type fine, designs often include signals used as bit containers whose individual bits are named and have unique meanings. A mechanism that allows referring to such bit fields by name is essential.

Currently, the role of this mechanism is served by Record. However, Record has multiple major drawbacks:

  1. Record attempts to do too much: it is both a mechanism for controlling representation (including implicitly casting a record to a value) and a mechanism for defining interfaces (specifying signal directions and facilitating connections between records).

    These mechanisms should be defined separately, since the only aspect they have in common is using a container class that consists of multiple named fields. Conflating the two mechanisms constraints the design space, making addressing the other drawbacks impossible, and the ill-defined scope encourages bugs in downstream code.

  2. Record has limited composability: records can only be nested within each other. Practical designs (in particular, implementations of protocols) use data with complex representation beyond nested sequences of fields: these include overlaid sequences of fields (where interpretation alternates based on some discriminant) and arrays of fields (where a field can be indexed by a non-constant), where any individual field can have a complex representation itself.

    Record is structured as a sequence of Signals, which is a part of its API. As such, it cannot support overlaid fields, and implementing support for arrays of fields is challenging.

  3. Record has limited introspectability: while its layout member can be accessed to enumerate its fields, the results do not include field boundaries, and the types of the returned shape-castable objects are preserved only as an implementation detail. Layout objects themselves are also not shape-castable.

    Record and Layout are structured as a sequence of Signals rather than a view into an underlying bit container, which is reflected in its API. Thus, Layout does not fit into Amaranth's data model, which concerns individual values.

  4. Record comes with its own storage: while its fields argument can be used to substitute the signals that individual fields are pointing to (in an awkward and error-prone way), it is still a collection of Signals. Using Record to impose structure on an existing Value requires a Module and a combinatorial assignment. This is an unnecessary complication, especially in helper functions.

  5. Record does not play well with Python's type annotations. Amaranth developers often inherit from Record as well as Layout, but in both cases the class definition syntax is usually little more than a way to define a callable returning a Record with a specific layout, and provides no benefits for IDE users.

  6. Record reserves a lot of names, including commonly used names like connect, any, all, and matches. Conversely, it defines a lot of arithmetic methods that are rarely if ever used on field containers.

  7. Layout's DSL is very amorphous. It passes around variable length tuples. The second element of these tuples (the shape) can be another Layout, which is neither a shape nor a shape-castable object.

  8. Neither Record nor Layout allow defining fields whose shapes are arbitrary ShapeCastable classes.

Since these drawbacks are entrenched in the public API and heavily restrict usefulness of Record as a mechanism for specifying data representation, a new mechanism must replace it.

Overview and examples

This section shows a bird's eye view of the new syntax and behavior proposed in this RFC. The detailed design is described afterwards.

from amaranth import *
from amaranth.lib import data


# Declaring a simple structure:
class Float32(data.Struct):
    sign: unsigned(1)
    exponent: unsigned(8)
    fraction: unsigned(23)


# Defining a signal with the structure above:
flt_a = Float32()

# Reinterpreting an existing value with the same structure:
flt_b = Float32(Const(0b00111110001000000000000000000000, 32))

# Referencing and updating structure fields by name:
with m.If(flt_b.fraction > 0):
    m.d.comb += [
        flt_a.sign.eq(1),
        flt_a.exponent.eq(127)
    ]


# Declaring a simple union, referencing an existing structure:
class FloatOrInt32(data.Union):
    float: Float32
    int: signed(32)


# Using the union to bitcast an IEEE754 value from an integer:
f_or_i = FloatOrInt32()
is_sub_1 = Signal()
m.d.comb += [
    f_or_i.int.eq(0x41C80000),
    is_sub_1.eq(f_or_i.float.exponent < 127) # => 1
]


class Op(enum.Enum):
  ADD = 0
  SUB = 1


# Programmatically declaring a structure layout:
adder_op_layout = data.StructLayout({
    "op": Op,
    "a": Float32,
    "b": Float32
})

# Using the layout defined above to define appropriately sized storage...
adder_op_storage = Signal(adder_op_layout)
len(adder_op_storage) # => 65

# ... and wrap it for the fields to be accessible.
adder_op = data.View(adder_op_layout, adder_op_storage)
m.d.comb += [
    adder_op.op.eq(Op.SUB),
    adder_op.a.eq(flt_a),
    adder_op.b.eq(flt_b)
]

Detailed design

This RFC proposes a number of language and library additions:

  • Adding a ShapeCastable interface, similar to ValueCastable;
  • Adding classes that hierarchically describe representation of aggregate values: named field containers with non-overlapping fields (structs), named field containers with overlapping fields (unions), and indexed field containers (arrays);
  • Adding a wrapper class that accepts a Value (or a ValueCastable object) and provides accessors that slice it according to the corresponding aggregate representation;
  • Adding an ergonomic and IDE-compatible interface for building descriptions of non-parametric layouts of aggregate values.

User-defined shape-castable objects

ShapeCastable is an interface for defining Shape-like values outside of the core Amaranth language. It is functionally identical to ValueCastable, and could be used like:

from amaranth import *


class Layout(ShapeCastable):
    def __init__(self, fields):
        self.fields = fields

    def as_shape(self):
        return unsigned(sum(len(field) for field in self.fields))

Value layout descriptions

Aggregate value layouts are represented using two classes: amaranth.lib.data.Field and amaranth.lib.data.Layout:

  • A Field(shape_castable, offset=0) object describes a field of the given shape starting at bit number offset of the aggregate value.
  • A Layout() object describes an abstract aggregate value. It can be iterated, returning (name, field) or (index, field) pairs; or indexed (__getitem__) by the name or index of the field. It has a .size in bits, determined by the type of the layout, and is shape-castable, being converted to unsigned(layout.size()).
    • A StructLayout(members={"name": shape_castable}) object describes an aggregate value with non-overlapping named fields (struct). The fields are placed at offsets such that they immediately follow one another.
    • A UnionLayout(members={"name": shape_castable}) object describes an aggregate value with overlapping named fields (union). The fields are placed at offset 0.
    • An ArrayLayout(element=shape_castable, length=1) object describes an aggregate value with indexed fields (array). The fields all have identical shape and are placed at offsets such that they immediately follow one another.
    • A FlexibleLayout(fields={"name": field, 0: field}, size=16) object describes a aggregate value with fields arbitrarily placed within its bounds.

The representation of a discriminated union could be programmatically constructed as follows:

import enum
from amaranth.lib import data


class Kind(enum.Enum):
    ONE_SIGNED = 0
    TWO_UNSIGNED = 1


layout = data.StructLayout({
    "kind": Kind,
    "value": data.UnionLayout({
        "one_signed": signed(2),
        "two_unsigned": data.ArrayLayout(unsigned(1), 2)
    })
})

Aggregate value access

Aggregate values are manipulated through the amaranth.lib.data.View class. A View(layout, value_castable) object wraps a value-castable object (which may be a valid assignment target) and provides access to fields according to the layout. A view is itself value-castable, being converted to the object it's wrapping. If the view is wrapping a valid assignment target, then the accessors also return a valid assignment target.

Fields can be accessed using either __getitem__ (for both named and indexed fields) or __getattr__ (for named fields). To avoid name collisions when using __getattr__ to access fields, views do not define any non-reserved attributes of their own except for the .as_value() casting method. Field names starting with _ are reserved as attribute names and and can only be accessed using the view["name"] indexing syntax.

When a view is used to access a field whose shape is an ordinary Shape object, the accessor returns a Value of the corresponding shape that slices the viewed object.

When a view is used to access a field whose shape is an aggregate value layout, the accessor returns another View with this layout, wrapping the slice of the viewed object. For fields that have any other shape-castable object set as their shape, the behavior is the same as for the Shape case.

Views that have an ArrayLayout as their layout can be indexed with a Value. In this case, the viewed object is sliced with Value.word_select.

A signal can be manipulated with its structure viewed as the discriminated union defined above as follows:

# creates an unsigned(3) signal by shape-casting `layout`
sig = Signal(layout)
view = data.View(layout, sig)

# if the second argument is omitted, a signal with the right shape is created internally;
# the line below is equivlent to the two lines above
view = data.View(layout)

m = Module()
m.d.comb += [
    view.kind.eq(Kind.TWO_UNSIGNED),
    view.value.two_unsigned[0].eq(1),
]

Ergonomic layout definition

Rather than using the underlying StructLayout and UnionLayout classes, struct and union layouts can be defined using the Python class definition syntax, with the shapes of the members specified using the PEP 526 variable annotations:

class SomeVariant(data.Struct):
    class Value(data.Union):
        one_signed: signed(2)
        two_unsigned: data.ArrayLayout(unsigned(1), 2)

    kind: Kind
    value: Value


# this class can be used in the same way as a `data.View` without needing to specify the layout:
view2 = SomeVariant()
m.d.comb += [
    view2.kind.eq(Kind.ONE_SIGNED),
    view2.value.eq(view.value)
]

When they refer to other structures or unions defined in the same way, the variable annotations are also valid PEP 484 type hints, and will be used by IDEs to derive types of properties and expressions. Otherwise, the annotations will be opaque to IDEs or type checkers, but are still useful for a human reader.

The classes defined in this way are shape-castable and can be used anywhere a shape or a aggregate value layout is accepted:

sig2 = Signal(SomeVariant)
layout2 = data.StructLayout({
    "ready": unsigned(1),
    "payload": SomeVariant
})

Implementation note: This can be achieved by using a custom metaclass for Struct and Union that inherits from ShapeCastable.

If an explicit Layout object is nevertheless needed (e.g. for introspection), it can be extracted from the class using Layout.cast:

layout == data.Layout.cast(SomeVariant) # => True

Conversely, the shape-castable object defining the layout of a View (which might be a Layout subclass or a Struct/Union subclass) can be extracted from the view using Layout.of:

SomeVariant is data.Layout.of(view2) # => True

Advanced usage: Parametric layouts

The ergonomic definitions using the Struct and Union base classes are concise and integrated with Python type annotations. However, they cannot be used if the layout of an aggregate value is parameterized. In this case, a class with similar functionality can be defined in a more explicit way:

class Stream8b10b(data.View):
    data: Signal
    ctrl: Signal

    def __init__(self, value=None, *, width: int):
        super().__init__(data.StructLayout({
            "data": unsigned(8 * width),
            "ctrl": unsigned(width)
        }), value)


len(Stream8b10b(width=1).data) # => 8
len(Stream8b10b(width=4).data) # => 32

Since the parametric class name itself does not have a fixed layout, it cannot be used with Layout.cast. Similarly, the type annotations cannot include specific field widths; they are included only to indicate the presence of a corresponding attribute to IDEs and type checkers.

Drawbacks

This feature introduces a language-level concept, shape-castable objects, increasing language complexity.

This feature introduces a finely grained hierarchy of 5 similar and related classes for describing layouts.

Alternatives

Do nothing. Record will continue to be used alongside the continued proliferation of ad-hoc implementations of similar functionality.

Remove ArrayLayout from this proposal. The array functionality is niche and introduces the complexity of handling by-index accessors alongside by-name ones.

Remove ArrayLayout, UnionLayout, and FlexibleLayout from this proposal. Their functionality is less commonly used than that of StructLayout and introduces the substantial complexity of handling fields at arbitrary offsets. (This would make amaranth.lib.data useless for slicing CSRs in Amaranth SoC.) This change would bring this proposal close to the original PackedStruct proposal discussed in https://github.com/amaranth-lang/amaranth/issues/342.

Combine the Layout and all of its derivative classes into a single Layout(fields={"name": Field(...), 0: Field(...)}) class that provides a superset of the functionality. This simplifies the API, but makes introspection of aggregate layouts very difficult and can be inefficient if large arrays are used. In this case, factory methods of the Layout class would be provided for more convenient construction of regular struct, union, and array layouts.

Remove Struct and Union annotation-driven definition syntax. This makes the API simpler, less redundant, and with fewer corner cases, also avoiding the use of variable annotations that are not valid PEP 484 type hints, at the cost of a continued jarring experience for IDE users.

Include a more concise and less visually noisy way to build StructLayout and UnionLayout objects (or their equivalents) using a builder pattern. This may make the syntax slightly nicer, though the RFC author could not come up with anything that would actually be such.

Bikeshedding

The names of the Field, *Layout, and View classes could be changed to something better.

  • IrregularLayout was renamed to FlexibleLayout.

Future work

This feature could be improved in several ways that are not in scope of this RFC:

  • StructLayout, UnionLayout, and ArrayLayout could be extended to generate layouts with padding at the end (for structs and unions) or between elements (for arrays). Currently this requires the use of an FlexibleLayout.
  • StructLayout could be extended to accept a list of fields in addition to a map of field names to values. In this case, it would represent an aggregate value with non-overlapping indexed fields (tuple).
  • Struct and/or StructLayout could be extended to treat certain reserved field names (e.g. "_1", "_2", ...) as designating padding bits. In this case, the offset of the following fields would be adjusted, and the fields with such names would not appear in the layout.
  • Struct and/or StructLayout could be extended to treat certain reserved field names (e.g. "_" for Struct and None for StructLayout) as designating an anonymous inner aggregate. In this case, the members of the anonymous inner aggregate would be possible to access as if they were the members of the outer aggregate.
  • The automatic wrapping of accessed aggregate fields done by View could be extended to call a user-specified cast function rather than hard-coding a check for whether the shape is a Layout. This would allow seamless inclusion of user-defined value-castable types in aggregates.
  • The PEP 484 generics could be used to define layouts parametric over field shapes, using type annotations alone. Since Python does not have type-level integers, layouts parametric over field sizes would still need to be defined explicitly.
  • The struct, union, and enum support could be used as the building blocks to implement first-class discriminated unions. Discriminated unions will also benefit from tuples, described above. (Suggestion by @lachlansneff.)

Acknowledgements

@modwizcode provided valuable feedback while this RFC was drafted.

whitequark avatar Apr 05 '22 17:04 whitequark

I think it is good to consider how this interfaces with code documentation with tools like Sphinx? Making sure that it is easy to document your data structure seems like a good goal?

mithro avatar Apr 05 '22 17:04 mithro

I think it is good to consider how this interfaces with code documentation with tools like Sphinx?

Sphinx will pick up the annotation and the docstring for it if you use autodoc. Here's a very quick and dirty test I made: Screenshot_20220405_180759

whitequark avatar Apr 05 '22 18:04 whitequark

I really like this.

I think that a rich-enum layout would be really great as well—something like a union with the tag built in. The data could be accessed with m.If or m.Switch like so:

# Maybe something like this
class Op(data.Enum):
    ADD: unsigned(16) = 0
    MUL_ADD: (unsigned(16), unsigned(16)) = 1

# accessing
with m.Switch(op):
    with m.Case(Op.ADD) as imm:
        …
    with m.Case(Op.MUL_ADD) as (mul_imm, add_imm):
        …

lachlansneff avatar Apr 05 '22 20:04 lachlansneff

I think that a rich-enum layout would be really great as well—something like a union with the tag built in.

I agree entirely! I think that discriminated unions (rich-enums, in your terms) would be an excellent fit, and I like your choice of syntax as well. I've added it to the "Future work" section.

There are several questions that need to be resolved for this proposal (which is why it should be a separate, future RFC). The ones that come to mind first are:

  • How do you assign to it? op.eq(Op.MUL_ADD, 16, 32)? In this case, how are the 16 and 32 values cast to the tuple that is the value of MUL_ADD variant?
  • Should this really be handled by Switch? Would a Match statement, able to recursively descend into aggregate types, be more appropriate? The syntax Case has, where the arguments to Case are a list of or-patterns, is perhaps not ideal for this. On the other hand, translating nested pattern matches is notoriously difficult (and that's not considering how hard it is to get the Python syntax to cooperate), and your Switch proposal is straightforward.
  • Switch is a part of the core language, data.Enum is not; do we want to introduce a new public API that allows branching on user-defined values?
  • data.View ~does not reserve any attribute names~ reserves only as_value and eq. What attribute names does data.Enum reserve? Can you access the discriminant using something like op.kind? How do you name the types of the variants, if it's possible at all?

whitequark avatar Apr 05 '22 21:04 whitequark

  • How do you assign to it? op.eq(Op.MUL_ADD, 16, 32)? In this case, how are the 16 and 32 values cast to the tuple that is the value of MUL_ADD variant?

I was thinking maybe op.eq(Op.MUL_ADD(16, 32)). Op.MUL_ADD is a type, in the Float32 sense, but Op.MUL_ADD(...) is a value in the same way as Float32() in the example. The arguments, if there's a tuple/list in the annotation, are splayed out.

  • Should this really be handled by Switch? Would a Match statement, able to recursively descend into aggregate types, be more appropriate? The syntax Case has, where the arguments to Case are a list of or-patterns, is perhaps not ideal for this. On the other hand, translating nested pattern matches is notoriously difficult (and that's not considering how hard it is to get the Python syntax to cooperate), and your Switch proposal is straightforward.

Match sounds really good. It'd be nice to be able to descend into nested types but, as you say, it would be quite difficult to design and implement.

  • Switch is a part of the core language, data.Enum is not; do we want to introduce a new public API that allows branching on user-defined values?

Switch can already branch on enum.Enum I believe. Not sure whether you mean we should be hesitant to introduce a new API or if we should think about doing it.

  • data.View does not reserve any attribute names. What attribute names does data.Enum reserve? Can you access the discriminant using something like op.kind? How do you name the types of the variants, if it's possible at all?

op.tag or op.kind makes sense to add in my opinion.

lachlansneff avatar Apr 05 '22 22:04 lachlansneff

op.tag or op.kind makes sense to add in my opinion.

That would work. Your code would define an enum Op.Tag with values Op.Tag.MUL_ADD and so forth, along with an op.tag: Op.Tag field. Then op.as(Op.Tag.MUL_ADD) would give you an Op.MUL_ADD.

I was thinking maybe op.eq(Op.MUL_ADD(16, 32)). Op.MUL_ADD is a type, in the Float32 sense, but Op.MUL_ADD(...) is a value in the same way as Float32() in the example.

Does the layout of Op.MUL_ADD contain the (constant) tag? If it does not, then data.Enum would need to somehow know what the tag is within .eq, which is a bad idea. If it does, then creating an Op.MUL_ADD should populate the tag properly; perhaps by creating a view over Cat(Op.Tag.MUL_ADD, viewed_value) when constructed from scratch.

Switch can already branch on enum.Enum I believe.

It does; enum.Enum is a built-in Python class though, so this doesn't create a circular dependency between amaranth.hdl and amaranth.lib.

whitequark avatar Apr 05 '22 22:04 whitequark

That would work. Your code would define an enum Op.Tag with values Op.Tag.MUL_ADD and so forth, along with an op.tag: Op.Tag field. Then op.as(Op.Tag.MUL_ADD) would give you an Op.MUL_ADD.

Does the layout of Op.MUL_ADD contain the (constant) tag? If it does not, then data.Enum would need to somehow know what the tag is within .eq, which is a bad idea. If it does, then creating an Op.MUL_ADD should populate the tag properly; perhaps by creating a view over Cat(Op.Tag.MUL_ADD, viewed_value) when constructed from scratch.

Hmm, this is starting to get too complex I think. Maybe Op.MulAdd shouldn't be a distinct type.

lachlansneff avatar Apr 05 '22 23:04 lachlansneff

These examples look very appealing !

How would per-field reset values be defined for aggregate signals ? (also: reset_less, attrs) Dealing with those is frustrating with the current Record API.

Intuitively, I'd be drawn to something consistent with the dir and xdr parameters of platform.request():


class JTAGInterface(data.Struct):
    tms: unsigned(1)
    tdi: unsigned(1)
    tdo: unsigned(1)

jtag = JTAGInterface(reset={"tms": 1, "tdi", 1})
jtag_layout = data.StructLayout({
    "tms": unsigned(1),
    "tdi": unsigned(1),
    "tdo": unsigned(1),
})

jtag_storage = Signal(jtag_layout)
jtag = data.View(jtag_layout, jtag_storage, reset={"tms": 1, "tdi": 1}))

jfng avatar Apr 05 '22 23:04 jfng

How would per-field reset values be defined for aggregate signals ? (also: reset_less, attrs)

reset_less and attrs are not possible to implement per-field since there is not a distinct Signal per-field in this design. More generally, if it's not data in storage (and a bunch of JTAG pins isn't), you shouldn't be using amaranth.lib.data to poke at it. That's one of the reasons why the upcoming Interfaces RFC should be handling this use case! Another is that interface fields always have a direction associated with them, which is appropriate for a collection of pins.

Regarding the customization of interface signals: for pins I think the current model, with request giving you a fully formed data structure, is falling far short of being useful, for many reasons: hard to customize signal properties such as attrs and reset; hard to split up a peripheral I/O block between submodules; hard to reflect on an I/O block; the place where xdr/dir="-" is set (toplevel, usually) is not the same where you know what xdr should be set to (the I/O frontend of the peripheral); all unused pins are always requested, occasionally causing constraint issues (with SerDes pins for example); and many others.

What should happen instead is that request would give you a proxy object that by itself doesn't instantiate anything and doesn't add any toplevel ports, but does own the pins you just requested. You can then slice up that proxy any way you want and pass it to the peripheral, which will configure the IOBs in the way it needs to without the rest of the code having anything to do with it.

whitequark avatar Apr 06 '22 00:04 whitequark

reset_less and attrs are not possible to implement per-field since there is not a distinct Signal per-field in this design. More generally, if it's not data in storage (and a bunch of JTAG pins isn't), you shouldn't be using amaranth.lib.data to poke at it. That's one of the reasons why the upcoming Interfaces RFC should be handling this use case! Another is that interface fields always have a direction associated with them, which is appropriate for a collection of pins.

That makes sense, I mixed-up these two use cases. I'll be looking forward to the Interfaces RFC.

jfng avatar Apr 06 '22 01:04 jfng

I've implemented the RFC in https://github.com/amaranth-lang/amaranth/pull/697. Please try it out; I'm happy to hear any and all feedback!

whitequark avatar Apr 06 '22 04:04 whitequark

Nitpick: in the code in the Overview and examples section I think adder_op.eq(Op.SUB) needs to be adder_op.op.eq(Op.SUB)

FatsieFS avatar May 04 '22 16:05 FatsieFS

First off: Thanks for all the hard work on Amaranth! This RFC looks good already and would help a lot with the mess and inflexibility that Record creates.

However, while playing around a bit with the code by implementing a simple image encoder, I've bounced on three topics: The code for the encoder can be found here as a gist.

  1. Are the data structure fields meant to be inconsistent? Looking at the examples, it looks like when using data.Struct in a class, the order in which the fields of the class are defined matches with their representation. With that I mean the first field is the Most Significant Bit, while the last defined field is the Least Significant Bit. As defined by the RFC:

# Declaring a simple structure:
class Float32(data.Struct):
    sign: unsigned(1) #MSB
    exponent: unsigned(8) 
    fraction: unsigned(23) #LSB

However playing around with the RFC that does not seem to always be the case. In the gist linked above, the fields of the data structure classes (RGBA and Chunk) don't match 1-to-1 with the order the fields are defined. While RGBA looks like we would expect (MSB is R field, then G, then B and LSB is A field), the Chunk class doesn't seemingly work like that. Since the tag (0xFF) is the LSB, and the Data is the MSB. Switching the order of the fields around in the class declaration does swap the representation. So is the field random, but consistent? Does the name have anything to do with it? Does the field width determine this (32 bits is more than 8 bits, so on top)?

  1. Is using a data.struct (layout) in an elaboratble as input/output even encouraged? The aggregate data structure RFC is meant to model data in storage and not an interface (which will be for the upcoming interfaces RFC). Should we (for now, pretending that Record also doesn't exist) only use Signals() as input and output for elaboratable classes? And then, for the moment wrap these with a data.View for a certain layout?

  2. In the same vein, how would you (implement a) reset for data.struct? This was already mentioned previously by jfng, but I don't quite see how having a certain reset value of that data relates to an interface. I'm struggling with this use-case while making the encoder. It has to keep track for the previous value of a pixel, but for the first pixel it processes there is no previous pixel value. And the spec defines it as having a fixed non-zero value for the start. (See line 54 of the gist). I can model around it by defining it as a Signal first of course, but I wonder if that is the intended/suggested solution for this.

And again, thanks a lot for all the hard work you folks are doing!

Kaucasus avatar Jun 10 '22 10:06 Kaucasus

I gave it a quick try to see if it would solve the issues I have with how Record presents in the waveform viewer (namely that only the field name is shown so if I have multiple instances of the same record it gets messy). So from the example above if we have

flt_a = Float32()

then it shows up in gtkwave as flt_a[31:0] while I think that at least from a debug-ability perspective it would have been more useful to present as say

flt_a__sign
flt_a__exponent[7:0]
flt_a__fraction[22:0]

Edit: After having used the aggregate data structure library for a while in a project (a toy out-of-order RISC-V) I found that manually adding signals for debug-ability of an aggregate worked well enough for my needs.

def addDebugSignals(mod, sig):
  for a, b in sig._AggregateMeta__layout._fields.items():
    dbgSig = Signal(b.shape, name='{}${}'.format(sig._View__value.name, a))
    mod.d.comb += dbgSig.eq(Value.cast(sig)[b.offset:b.offset + b.width])

While library integration of something similar would be nice it is probably hard to make generic enough to fit all.

Thanks for all the work put into the aggregate data structures library. Looking forward to see it land!

markus-zzz avatar Jun 18 '22 20:06 markus-zzz

Could struct constructors let one set the initial field values with named arguments? e.g.

# Declaring a simple structure:
class Float32(data.Struct):
    sign: unsigned(1)
    exponent: unsigned(8)
    fraction: unsigned(23)


# Defining a signal with the structure above:
flt_a = Float32(sign=0, exponent=5, fraction=0x12345)

lachlansneff avatar Aug 14 '22 19:08 lachlansneff

@Kaucasus

  1. Are the data structure fields meant to be inconsistent?

No.

Looking at the examples, it looks like when using data.Struct in a class, the order in which the fields of the class are defined matches with their representation. With that I mean the first field is the Most Significant Bit, while the last defined field is the Least Significant Bit. As defined by the RFC:

This is actually an omission in the RFC. As it happens elsewhere in Amaranth, such as in concatenations, the first field starts with the LSB, and the last field ends with the MSB. In other words, the following test passes:

from amaranth import *
from amaranth.lib import data
from amaranth.sim import Simulator


class RGBA(data.Struct):
    r: unsigned(8)
    g: unsigned(8)
    b: unsigned(8)
    a: unsigned(8)


rgba1 = Signal(32)
rgba2 = RGBA()

m = Module()
m.d.comb += [
    rgba1.eq(Cat(C(1, 8), C(2, 8), C(3, 8), C(4, 8))),
    rgba2.r.eq(C(1, 8)),
    rgba2.g.eq(C(2, 8)),
    rgba2.b.eq(C(3, 8)),
    rgba2.a.eq(C(4, 8)),
]


def process():
    assert (yield rgba1) == (yield rgba2.as_value())


sim = Simulator(m)
sim.add_process(process)
sim.run()

However, this is never mentioned in the RFC, and in fact the example introducing 32-bit floats has the fields in reversed order! I seem to have fixed this in the implementation but not in the RFC itself. Thanks for pointing this out! I've updated the RFC.

  • Should we (for now, pretending that Record also doesn't exist) only use Signals() as input and output for elaboratable classes? And then, for the moment wrap these with a data.View for a certain layout?

Yes. The Interfaces RFC will automate this.

3. In the same vein, how would you (implement a) reset for data.struct? This was already mentioned previously by jfng, but I don't quite see how having a certain reset value of that data relates to an interface.

On second thought I agree that it would be useful to be able to specify a reset value (specifically) for aggregate data. This will require additional functionality to be able to construct Layouts out of integers, but I think this can be added.

@markus-zzz

I found that manually adding signals for debug-ability of an aggregate worked well enough for my needs.

Yes, this is the suggested solution. The chief issue here is that the fields of an aggregate may overlap, e.g. when you're defining a union, which is why it is internally represented by a single Signal. This means that the wires created for fields would have to be debugging-only. This can be done in the backend, but for now I think it is reasonable to punt on this to the designer. I agree there is room for improvement here.

@lachlansneff

I think it is necessary to use a separate keyword argument like reset= to avoid naming collisions and for parity with Signal:

flt_a = Float32(reset={"sign": 0, "exponent": 5, "fraction": 0x12345})

But yes, otherwise I agree.

@jfng

If we introduce reset= then we might as well introduce reset_less= and everything else from Signal that makes sense here.

whitequark avatar Sep 24 '22 08:09 whitequark

I've updated the RFC to mention View accepting Signal keyword arguments. I've also updated the implementation.

I believe this RFC should be complete now.

whitequark avatar Sep 24 '22 10:09 whitequark

Please continue the discussion in the RFC PR: https://github.com/amaranth-lang/rfcs/pull/1.

whitequark avatar Feb 14 '23 11:02 whitequark