
structs: add HostLayout "directive" type

Open dr2chase opened this issue 2 years ago • 47 comments

Proposal Details

Abstract

This proposes a new package for zero-sized types whose presence in a structure’s list of fields would control how the compiler lays out those fields, for the purpose of allowing programmers to indicate which structures are interchanged with the host platform and to request a host-compatible layout for those structures.

Background

While the Go language specifies very little about struct layout, in practice the Go implementation is tightly constrained to follow platform layout and alignment rules because of the few cases where a struct is interchanged with a platform API (and where this is not true it creates the possibility of incompatibility, for example, ppc64le, where the platform alignment for float64 fields is different from Go's default). This forces tradeoffs or potential problems on platforms whose constraints differ from common-case on other platforms (that is, what the Go compiler has adopted as its default) and prevents field reordering optimizations that can save memory and improve garbage collection performance.

Proposal

To address this, we propose a family of zero-sized types for struct fields to signal differences in layout or alignment where that matters. The change in the compiler’s behavior should be invisible to pure Go programs that do not use unsafe or interact with the host platform. The goal of the proposal is that programmers be able to ensure that data exchanged with the host platform have a host-compatible layout, both now, and in the face of future layout optimizations.

Subject to discussion, the proposal is this package and (for now) this one type:

package structlayout 

// Platform, as a field type, signals that the size, alignment,
// and order of fields conform to requirements of the GOOS-GOARCH
// target platform and may not match the Go compiler’s defaults.
type Platform struct {}

After reflecting on the discussion below, I would modify this to:

package structs 

// HostLayout, as a field type, signals that the size, alignment,
// and order of fields conform to requirements of the host
// platform and may not match the Go compiler’s defaults.
type HostLayout struct {}

The rationale for the name change is that structs is one word, parallels strings, bytes, and slices, and is generic enough to include other (future) tags specifying "nocopy" or alignment. Furthermore, such type-modifying tags only work within structures; the package name strongly hints at this.

Rationale

One platform, WASM, has system interfaces that align 64-bit types to more than register size and another, ppc64le, has the possibility of non-Go interfaces that align some 64-bit types to less than register size, and both of these are contrary to the rules that Go normally follows (on ppc64le, we have handled this problem using luck). Signaling these constraints explicitly will help compatibility with these two platforms, preserve/allow implementation flexibility, perhaps make it easier to write checking tools, and perhaps (once types passed to all non-Go calls are properly tagged) allow the Go compiler to reorder structures to use less memory and save GC time by shuffling pointers as early as possible in untagged structures. This optimization is desirable because it automates something humans currently spend time on and don't always get right, and sometimes forces programmers to make compromises between most-readable code and best performance.
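A minimal sketch of the manual reordering that the paragraph above says programmers do by hand today. The type names `declared` and `reordered` are hypothetical; the exact sizes depend on the target's alignment rules, so the program prints them rather than assuming particular values.

```go
package main

import (
	"fmt"
	"unsafe"
)

// declared keeps its fields in a readable, source-declared order.
// The int64 in the middle forces padding after each bool.
type declared struct {
	a bool
	b int64
	c bool
}

// reordered holds the same fields, hand-sorted so the widest comes first
// and the two bools share the tail of the struct.
type reordered struct {
	b int64
	a bool
	c bool
}

// sizes reports the size in bytes of each variant on the current target.
func sizes() (uintptr, uintptr) {
	return unsafe.Sizeof(declared{}), unsafe.Sizeof(reordered{})
}

func main() {
	d, r := sizes()
	fmt.Printf("declared order: %d bytes, reordered: %d bytes\n", d, r)
}
```

A compiler that is free to reorder untagged structs could pick the smaller layout automatically, which is why structs shared with the host would need an explicit marker to opt out.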

The most important part of this proposal is that unless someone is writing code that interacts with the platform, they do not need to know about this. If they are writing cgo, these signal types will be inserted for them.

The compiler will know the meaning of these types and modify struct alignment and layouts accordingly. It’s not clear to me whether Platform is adequate to capture all the cases of non-Go code, but for the current use cases (platform interfaces across all the various platforms, and cgo -- as far as I know “platform” describes their needs) it appears that it is.

Why signal types versus //go:platform ?
It is a better match for the Go type system if changes in types are expressed in the type system itself. Use of field signal types meets this requirement, since the Go type of a struct depends on the fields of the struct, even if they have zero width.
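A small sketch of the type-system point, using a local stand-in `hostLayout` for the proposed marker type (the names `plain`, `plainTwin`, and `marked` are hypothetical). It checks, via reflect, that a zero-width field is enough to make two otherwise identical struct types non-convertible.

```go
package main

import (
	"fmt"
	"reflect"
)

// hostLayout is a local stand-in for the proposed zero-sized marker type.
type hostLayout struct{}

type plain struct {
	x int32
}

// plainTwin has the same underlying struct type as plain.
type plainTwin struct {
	x int32
}

// marked differs from plain only by the zero-width marker field.
type marked struct {
	_ hostLayout
	x int32
}

// convertible reports whether a value of type a's dynamic type
// could be converted to b's dynamic type.
func convertible(a, b any) bool {
	return reflect.TypeOf(a).ConvertibleTo(reflect.TypeOf(b))
}

func main() {
	fmt.Println("plain -> plainTwin:", convertible(plain{}, plainTwin{}))
	fmt.Println("plain -> marked:   ", convertible(plain{}, marked{}))
}
```

Because the marker changes the underlying struct type, every tool built on the Go type checker sees the difference without needing to understand a comment directive.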

Why just one platform tag instead of finer control?
In practice, the use case is platform compatibility, and platform is a concept that the compiler can translate to the appropriate ARCH-OS combination without demanding that the user know the details, and those details also might not be portable across platforms even when the C type declarations are the same.

In the future, we could consider adding signal types for CacheAlign, AtomicAlign, or Packed but I would not include those at first because I'm not that sure we need them, we might argue about definitions, and their implementation (for Packed, at least) would be somewhat more costly. A non-layout signal type that might work well is “NoCopy” to indicate to vet that a type should not be copied once it has a non-zero value (this is currently implemented by vet knowing that certain types are “special”).

Related: “proposal: spec: define/imply memory layout of tagged struct fields #10014”. This was a very similar proposal, approaching the problem from a slightly different direction, but did not address the issue of "the platform does not match Go's defaults". The new proposal here is more concrete in “how”, includes tweaking alignments to conform to platform constraints, but does not expect someone using the platform tag to know precisely what rules a particular platform uses.

Related: “proposal: runtime: add AlignedN types that can be used to increase alignment #19057”. This was a proposal for a family of types for specifying specific alignments, perhaps of specific fields. That proposal had additional use cases -- specifying higher alignment for various fields -- but also did not address the problem of reduced platform alignment (e.g., ppc64le float64) and its application to specific platform interfaces would require that programmers know the details of that platform’s layout rules (instead of the Go compiler/runtime knowing those details once).

Related: “proposal: cmd/compile: make 64-bit fields be 64-bit aligned on 32-bit systems, add //go:packed directive on structs #36606”. This proposal took the opposite approach -- 64-bit atomics require 64-bit alignment on 32-bit processors, therefore Go should change its default layout, rather than signaling specific types that needed this alignment. It also included a secondary proposal for “packed” types that had a far more annoying implementation burden (how is the pointer addressed in a “packed” struct {uint8; *int}? How does the GC find this pointer?)
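To make the "packed" question concrete, here is a sketch that inspects how Go lays out the struct {uint8; *int} example today. The field names `b` and `p` are added for illustration; the program only reports what the current toolchain does and does not model a packed layout.

```go
package main

import (
	"fmt"
	"unsafe"
)

// mixed mirrors the struct {uint8; *int} example, with field names added.
type mixed struct {
	b uint8
	p *int
}

// pointerOffset is the byte offset of the pointer field within mixed.
func pointerOffset() uintptr {
	return unsafe.Offsetof(mixed{}.p)
}

// pointerAlign is the alignment the compiler requires for that field.
func pointerAlign() uintptr {
	return unsafe.Alignof(mixed{}.p)
}

func main() {
	fmt.Printf("offset of p: %d, alignment of p: %d\n", pointerOffset(), pointerAlign())
}
```

In the unpacked layout the pointer sits at an aligned offset after padding; a packed layout would put it at offset 1, which is the case the garbage collector and pointer-dereferencing code would have to handle specially.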

Compatibility

Working old code will continue to work properly.

Implementation

Besides the proposed package and type, cmd/compile/internal/types/size.go will need adjusting to follow the signal types. It already contains special case code for sync/atomic/align64, so this is not outlandish.
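As a sketch of the existing special case mentioned above, the following program inspects where a sync/atomic Int64 field lands inside a struct. The `counters` type is hypothetical, and the assumption (based on the sync/atomic documentation for Go 1.19 and later) is that such fields are 64-bit aligned even on 32-bit targets.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// counters puts a one-byte field ahead of an atomic 64-bit integer.
type counters struct {
	flag bool
	n    atomic.Int64
}

// counterOffset is the byte offset of n within counters.
func counterOffset() uintptr {
	var c counters
	return unsafe.Offsetof(c.n)
}

func main() {
	fmt.Println("offset of n:", counterOffset())
}
```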

Open issues

The names. For example, “structlayout”, versus “typelayout”? If we decide that this is a good place for 0-width signal types, some of them (NoCopy) aren’t about type layout which means whatever-layout isn’t quite right.

dr2chase avatar Mar 19 '24 18:03 dr2chase

The names. For example, “structlayout”, versus “typelayout”? If we decide that this is a good place for 0-width signal types, some of them (NoCopy) aren’t about type layout which means whatever-layout isn’t quite right.

How about typetag, e.g. typetag.Platform and typetag.NoCopy?

mvdan avatar Mar 19 '24 18:03 mvdan

Working old code will continue to work properly.

Will layout changes be guarded behind the module's Go version? I would find that important for the unsafe case.

dominikh avatar Mar 19 '24 18:03 dominikh

Will layout changes be guarded behind the module's Go version? I would find that important for the unsafe case.

This proposal is just part 1, adding the structlayout.Platform type and implementing its semantics. That is completely backwards-compatible, if you never see that type everything works as before.

Part 2, actually changing the layout of unadorned structs, is not part of this proposal. Of course, this proposal has less motivation if part 2 never happens.

randall77 avatar Mar 19 '24 19:03 randall77

Being part of the type system is a bit weird. Can I throw it in an interface{}? What happens when I take a pointer to it and it's set to nil? Can I make it generic with type parameters?

nemith avatar Mar 20 '24 00:03 nemith

@nemith All of those will work fine. This proposal just changes the layout of the fields within a struct. Field layout is already fully described by the reflect.Type value associated with the type. Nothing else cares.

ianlancetaylor avatar Mar 20 '24 02:03 ianlancetaylor

@nemith Part of the reason for putting it into the type system is so that all the other Go tools understand it, from the point-of-view of type comparison, identity, etc.

dr2chase avatar Mar 20 '24 09:03 dr2chase

If/when additional signal types get added, what are the semantics of including more than one in the same struct? Will it be required that all signal types have orthogonal semantics, will it be a compile time error if they are incompatible? What if they are only incompatible on one platform/OS pair, would there be a vet check to alert someone that doesn't explicitly try that combination to the potential issue?

ChrisHines avatar Mar 21 '24 19:03 ChrisHines

@ChrisHines

I see no reason to require orthogonal semantics ("PlatformLayout" overlaps with "DeclaredLayout", which is not yet proposed but I can imagine it) but I do think that incompatible combinations should be diagnosed at compile time. And looking at a plausible interaction with alignment specification (the other/next layout tag I expect to someday see) I can construct plausible examples that would use both.

I think for this particular tag it would be reasonable to require that it precede any other field that has non-zero size.

My goal is to have as few of these tags as possible, motivated by real problems, so hopefully there will not be many combinations that apply. I think it is fine for the compiler to reject any combinations that are problematic. Even one of these tags should be a niche case; two should be niche-squared.

HOWEVER:

This might get messy for special C-hardware-specific types, for example, those used to talk about xmm and ymm registers, that currently have no Go equivalent. There are several approaches to that problem and I am not sure which is best; the existence/names of the very-wide data types is platform-dependent for C compilers, but Go could decide to just generally support 128 and 256-bit integers. Or, we could add per-field type tags for alignment that would precede 128-bit or 256-bit fields. So, something like:

type Ymms struct {
    _        typetag.PlatformLayout
    Inactive bool
    Reg      [32]uint256 // These are aligned to a platform-appropriate boundary
}

or

type YmmPart uint64
type Ymms struct {
    _        typetag.PlatformLayout
    Inactive bool
    _        typetag.Align256 // I hope the right alignment is 32-bytes / 256 bits.
    Reg      [32][4]YmmPart
}

(I added the boolean field just to make it clear that the hypothetical Align256 tag precedes a particular field.)

I prefer the choice where the programmer doesn't need to go read documentation to figure out what the C compiler is doing. On the other hand, after quickly checking the internet to see what the YMM alignment rules are (and discovering a mess with annoying special cases), I can understand needing to be able to specify a platform order yet also be very picky about the alignment. Because of that, I think that specifying both platform layout and specific (increased) alignment for certain fields should be allowed. Specifying reduced alignment is probably a compile-time error; taking the address of a field with reduced alignment is certainly a compile-time error.

(Why is taking the address of a reduced-alignment field a likely error? The compiler would prefer to use the fast, assume-that-integers-are-aligned, instructions for dereferencing a *int64, and most people want the compiler to do that because it is faster. Taking the address of an unaligned field breaks that assumption.)

I am not sure if this is right for a vet check or not; vet would need to know a lot about different architecture and OS combinations and their C compilers. A different and slightly interesting question is what should happen if a struct tagged with "platform" layout contains a type that is inherently Go-oriented, like slice or map, or one for which Go has its own alignment assumptions. My inclination is to say that right now vet isn't checking any of this; platform interchange types are already a niche, and weird combinations of type tags (that don't exist yet) and/or Go types is a hypothetical niche of that niche.

dr2chase avatar Mar 25 '24 12:03 dr2chase

@mvdan proposed improved naming, included at the top for anyone coming upon this later and not wanting to slog through comments:

After reflecting on the discussion below, I would modify this to:

package structs 

// PlatformLayout, as a field type, signals that the size, alignment,
// and order of fields conform to requirements of the GOOS-GOARCH
// target platform and may not match the Go compiler’s defaults.
type PlatformLayout struct {}

the rationale for the name change is that structs is one word, and parallels strings, bytes, and slices, and is generic enough to include other (future) tags specifying "nocopy" or alignment. Furthermore, such type-modifying tags only work within structures; the package name strongly hints at this.

dr2chase avatar Mar 28 '24 18:03 dr2chase

Platform is a bit odd since Go is a platform too, and "the GOOS-GOARCH target platform" sounds like Go too. What about structs.HostLayout, and don't mention GOOS-GOARCH in the docs? (For the record, the main need for this is to write structs that match Windows and WASM, not Cgo. Cgo can always do something magical and unexposed.)

rsc avatar Apr 03 '24 17:04 rsc

Would this also affect the alignment of the struct on the stack?

ydnar avatar Apr 04 '24 06:04 ydnar

@ydnar - it depends. On some architectures the stack is not very aligned, and so extra-aligned data is heap-allocated instead. Otherwise, yes, probably.

And I am fine with HostLayout, will edit the top proposal to reflect this.

dr2chase avatar Apr 04 '24 16:04 dr2chase

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc avatar Apr 04 '24 21:04 rsc

Have all remaining concerns about this proposal been addressed?

The proposal is to add

package structs
type HostLayout struct{}

that can be added as a field named _ in a struct. This would have no significant effect in most tools, but a compiler could use it as a hint about laying out the struct. Of course there is an effect for type equality, since a struct with one of these fields is different from a struct without, but that’s exactly what we want if the compiler is using it to hint a different layout.

rsc avatar Apr 10 '24 18:04 rsc

I would like to additionally require that this directive-typed field appear before any field with greater-than-zero size, if that's okay? (This should not be a requirement for some imagined other directives, e.g., explicit next-field alignment.)

dr2chase avatar Apr 12 '24 14:04 dr2chase

What advantage do we get from imposing such a requirement?

ianlancetaylor avatar Apr 12 '24 18:04 ianlancetaylor

Simplifies the implementation ever-so-slightly (avoids a pre-pass over structure fields), also makes its use more uniform, and I don't see much harm in the restriction (which could be relaxed later if I turn out to be wrong in "not much harm").

dr2chase avatar Apr 14 '24 18:04 dr2chase

My take on it is that the advantages we get from the restriction aren't worth the cost of complicating the spec by adding the restriction.

ianlancetaylor avatar Apr 14 '24 18:04 ianlancetaylor

Would the field be zero sized if it's the last _ field in a struct?

ydnar avatar Apr 14 '24 23:04 ydnar

@iant I am okay with that also.

@ydnar _ fields can have width. A zero-width (field) type is either an array of zero elements, a struct of zero fields, or a struct/array built entirely of zero width types.

dr2chase avatar Apr 15 '24 17:04 dr2chase

Change https://go.dev/cl/578355 mentions this issue: cmd/compile: layout changes for wasm32, structs.HostLayout

gopherbot avatar Apr 16 '24 16:04 gopherbot

@ydnar _ fields can have width. A zero-width (field) type is either an array of zero elements, a struct of zero fields, or a struct/array built entirely of zero width types.

I’m referring specifically to https://github.com/golang/go/commit/6f07ac2f280847ee0346b871b23cab90869f84a4:

cmd/gc: pad structs which end in zero-sized fields

For a non-zero-sized struct with a final zero-sized field,
add a byte to the size (before rounding to alignment).  This
change ensures that taking the address of the zero-sized field
will not incorrectly leak the following object in memory.

reflect.funcLayout also needs this treatment.

Fixes https://github.com/golang/go/issues/9401

ydnar avatar Apr 18 '24 18:04 ydnar

@ydnar Yes, that would kick in if you choose to put this field last in a struct. I don't see any reason to treat it differently. For all purposes other than its special purpose, it's just a field.

ianlancetaylor avatar Apr 18 '24 21:04 ianlancetaylor

Change https://go.dev/cl/581316 mentions this issue: cmd/compile: wasm32-specific structs.HostLayout changes

gopherbot avatar Apr 23 '24 21:04 gopherbot

Would the field be zero sized if it's the last _ field in a struct?

We should probably document that "By convention, this field should be placed first in a struct." That's a good convention to have regardless.

@cherrymui pointed out that, while it's important that an addressable zero-sized field at the end of a struct have non-zero size, we could carve out zero-sized _ fields from this rule. Algorithmically, we would first strip zero-sized _ fields at the end of the struct, and then append a byte if the remaining final field is zero-sized. This is an implementation detail, and is something we could decide to implement in the future. It would remove a minor foot-gun.
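The two-step rule can be sketched over a simplified model of a field list. The `field` type and the `needsTrailingPad` function are hypothetical and are not the compiler's actual code; the "non-zero-sized struct" condition is taken from the cmd/gc commit message quoted in this thread.

```go
package main

import "fmt"

// field is a hypothetical, simplified model of a struct field.
type field struct {
	name string
	size int
}

// needsTrailingPad reports whether a pad byte would be appended
// under the carve-out for zero-sized blank fields.
func needsTrailingPad(fields []field) bool {
	// Only non-zero-sized structs are ever padded.
	total := 0
	for _, f := range fields {
		total += f.size
	}
	if total == 0 {
		return false
	}
	// Step 1: strip zero-sized blank fields from the end.
	end := len(fields)
	for end > 0 && fields[end-1].name == "_" && fields[end-1].size == 0 {
		end--
	}
	// Step 2: pad only if the remaining final field is zero-sized.
	return end > 0 && fields[end-1].size == 0
}

func main() {
	withMarkerLast := []field{{"x", 4}, {"_", 0}}
	withNamedEmptyLast := []field{{"x", 4}, {"z", 0}}
	fmt.Println(needsTrailingPad(withMarkerLast), needsTrailingPad(withNamedEmptyLast))
}
```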

aclements avatar Apr 24 '24 17:04 aclements

@cherrymui pointed out that, while it's important that an addressable zero-sized field at the end of a struct have non-zero size, we could carve out zero-sized _ fields from this rule.

This seems to imply a change to reflection: https://go.dev/play/p/TfYcdEwAACm

zephyrtronium avatar Apr 24 '24 17:04 zephyrtronium

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

The proposal is to add

package structs
type HostLayout struct{}

that can be added as a field named _ in a struct. This would have no significant effect in most tools, but a compiler could use it as a hint about laying out the struct. Of course there is an effect for type equality, since a struct with one of these fields is different from a struct without, but that’s exactly what we want if the compiler is using it to hint a different layout.

rsc avatar Apr 24 '24 19:04 rsc

This seems to imply a change to reflection: https://go.dev/play/p/TfYcdEwAACm

Oh, bother. (Summary of the code: reflect lets you reference an _ field, take its "address", and then get that as an unsafe.Pointer, making this field, in effect, addressable. This, despite the fact that it's an unexported, unnamable field.)
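A sketch of the pattern described, written independently of the linked playground program, so the details may differ from it. The `marker` and `tagged` types are hypothetical. It obtains a pointer to a blank field through reflect and measures how far that pointer lies from the start of the struct.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// marker is a hypothetical zero-sized directive type.
type marker struct{}

// tagged ends in a zero-sized blank field.
type tagged struct {
	x int32
	_ marker
}

// blankFieldOffset uses reflect to take the address of the blank field,
// which ordinary Go code cannot name, and returns its distance in bytes
// from the start of the struct.
func blankFieldOffset() uintptr {
	var t tagged
	v := reflect.ValueOf(&t).Elem()
	p := v.Field(1).Addr().UnsafePointer()
	return uintptr(p) - uintptr(unsafe.Pointer(&t))
}

func main() {
	fmt.Println("blank field sits", blankFieldOffset(), "bytes into the struct")
}
```

Because such a pointer can be produced, a trailing zero-sized _ field without padding could yield an address one past the end of the struct, which is the situation the padding rule exists to prevent.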

aclements avatar May 07 '24 13:05 aclements

Oh, bother. (Summary of the code: reflect lets you reference an _ field, take its "address", and then get that as an unsafe.Pointer, making this field, in effect, addressable. This, despite the fact that it's an unexported, unnamable field.)

Maybe we want to disallow that? I would think reflection allows what one can do in the language spec, but not more than that. That will be a separate proposal, though.

cherrymui avatar May 07 '24 15:05 cherrymui

Sorry to come late to the party, but there are a few other directives we should consider adding to this structs package, namely:

type NoUnkeyedLiterals struct{}
type DoNotImplement interface{
    ProtoInternal(DoNotImplement) 
}
type DoNotCompare [0]func()
type DoNotCopy [0]sync.Mutex

These are actually pretty common, since the Go protobuf code uses these kinds of pseudo-directives: https://github.com/protocolbuffers/protobuf-go/blob/master/internal/pragma/pragma.go So I think they will pass the bar of being common enough to be included in the standard library.
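A short sketch of how one of these pseudo-directives takes effect, using a local `doNotCompare` type that copies the definition above (the `point` and `guarded` types are hypothetical). A zero-length array of functions adds no size but makes the enclosing struct non-comparable, which reflect can confirm.

```go
package main

import (
	"fmt"
	"reflect"
)

// doNotCompare is a zero-sized type whose element type is not comparable.
type doNotCompare [0]func()

// point is an ordinary comparable struct.
type point struct {
	x, y int
}

// guarded embeds the directive, so == on guarded values does not compile.
type guarded struct {
	_    doNotCompare
	x, y int
}

// isComparable reports whether v's dynamic type supports ==.
func isComparable(v any) bool {
	return reflect.TypeOf(v).Comparable()
}

func main() {
	fmt.Println("point comparable:  ", isComparable(point{}))
	fmt.Println("guarded comparable:", isComparable(guarded{}))
}
```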

bjorndm avatar May 08 '24 13:05 bjorndm