go
go copied to clipboard
proposal: spec: direct reference to embedded fields in struct literals
(Status: referred from LanguageChange to Proposal. --adonovan)
Consider
type E struct {
A int
}
type T struct {
E
}
This works:
T{E: E{A: 1}}
This does not:
T{A: 1}
Makes some struct literals more verbose than they need be, and makes them asymmetrical to their usage (where you can access the embedded struct's fields directly).
Can we allow it?
(cc @bradfitz)
One possible argument against is that it allows you to effectively specify the same field twice, for example:
T{E: E{A: 1}, A: 2}
But this case already exists
S{B: 2, B: 3}
and is disallowed, so maybe that isn't worth worrying about.
FWIW, @adg and I both tripped over this independently. We both assumed T{A: 1} would work.
I've wished to be to able to do this many times (go/ast is a prime example).
I don't know that we ever thought through all the consequences, but it's perhaps worthwhile considering.
Is there any known reason it wasn't considered back in #164?
@dominikh issue #164 was about a "programmer error" or misunderstanding of the spec as written.
Usually we don't look at each such error and consider a spec change. However this has come up before (it has restricted what I wanted to do in APIs significantly) so maybe it's time to at least investigate the consequences of such a language change more thoroughly.
That said, language changes are really very low priority. We'll get to it when we get to it.
r's comment on this on golang-nuts was:
It might one day, but as it stands the requirement to provide more information is more robust against changes in the data types.
I have also written code that assumed this would work.
Considering the reading and assignment of embedded fields work it is quite unintuitive that this doesn't.
https://go.dev/play/p/wad3pxSez5I
@neild and I were discussing this and there are two things that would have to be addressed (not showstoppers, just considerations to make):
- What happens if both the anonymous field containing the embedded field and the embedded field itself are specified?
T{E: E{}, A: 42}
This is probably a compile error, much like specifying duplicate fields in struct literals today.
- What happens if the anonymous field is a pointer type? Should setting an embedded field cause the anonymous field to be implicitly allocated?
I also suspect that this could be done before Go 2, since it makes previously invalid Go programs valid, and shouldn't alter the meaning of existing Go programs.
How much should we be consistent with assignments? If 1 is a compile error should the following also be a compile error?
https://go.dev/play/p/ZVzskYi0ZlV
For 2, in assignments you would get a runtime error. I feel the idea of allocation on the fly can obscure memory allocation and if you have multiple nested embedding a simple int assignment would do a lot more than a user would expect and surprise them. So I feel it might be better to just have a runtime error as in the case of assignment and no hidden memory allocation:
https://go.dev/play/p/beA1MwFh9Y4
Also that is the behavior without embedding:
https://go.dev/play/p/5VyVof6WnC9
The assignment rules already cover the case where the same value (or part of a value) appears multiple times in an assignment.
The assignment proceeds in two phases. First, the operands of index expressions and pointer indirections (including implicit pointer indirections in selectors) on the left and the expressions on the right are all evaluated in the usual order. Second, the assignments are carried out in left-to-right order.
What should happen for the case that @zombiezen mentions above, in which a field is embedded via a pointer? Here are some possibilities:
- silently allocate the pointer
- panic at run time with a nil dereference error
- give a compilation error
It seems like choice 3 is the best one. The compiler can always tell how a field is embedded, so it can tell that a field is embedded via a pointer.
Is it too confusing for struct literals to permit direct embedded fields but not fields embedded via pointers?
1 or 3 are both valid options. 2 doesn't make sense given that 3 is a valid option.
3 would incentivize non-pointer embedding to get the simpler behavior even when a pointer embedding might make more sense.
1 seems more uniform and hence easier to use. Composite literals aren't necessary. They exist to make things easier.
I'd like to mention the curious case of an embedded, unexported pointer field with exported fields of its own:
package pkg
type u struct { A int }
type S struct { *u }
Currently, outside of pkg there is no way to allocate S.u or initialize A, although s.A is legal where var s pkg.S.
If pkg.S{A: 1} allocates, then you can do something that you couldn't before.
I don't know if there's any code out there that would care (that depends for its correctness on the inability to allocate u outside the package). There might be.
Hmm, that is a strong argument for option 3
Given a struct type T with arbitrary embeddings (directly, or indirectly), if the following code is valid:
var x T
x.f1 = v1
x.f2 = v2
...
for (possibly embedded and possibly exported) struct fields f1, f2, ..., and corresponding values v1, v2, ..., it seems that it should be valid to write:
T{f1: v1, f2: v2, ...}
and vice versa.
This rule alone would take care of visibility across packages and potential field name ambiguities. It would also explain what happens with pointer indirections (it will panic).
This is perhaps the simplest and most intuitive rule and would permit code transformation from one style to the other without change of semantics (which is currently the case for the restricted form of struct literals).
If the compiler is permitted to report an error when a struct literal is known to panic (which is easy to decide) then using a struct literal would be the "safer" choice. But one would lose the ability to transform code between the two variants without semantic change. I don't know if that's a good trade-off.
You could also only implicitly create the pointer if you could explicitly create it. In @jba's example you couldn't write pkg.S{A: 1} because you couldn't write pkg.S{u: &pkg.u{A: 1}}
Possible alternatives:
T{E{A: 1}}
T{E.A: 1}
If we implicitly allocating embedded pointers, if you have a long chain of embedded pointers, then it's non-obvious that:
x := Foo{Value: v}
is implicitly expanding to:
x := Foo{Bar: &Bar{Baz: &Baz{Spam: &Spam{Egg: &Egg{Value: v}}}}}
I don't think I like having non-obvious allocations like that.
@griesemer, your rule seems obvious (in hindsight). Could you go further and say that the sequence of assignments is actually the meaning of the struct literal? Interestingly, the spec never really gives the exact meaning of a struct literal. It strongly suggests it by saying keys are field names and values must be assignable to their respective fields. It even says omitted fields get the zero value. But it never actually says that the given values are assigned to the fields.
How much should we be consistent with assignments? If 1 is a compile error should the following also be a compile error? https://goplay.googleplex.com/p/9CL_5Owvzi
For 2, in assignments you would get a runtime error. I feel the idea of allocation on the fly can obscure memory allocation and if you have multiple nested embedding a simple int assignment would do a lot more than a user would expect and surprise them. So I feel it might be better to just have a runtime error as in the case of assignment and no hidden memory allocation: https://goplay.googleplex.com/p/HWSFemHcFf Also that is the behavior without embedding: https://goplay.googleplex.com/p/ixVAW3xa-Q
@ghasemloo The links you've provided are not public 🛡️ not sure if that was intentional?
Consider
type E struct { A int } type T struct { E }This works:
T{E: E{A: 1}}This does not:
T{A: 1}
If E is generic using the current generics proposal, then there is nowhere to specify the type on the above proposed literal of T.
https://go.googlesource.com/proposal/+/refs/heads/master/design/go2draft-type-parameters.md#embedded-instantiated-type
If E is generic the example might look like this:
type E[T any] struct {
A T
}
type T1 struct {
E[int]
}
type T2[T any] struct {
E[T]
}
that is, either E is instantiated with a fixed type such as int in T1, or is is instantiated with another type parameter T as in T2.
In these cases the equivalent struct literals would be:
T1{E[int]{A: 1}}
T2[int]{E[int]{A: 1}}
and, per this proposal, it could be:
T1{A: 1} // we statically know the type of A from the declaration of E[int]
T2[int]{A: 1} // we statically know the type of A from the T2 type argument
So this should be doable even in the presence of generic embedded fields.
Unrelated, please note that
T1{E: E[int]{A: 1}}
T2[int]{E: E[int]{A: 1}}
doesn't work at the moment in the generics prototype, even though it should. See also #44345 .
Amusingly your experience today was the reverse of mine. I separately discovered #44345 in the experimental playground and then came here wondering how it impacts this issue.
I can't seem to find this proposal in the proposals project. Is it missing from there or is it following another review process?
Language changes like this one mostly follow a separate process, tracked in #33892.
This issue is on that list but it hasn't gotten much attention recently.
Wanted to echo support of this! Currently there are places I have to avoid using an embedded struct to de-duplicate logic, due to dev UX overhead for downstream users w/ nested struct construction.
Wanted to echo support of this!
👍 Agreed, this would make promoted fields much more intuitive. I thought this was already possible but was surprised to learn that embedded fields are directly gettable but not directly settable on instantiation.
Struct literals (as opposed to var field assignments), are sometimes bringing better readability due to go fmt alignment and less "unnecessary" references to parent var.
type My struct {
A string
Foo int
BarBaz bool
AnotherGroup1 int
AnotherGroupQuux string
}
type myEmb struct {
A string
Foo int
BarBaz bool
}
type Mye struct {
myEmb
AnotherGroup1 int
AnotherGroupQuux string
}
// Better readability due to vertical alignment.
m1 := My{
A: "abc",
Foo: 123,
BarBaz: true,
// Another group for indentation.
AnotherGroup1: 3,
AnotherGroupQuux: "c",
}
// Worse readability due to lack of alignment and also `m2` all over the place.
m2 := Mye{}
m2.A = "abc"
m2.Foo = 123
m2.BarBaz = true
m2.AnotherGroup1 = 3
m2.AnotherGroupQuux = "c"
I'd be happy if this proposal is developed into a language change.
I'm a relative newbie here but also ran into this today. There are six structures in a package that share seven common fields, and are always constructed using composite literals. I attempted to refactor the common fields into a base structure... but that will make the composite literals ugly and expose the internal details of the structures which are part of the package's API.
Some of the discussion above talked about defining composite literals as effectively a shorthand for a series of assignments, and alluded to the consequences of that, but I was left a little unsure about how all of the different facts discussed above would combine and so here's a summary of what I found by experimentation. :grinning:
Let's use the following as a first example of the relatively-simple case this proposal was originally framed around:
type A struct {
Foo string
}
type B struct {
A
Bar string
}
func main() {
// The following is a hypothetical "desugaring" of B{Foo: "foo", Bar: "bar"}
// as a series of individual field assignments.
var thingy B
thingy.Foo = "foo"
thingy.Bar = "bar"
fmt.Println(&thingy)
}
So far so good. This works and achieves the desired effect of assigning "foo" to thingy.A.Foo and "bar" to thingy.Bar.
Embedding a pointer to A instead has a less favorable outcome:
type A struct {
Foo string
}
type B struct {
*A
Bar string
}
func main() {
// The following is a hypothetical "desugaring" of B{Foo: "foo", Bar: "bar"}
// as a series of individual field assignments.
var thingy B
thingy.Foo = "foo" // PANIC!
thingy.Bar = "bar"
fmt.Println(&thingy)
}
The thingy.Foo = "foo" assignment now panics at runtime because this is really assigning to thingy.A.Foo and thingy.A is nil, so we can't dereference it.
This situation is what the earlier comment giving three options was about. Specifying struct literals as just sugar for a series of assignments effectively chooses option 2: "panic at run time with a nil dereference error".
It's perhaps arguable that a composite literal is already more than just sugar over a series of assignments in that it won't let you redundantly assign to the same target twice:
thingy := A{
Foo: "foo1",
Foo: "foo2", // error: duplicate field name Foo in struct literal
}
...and so it does seem very defensible to me to pick either option 1 or option 3, which I think would reframe the "composite literals are just sugar for assignments" position a little to add some additional rules...
Composite literals are sugar for a series of assignments, with the following additional behaviors:
-
It's invalid to assign to the same location more than once, either by individual assignment or by assignment of an entire container that encloses that location.
-
The first assignment to a location behind a pointer that hasn't already been assigned implicitly allocates to establish the storage for that location. This counts as an assignment to the pointer field for the sake of the previous behavior, but not as an assignment of any of the locations within it. This special treatment makes the treatment of the content of the allocated object effectively the same as for a directly-embedded type aside from the hidden allocation.
(This effectively chooses Ian's option 1. It would also be possible to write a rule here that chooses Ian's option 3, which would avoid the need for this extra quirk about it counting as an assignment to the pointer field but not to the locations within by just forbidding that situation from arising in the first place. I personally lightly favor option 1 under a similar argument that composite literals exist to allow common situations to be expressed more concisely even at the expense of sometimes hiding some information, but I wouldn't be upset about option 3.)
-
When the overall result type is something dynamically-sized and therefore dynamically-allocated, the set of assigned locations is used as a hint for the size of the allocation, in a type-specific way. (i.e. the capacity of a slice, or the initial allocated size for a map)
I've used the imprecise term "location" above; I originally used "address" but loosened it because map elements are not addressable but I do intend them to be considered as "locations" for the sake of this rule.
Under that new definition, the "desugaring" would look something like this:
type A struct {
Foo string
}
type B struct {
*A
Bar string
}
func main() {
// The following is a hypothetical "desugaring" of B{Foo: "foo", Bar: "bar"}
// as a series of individual field assignments.
var thingy B
thingy.A = new(A) // implicit allocation and assignment to thingy.A
thingy.Foo = "foo" // assignment to thingy.Foo can now succeed
thingy.Bar = "bar"
fmt.Println(&thingy)
}
The following (with the same type definitions as the above example) would not be allowed due to violating the rule described in my first new behavior:
B{
Foo: "foo",
A: &A{}, // error: A was already implicitly assigned by the previous entry
}
B{
A: &A{},
Foo: "foo", // error: A.Foo was already implicitly assigned its zero value by the previous entry
}
The variation where B embeds A without pointer indirection is similar:
B{
Foo: "foo",
A: A{}, // error: A.Foo was already implicitly assigned "foo" by the previous entry
}
B{
A: A{},
Foo: "foo", // error: A.Foo was already implicitly assigned its zero value by the previous entry
}
Finally, a more interesting example with two fields in A:
type A struct {
Foo string
Bar string
}
type B struct {
A
}
B {
Foo: "foo",
Bar: "bar", // allowed because the previous entry only assigned to A.Foo, so A.Bar hasn't been mentioned yet
}
// desugars as:
var thingy B
thingy.Foo = "foo" // really: thingy.A.Foo
thingy.Bar = "bar" // really: thingy.A.Bar
B {
Foo: "foo",
A: A{ // error: A.Foo was already assigned "foo", so we can't reassign it the zero value here
Bar: "bar",
}
}
// if not rejected as an error, would hypothetically desugar as:
var thingy B
thingy.Foo = "foo" // really: thingy.A.Foo
thingy.A = {
// effectively includes Foo: ""
Bar: "bar",
}
B {
A: A{
Bar: "bar",
}
Foo: "foo", // error: A.Foo was already assigned its zero value by the previous element
}
// if not rejected as an error, would hypothetically desugar as:
var thingy B
thingy.A = {
// effectively includes Foo: ""
Bar: "bar",
}
thingy.Foo = "foo" // really: thingy.A.Foo
Overall my goal in all of this was to consider what exactly it might mean to treat a composite literal as a shorthand for a series of assignments. I'm sharing it here just in case it's helpful to others thinking about that question, or in case it raises some new questions worth discussing. :grinning: