go-ipld-prime Should ErrWrongKind contain type info? If so: can we make this implementable with less boilerplate?

Should ErrWrongKind contain type info? If so: can we make this implementable with less boilerplate?

Open warpfork opened this issue 4 years ago • 1 comments

This issue is for a little design discussion about what kind of information we expect from wrong-kind errors -- the errors that one gets when calling a function on a Node that clearly can't answer it sanely (e.g., the thing that happens when calling LookupByString on a value that's a list, where that operation is undefined).

Currently, a lot of code I've written opts for very high amounts of information -- stating at least the actual vs appropriate kinds, but also including a type name if there's schemas in play. I figure this information is probably useful in error messages and in development & debugging.

To get more precise, the current type definition is:

type ErrWrongKind struct {
	TypeName string
	MethodName string
	AppropriateKind ReprKindSet
	ActualKind ReprKind
}

This gets initialized in practice with something like:

ipld.ErrWrongKind{TypeName: "getthis.HereSomehow", MethodName: "LookupByIndex", AppropriateKind: ipld.ReprKindSet_JustList, ActualKind: ipld.ReprKind_String}

But some of this information has a downside: in particular, the type names: the amount of extra code needed to deal with getting those type names into the error text is pretty significant. I can't really figure out a good way to get the correct text in there without a lot of fuss and an uncomfortable amount of boilerplate. This affects us in a couple of ways: codegen output sizes rise sharply (I'll provide an example in a moment), and it also produces a lot of irritation if a developer wants to make a new Node implementation by hand (and this does come up: fairly often if someone wants to implement an ADL, for example).

So how bad is this?

Well,let's look at that example of codegen output I promised earlier. Here it is, in all its horror and glory: the methods that have to be generated to satisfy the ipld.Node interface if you make a simple type Strang string declaration and want to generate that:

type _Strang struct{ value string }

var _ ipld.Node = &_Strang{}

func (*_Strang) ReprKind() ipld.ReprKind {
	return ipld.ReprKind_String
}
func (*_Strang) LookupByString(string) (ipld.Node, error) {
	return mixins.String{"somepkg.Strang"}.LookupByString("")
}
func (*_Strang) LookupByNode(ipld.Node) (ipld.Node, error) {
	return mixins.String{"somepkg.Strang"}.LookupByNode(nil)
}
func (*_Strang) LookupByIndex(idx int) (ipld.Node, error) {
	return mixins.String{"somepkg.Strang"}.LookupByIndex(0)
}
func (*_Strang) LookupBySegment(seg ipld.PathSegment) (ipld.Node, error) {
	return mixins.String{"somepkg.Strang"}.LookupBySegment(seg)
}
func (*_Strang) MapIterator() ipld.MapIterator {
	return nil
}
func (*_Strang) ListIterator() ipld.ListIterator {
	return nil
}
func (*_Strang) Length() int {
	return -1
}
func (*_Strang) IsAbsent() bool {
	return false
}
func (*_Strang) IsNull() bool {
	return false
}
func (*_Strang) AsBool() (bool, error) {
	return mixins.String{"somepkg.Strang"}.AsBool()
}
func (*_Strang) AsInt() (int, error) {
	return mixins.String{"somepkg.Strang"}.AsInt()
}
func (*_Strang) AsFloat() (float64, error) {
	return mixins.String{"somepkg.Strang"}.AsFloat()
}
func (n *_Strang) AsString() (string, error) {
	return n.value, nil
}
func (*_Strang) AsBytes() ([]byte, error) {
	return mixins.String{"somepkg.Strang"}.AsBytes()
}
func (*_Strang) AsLink() (ipld.Link, error) {
	return mixins.String{"somepkg.Strang"}.AsLink()
}
func (*_Strang) Prototype() ipld.NodePrototype {
	return _Strang__Prototype{}
}

Basically all of these methods are producing ErrWrongKind rejections -- all the mixin.String{"..."}.DelegatedCall stuff is a holler over to the node/mixins package, where they get translated into ErrWrongKind with all the other arguments plugged in already (this being possible since they're all simply properties determined by Kind). Since this type is only supporting data of one Kind, there's approximately one method doing useful work out of this whole set.

And there's about 50 lines there that aren't very interesting. That's enough that it makes me a bit twitchy.

All of these generated methods increase the GSLOC (the Generated Source Lines of Code count, if you will) toll of our codegen pretty drastically -- because you pay this toll again for every type in your schema (twice, even; once for the type-level Node, and a second time for the representation-mode Node implementation). They also increase the compiled binary output size. And if you were authoring a new ipld.Node implementation by hand (say, for an ADL, or some other purpose), you'd have to write all these... by hand.

Wouldn't it be nice if we could shorten this up drastically?

So let's try making a type that contains the boilerplate methods, so we can embed it.

Something like:

type _Strang struct {
	mixins.String // provides all the boilerplate functions
	value string // the actual value
}
// surely we'd only be left needing to implement AsString and Prototype!

But... no, this does not work well. (The docs in the code of the mixins package we have already hint at why.)

One ends up needing to carry the type info string in that mixin type. It turns out to be type mixins.String { TypeName string } if it's going to be carrying that type name info into the errors produced by the boilerplate methods, right?

And then since that type isn't zero-size, this embed would mean every one of our values would get bigger. Not good. (Remember, in the bigger picture, the user should be able to have a schema saying type Strang string; type Foobar {String:Strang} and expect not much different memory overhead when filling data into the resulting generated code than they would've if rolling it themselves by hand.)

And for more fun, it would require constructors to assign the TypeName info... for every Strang value ever produced.

Oof. Yeah, this does not seem to go well.

How about something like:

type _Strang struct {
	mixins.String{"Strang"} // if we could assign a const here, that'd seem to do the trick!
	value string
}

No, this is not valid golang. Alas.

How about doing something with runtime inspection of the call stack? Could we have the boilerplate method in the mixin type peek somehow for the golang typeinfo of the thing its been embedded into and called via, and then we could do some munging on that to get the info we want for our error message?

As far as I can tell... no. (If someone can find a clever way to do this, that'd be heckin' neat, and I'll be very happy to be wrong.)

I poked at this a bit. The info is even there, in the compiler's hands, during at least some of the phases of assembly processing. (The Strang type in our example ends up with autogenerated methods which are stubs that call the methods on the embedded type... so, there does appear to be a body of assembly that -- just for a moment -- could be said to have the required info in hand.) But this doesn't seem to be accessible to the code that we write. At least not in any way I could figure out. (It's definitely not legible via runtime.Callers.)

How about something with macros? This is what people in other languages use macros for, right?

Well. Golang. So: no -- we don't really have macros in our toolbelt.

What we're doing in the codegen example pretty much is "macros", so to speak. So maybe this is a hint that what we're doing in codegen is actually fine after all. It's unfortunate that it bloats GSLOC so much, but maybe it's fine. (Macros would do roughly the same thing, just make it happen farther outside of where a human would normally see it.)

What works acceptably well in the codegen story doesn't necessarily apply to the other story though. A human who's authoring an ADL and needs to make a Node implementation as part of that: they're kinda still stuck with boilerplating a whole bunch.

Maybe we could make a really small template for that just generates the boilerplate methods based on arguments for the kind, the golang type name, and the type info string to use in the error messages. That could help. All of its outputs could go in a separate file than the stuff that the human writes, which means the maintainability of it is pretty viable.

I'd still prefer a solution that doesn't involve invoking a codegen phase over one that does, though.

Okay. So, seeing all the trouble this is... Maybe we should revisit a more basic question: what if we could just... not? Do we need ErrWrongKind to contain type information about the Node it arose from at all?

I still tend to think it has some value. It's a lot more pleasant to tell someone, e.g.: "ErrWrongKind: a Frobnoz node (which is a string kind) does not support the LookupByString function (which is only applicable to nodes of map kind)"... than it is to issue that whole error without the additional "Frobnoz" information, if you have it.

It is trouble though.

So I'm not sure what the end result of this is. It's possible this issue is just publicly documenting the search I did so far, and ends as a quiet lament, and we just move on. Maybe generating the boilerplate methods with the relevant strings is just the best we can do. Or maybe we decide that kind errors aren't worth this effort at all, which would let us drop a ton of this stuff.

At any rate, the reader can now see the examples above which demonstrate just how much boilerplate this currently involves.

Thoughts welcome.

Sep 02 '20 11:09 warpfork

(There might also be a discussion worth having about whether treating TypeName as a string is a sane design for this information -- but it can be had somewhat separately, because most of the issue discussed here is simply about how to flow the information through the program; it doesn't really matter if TypeName is a string or a complex structure.)

Sep 02 '20 12:09 warpfork

go-ipld-prime go-ipld-prime copied to clipboard

Should ErrWrongKind contain type info? If so: can we make this implementable with less boilerplate?

go-ipld-prime
go-ipld-prime copied to clipboard