go-ipld-prime Improving clarity and error handling around absent values

As of v0.14ish (and considerably before), the exact contract that the datamodel.Node interface has for values that are not found in a map (or are out of bounds index in a list) is not super well documented, nor is a defacto convention well adhered-to in implementations. This needs to be rectified. The current situation makes writing correct code for handling data generically very difficult, if absence of values needs to be handled distinctly from other potential error paths.

(Why is this an issue now, and not earlier? It's always been an issue -- but sometimes it's avoidable. The typed Node implementations have generally had clearer behavior for this. In many cases, a data transformation that's not conditional on the existing data hits no speedbumps here. But in completely generic data transforms -- such as a Patch implementation, which I'm working on now, it's becoming quite keening indeed.)

At the same time: I'm finding that I experience 'nil' considerably more often than I'm happy with while using this library. One of the biggest sources of this is when nil is used where an Absent value might also be semantically reasonable.

To address both of these problems, here is what I think we should do:

Node implementations should return Absent for the value when doing lookups in a map or list, if that value is indeed absent. (No longer should this ever use 'nil'.)
Nodes should not return datamodel.ErrNotExists anymore. Returning Absent as the value and nil as an error should be the new style. (Trying to handle value absence as an error path has turned out to be considerably painful, and often produces confusing branching.)
ErrNotFound: this might still exist, but it's should only used by APIs that are doing more than one stride at a time (e.g. traversal stuff).
While we're at it: In typed nodes, structs also should return Absent for both a field which is defined but was absent, and also for a field which is not defined. (Previously, structs returned somewhat stronger error for a field that is not defined: schema.ErrNoSuchField. But I've come to think this is probably on net unhelpful; the general learning from working with our type system over time seems be uncovering that the less the type lens makes things diverge from the basic data model, the better.)

This is likely to be a breaking change in a few cases. Worse, it's the kind that the compiler won't detect. Nonetheless, I think pulling the bandaid on this one will not become easier over time, so it might as well be done soon or now.

Previous review: https://github.com/ipld/go-ipld-prime/issues/74 (no final resolution reached, but covers the same problem). Possibly also relevant (though tangentally): https://github.com/ipld/go-ipld-prime/issues/191

We might consider how to migrate this smoothly. ~For example, perhaps there could be an intermediate time where we return both Absent and ErrNotExists. However, it's not immediately clear that this would actually result in greater smoothness of transition.~ AMEND: Let's not atttempt this particular "migration" strategy. It's sufficiently un-idiomatic golang that it seems like a very poor idea, even temporarily.

Feb 14 '22 15:02 warpfork

I experience 'nil' considerably more often than I'm happy with

Is this just a smell thing? Or are there practical reasons why this is a problem?

In typed nodes, structs also should return Absent for both a field which is defined but was absent, and also for a field which is not defined

This seems like the most controversial part of this to me; it seems like you're proposing a loss of information for typed nodes. There'll be no easy way to distinguish between "this type has no such field" and "this field's value is currently absent", will there? I assume this is to make it so typed nodes are more easily interchangeable with basic nodes and don't have these weird behaviours. But is this a case of removing some of the utility of typed nodes in the process of making them generic?

Feb 14 '22 23:02 rvagg

I basically arrived at the same thoughts that Rod brought up. I also have a worry that the transition will be painful, but given that ipld-prime is a v0, I imagine we can get away with it.

As for "not found" returning the current nil, ErrNotExists versus a future Absent, nil: I have a slight preference for the current form, because it's more difficult to misuse. For instance:

// compiler error! err declared and not used
node, err := parent.LookupByString("foo")
use(node)

Or:

// suspicious; explicitly ignored error
node, _ := parent.LookupByString("foo")
use(node)

Users today will instead write something like:

node, err := parent.LookupByString("foo")
if err != nil {
    handle(err) // handle any error, but you can also handle ErrNotExists separately here
}
use(node)

However, this will not be enough with the new Absent, nil form, and none of the static analysis or syntax cues will help you. You'll instead want something like:

node, err := parent.LookupByString("foo")
if err != nil {
    handle(err) // handle other errors
}
if node.IsAbsent() {
    // handle the absent case
}
use(node)

Feb 16 '22 13:02 mvdan

I should note that returning a nil error for "not found" can be very reasonable if such a case is entirely normal, and using the result normally won't cause problems. For instance, if the return parameters are (*SomeStruct, error), and the methods on *SomeStruct don't panic when called on nil. That's not our case here, because we return an interface, which will panic when methods are called on its nil value.

Feb 16 '22 14:02 mvdan

I think the main distinction is, while with absent values, one may write:

node, err := parent.LookupByString("foo")
if err != nil {
    handle(err) // handle other errors
}
if node.IsAbsent() {
    // handle the absent case
}
use(node)

... one may also simply not have that IsAbsent branch sometimes: it's a fairly normal value and you can often pass it on.

Whereas currently, one generally must write:

node, err := parent.LookupByString("foo")
if err != nil {
    handle(err) // handle other errors
}
if node == nil {
    // handle the absent case
}
use(node)

... because passing forward a nil Node generally doesn't produce sensible results, and is very prone to producing nil panics (which are then in turn a PITA to debug because they're very uninformative).

Feb 28 '22 06:02 warpfork

Returning a valid value and an err response doesn't seem very idiomatic Go to me, is this something that's regularly done? Are there common APIs that employ this?

Feb 28 '22 06:02 rvagg

I agree, returning both a valid value and an error response is not very idiomatic. I'll strike that suggestion from the original post.

Feb 28 '22 11:02 warpfork

This seems like the most controversial part of this to me; it seems like you're proposing a loss of information for typed nodes. [...] I assume this is to make it so typed nodes are more easily interchangeable with basic nodes and don't have these weird behaviours. But is this a case of removing some of the utility of typed nodes in the process of making them generic?

Yes and yes.

Agree that this is controversial. I'm not entirely convinced of it myself. But the reasoning is as you say: I'd like the schema system to be even less special than it is. (It's already minimally special! These bits about absent fields are one of the remaining noticeable specialnesses.)

We could crowbar this choice out of this issue (and probably should). (It should probable affect the decision about how "length" works on structs with optional fields too, and that'd be kind of a bigger change.)

There'll be no easy way to distinguish between "this type has no such field" and "this field's value is currently absent", will there?

Right. Not through the data model universal interface.

One would need to shift gears into some kind of introspective mode that's outside the data model.

That doesn't currently exist in a very standardized way. (It's present in the golang code! But if we wanted to make a declarative/message-passing serial API about it, and put serial fixtures about its behavior in the specs/meta repo, we'd have to invent a lot of things that aren't invented yet.) But we could make that. And it might be useful and reasonable to do so.

Meanwhile: I think it's rather questionable whether being able to make that distinction through the data model view is actually useful in practice. I thought it would be! But I don't know if I can think of any supporting evidence or major use of it, now that we've had that ability for a while. (More data welcome.)

Feb 28 '22 11:02 warpfork

I think it's rather questionable whether being able to make that distinction through the data model view is actually useful in practice. I thought it would be! But I don't know if I can think of any supporting evidence or major use of it, now that we've had that ability for a while.

True .. I can't think of a good case or even a good hypothetical .. maybe with time.

Mar 01 '22 02:03 rvagg

go-ipld-prime go-ipld-prime copied to clipboard

Improving clarity and error handling around absent values

go-ipld-prime
go-ipld-prime copied to clipboard