Schema.NET WIP: Add Graph property support

Open Turnerj opened this issue 4 years ago • 6 comments

Getting the ball rolling for @graph support. Will close #132 when merged.

Adding the property is easy. Working out where to put the logic for conditionally serializing @graph over other properties, or how to deserialize it (as the deserialization target "should" be a JsonLdObject, not any type inheriting from it), will be tricky. Perhaps you can only deserialize a @graph if you choose JsonLdObject (e.g. SchemaSerializer.DeserializeObject<JsonLdObject>(myJson)), though that seems a bit crappy.

I've marked this as minor currently as in theory, this should be backwards compatible. That said, happy to have this marked as major instead.

Jul 04 '20 07:07 Turnerj

This next part is kinda a ramble of thoughts about this all:

JsonLdObject seems like the best/only candidate because it is effectively type-less. As this example in the spec shows, the outer JSON has no type property, only a context. This should mean we don't serialize/deserialize a @graph if the type is WebPage.

One problem, though, is that people may expect IThing to be our root interface, and from a schema.org standpoint it is. To support @graph, people would need to consider JsonLdObject directly. Maybe we should create an IJsonLdObject interface and make IThing extend it? It all depends on whether we (or others) consider our interfaces or our class implementations to be the types we reference. (Maybe another thing for @nickevansuk to weigh in on re. OpenActive.NET, as this would impact all inherited types.)

Let's think deserialization:

I have a blob of JSON which may be a real type or an object with a graph.
I can't know ahead of time which is which - I have to deserialize it to find out
If I attempt to deserialize to a WebPage and it is one, everything is fine.
If I attempt to deserialize to a WebPage but it is a @graph...
- Is that an exception?
- Do we just ignore every other property in deserialization and spit back a WebPage instance with the @graph filled in?
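
To make the "can't know ahead of time" point concrete, here is a rough sketch of peeking at the JSON before picking a target type. It assumes System.Text.Json, and `JsonLdProbe` / `HasGraph` are hypothetical names for illustration only; nothing like this exists in Schema.NET as far as this thread shows.

```csharp
using System.Text.Json;

// Hypothetical helper, not part of Schema.NET.
public static class JsonLdProbe
{
    // Returns true when the root JSON value is an object with an "@graph" member.
    public static bool HasGraph(string json)
    {
        using JsonDocument document = JsonDocument.Parse(json);
        JsonElement root = document.RootElement;

        // TryGetProperty is only valid on objects, so check the kind first.
        return root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty("@graph", out _);
    }
}
```

A caller could branch on `HasGraph(myJson)` and only then decide whether to ask for a graph-shaped type or a concrete one, at the cost of parsing the document twice.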

Now serialization:

I have an object of...
- JsonLdObject - so we should be fine serializing it either way
- WebPage - so if it has any values in @graph, do we ignore absolutely every other property on WebPage?

Jul 04 '20 07:07 Turnerj

minor seems fine to me unless you count changing the serialization order as a breaking change.

We need to add some documentation to the README. It might be a good idea to also link to some documentation about @graph as it's definitely a more advanced usage scenario.

I suspect we also need to protect against using @graph while also populating other properties. I wonder what the spec says about doing that. Perhaps we should create a new GraphObject type that has the extra @graph property and nothing else. Thoughts?

Nov 02 '20 10:11 RehanSaeed

Hey all, I was thinking of taking a stab, or trying to help, with this feature, and it looks like some maintainer-level decisions are still being worked through on how best to handle it. You all may be waiting on the System.Text.Json move as well, which would make sense.

In case some feedback from a user's perspective (typically a scraping use case without much known about the JSON upfront) is helpful:

The graphs I've seen in the wild nearly always contain an array of various types of objects, so a requirement to deserialize to a special graph object or even a List<JsonLdObject> doesn't feel crappy - it feels quite reasonable, given that @graph explicitly represents a potentially multi-type outcome.
If I were to try to force deserialize to a specific type, but it was a graph, I'd expect an exception. But to look at a similar scenario in the library: when someone tries to deserialize a regular object with a different type than what it is, the library simply returns an empty object of the requested type. That strikes me as a best-effort approach in gray-area cases, which, if applied here, would suggest that the latter of @Turnerj's suggestions makes the most sense: ignoring what isn't a WebPage.

Happy to help once a direction is settled on. And thanks for your work on this library, it's nice.

Jan 17 '21 17:01 NickSpag

Thanks for commenting @NickSpag - I'm happy for you to take a shot at this issue.

> The graphs I've seen in the wild nearly always contain an array of various types of objects, so a requirement to deserialize to a special graph object or even a List<JsonLdObject> doesn't feel crappy - it feels quite reasonable, given that @graph explicitly represents a potentially multi-type outcome.

So the reason I think it is a bit crappy is that you may not know whether the JSON-LD is a graph when deserializing. If you called SchemaSerializer.DeserializeObject<JsonLdObject>(myJson) and your JSON isn't a graph, you are missing a lot of data. If you called SchemaSerializer.DeserializeObject<WebPage>(myJson) and it is a graph, it could throw an exception.

We may need to have TryDeserializeObject-type methods but that is likely a whole different discussion.
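
For illustration only, a Try-style method could look something like the sketch below. It assumes System.Text.Json; the name `TryDeserializeObject`, its `expectedType` parameter, and the `PageSketch` class are all hypothetical, and the "reject graphs and mismatched @type" policy is just one of the options discussed in this thread.

```csharp
#nullable enable
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-in for a schema.org type such as WebPage.
public sealed class PageSketch
{
    [JsonPropertyName("name")]
    public string? Name { get; set; }
}

// Hypothetical helper, not part of Schema.NET.
public static class TrySerializerSketch
{
    // Returns false instead of throwing when the JSON is invalid,
    // is a graph, or declares a different "@type" than expected.
    public static bool TryDeserializeObject<T>(string json, string expectedType, out T? value)
        where T : class
    {
        value = null;
        try
        {
            using JsonDocument document = JsonDocument.Parse(json);
            JsonElement root = document.RootElement;

            if (root.ValueKind != JsonValueKind.Object) return false;
            if (root.TryGetProperty("@graph", out _)) return false;

            if (!root.TryGetProperty("@type", out JsonElement type)
                || type.ValueKind != JsonValueKind.String
                || type.GetString() != expectedType)
            {
                return false;
            }

            value = JsonSerializer.Deserialize<T>(root.GetRawText());
            return value is not null;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

The caller gets a simple yes/no instead of an exception or a silently empty object, which is the ambiguity being discussed above.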

> I suspect we also need to protect against using @graph while also populating other properties. I wonder what the spec says about doing that. Perhaps we should create a new GraphObject type that has the extra @graph property and nothing else. Thoughts?

I think if we can prevent @graph being able to be set via anything other than JsonLdObject directly, that may be a good idea. That way we can kinda protect people from themselves. Having a GraphObject does make that wildly easier for us as nothing would inherit it but I wonder how obvious it would be to users.

Jan 18 '21 04:01 Turnerj

> So the reason I think it is a bit crappy is that you may not know that the JSON-LD is a graph or not when deserializing.

Ah, okay, I see your thinking there - I had mentally separated the two user paths into known graph or known object.

One question to help my understanding, are there other scenarios in the library where an array of multiple different types is expected? If so, how are they handled? I know there's the OneOrMany<T> for values, etc, but that might not be the right tool for this particular job- I'm a little mentally stuck on what Type someone expecting a graph array of multiple different-typed objects should provide to the deserialize methods to preserve as much data as possible.

But to expand on your first attempt @Turnerj at mapping scenarios, it looks like there are:

Deserialize

Graph containing objects of one type => same object type given (plus array considerations)
- Straightforward
Graph containing objects of one type => a different object type given
- Return exception/null/empty object?
Graph containing objects of multiple types => one type given that isn't found in the contents of the graph
- Return exception/null/empty object?
Graph containing objects of multiple types => one type found in the contents of the graph
- Ignore the other objects that aren't of that type, losing that data?
Graph contains objects of multiple types => ?? to my earlier question, what type should the user provide here to indicate an array of JsonLdObjects or something else that's schema typeless
- Return array of the provided type to let users iterate through as they need?
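
As a sketch of the "one type found in the contents of the graph" case above, the following pulls out only the graph items whose @type matches and drops the rest. It assumes System.Text.Json; `GraphFilterSketch` and `ItemsOfType` are hypothetical names, and returning raw JSON strings is a simplification to keep the example free of any library types.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of Schema.NET.
public static class GraphFilterSketch
{
    // Returns the raw JSON of every "@graph" item whose "@type" equals typeName.
    // Items of other types are ignored, which is the data loss noted above.
    public static List<string> ItemsOfType(string json, string typeName)
    {
        var matches = new List<string>();
        using JsonDocument document = JsonDocument.Parse(json);
        JsonElement root = document.RootElement;

        if (root.ValueKind != JsonValueKind.Object
            || !root.TryGetProperty("@graph", out JsonElement graph))
        {
            return matches; // not a graph: nothing to filter
        }

        if (graph.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement item in graph.EnumerateArray())
            {
                AddIfMatch(item, typeName, matches);
            }
        }
        else
        {
            // Tolerate a single object in place of an array.
            AddIfMatch(graph, typeName, matches);
        }

        return matches;
    }

    private static void AddIfMatch(JsonElement item, string typeName, List<string> matches)
    {
        if (item.ValueKind == JsonValueKind.Object
            && item.TryGetProperty("@type", out JsonElement type)
            && type.ValueKind == JsonValueKind.String
            && type.GetString() == typeName)
        {
            matches.Add(item.GetRawText());
        }
    }
}
```

An empty result covers both "type given that isn't found" scenarios without an exception; whether that or throwing is the right behaviour is exactly the open question here.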

Serialize

This entire direction seems more straightforward since we already know the types. Given the serialization path goes through .ToString() and .ToHtmlEscapedString(), does it make sense to simply have a .ToGraphString() and .ToGraphHtmlEscapedString() where the objects are put in a graph property? It allows this advanced use case to be opt-in and non-default, to keep things simple for users.
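
Purely as an illustration of the opt-in idea, a graph-wrapping helper might look like this. It assumes System.Text.Json; `GraphStringSketch.ToGraphString` is a hypothetical free-standing method (not the proposed instance method on library types), and it takes plain objects rather than Schema.NET types.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper; the names are illustrative and not part of Schema.NET.
public static class GraphStringSketch
{
    // Wraps the given objects in a single JSON-LD envelope with
    // one shared "@context" and the objects listed under "@graph".
    public static string ToGraphString(
        IEnumerable<object> items,
        string context = "https://schema.org")
    {
        var envelope = new Dictionary<string, object>
        {
            ["@context"] = context,
            // Values typed as object are serialized using their runtime type.
            ["@graph"] = items.ToList(),
        };

        return JsonSerializer.Serialize(envelope);
    }
}
```

The existing non-graph serialization path stays untouched, so callers who never need @graph never see it.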

Approaches

Could include one, or a combo of, the following:

Adding a non-virtual @graph property to JsonLdObject
Adding a virtual @graph property to JsonLdObject
- This is referring to your second outcome in the deserialize a WebPage scenario: spit back a WebPage object with the graph filled in. It also might make the serialization story a bit more complicated though, as you highlighted.
A GraphObject extending JsonLdObject with a non-virtual @graph property
- If this approach was used we wouldn't need the separate serialization methods

With an open question relating to the above: do any of those approaches also need an interface change incorporated, per your comments about the users' concept of a root interface/abstraction?

How's that sound so far? Thoughts?

Jan 24 '21 20:01 NickSpag

I've re-written this comment multiple times before posting it as throughout writing it, I realised various different things that made my previously written thoughts obsolete or incomplete. What you see below is what survived the cutting room floor of my mind...

> One question to help my understanding, are there other scenarios in the library where an array of multiple different types is expected? If so, how are they handled? I know there's the OneOrMany<T> for values, etc, but that might not be the right tool for this particular job- I'm a little mentally stuck on what Type someone expecting a graph array of multiple different-typed objects should provide to the deserialize methods to preserve as much data as possible.

It really depends how we think about how we deserialize. We could use OneOrMany<T> (like I do in this PR) with T being either IThing or JsonLdObject (I'm not sure which is more appropriate). We don't need to specify a more specific T as we can rely on inheritance from these base types.

I wrote a whole big thing about how I think the main issue is method signatures, specifically DeserializeObject. While I don't think it really answers any part of your comment specifically, one bit that might be useful is a thought I had about a new type, likely extending from JsonLdObject.

Perhaps we create a new JsonLdGraph object that has a single property for the graph. In this form, it could use OneOrMany<IThing> as it then acts as a catch-all for types with the user only needing to cast to their expected types. Not great but not terrible.

We could additionally have a JsonLdGraph<T> and feed the generic type to our OneOrMany property, potentially making it slightly easier for users. This could be extended to JsonLdGraph<T1, T2, etc> and then, instead of using OneOrMany<T>, use Values<T1, T2, etc>. This brings more explicit type safety/usage, and I believe the deserialization at the moment silently hides data for Values etc. where the types can't be used.
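
A minimal sketch of the single-generic shape, to show how little the type needs. It assumes System.Text.Json, uses List<T> as a stand-in for OneOrMany<T>, and every name in it (`JsonLdGraphSketch<T>`, `WebPageSketch`, `RoundTrip`) is hypothetical rather than anything from this PR.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-in for a schema.org type such as WebPage.
public sealed class WebPageSketch
{
    [JsonPropertyName("@type")]
    public string Type { get; set; } = "WebPage";

    [JsonPropertyName("name")]
    public string? Name { get; set; }
}

// Hypothetical graph container: a context plus a single "@graph" property.
// It is sealed, so no schema type can inherit graph behaviour from it.
public sealed class JsonLdGraphSketch<T>
{
    [JsonPropertyName("@context")]
    public string Context { get; set; } = "https://schema.org";

    // List<T> stands in for OneOrMany<T> here and only handles the array form.
    [JsonPropertyName("@graph")]
    public List<T> Graph { get; set; } = new List<T>();
}

public static class JsonLdGraphSketchDemo
{
    // Serializes the graph and reads it back, to show both directions
    // work without any graph-specific serialization methods.
    public static JsonLdGraphSketch<WebPageSketch> RoundTrip(JsonLdGraphSketch<WebPageSketch> graph)
    {
        string json = JsonSerializer.Serialize(graph);
        return JsonSerializer.Deserialize<JsonLdGraphSketch<WebPageSketch>>(json)!;
    }
}
```

Note this sketch does not filter by @type on the way in, so a mixed graph would need either the multi-generic form or a catch-all base type as described above.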

So let's assume JsonLdGraph and any number of generics to answer your specific deserialization points:

> Graph containing objects of one type => same object type given (plus array considerations)

✅ If the JSON graph was of a WebPage, we could deserialize to JsonLdGraph<WebPage> and would automatically handle arrays.

> Graph containing objects of one type => a different object type given

❓ If the JSON graph was of a WebPage, I believe it could technically deserialize to a graph of a different type (say, JsonLdGraph<Book>), however there would be no items. This would be how the deserialization system currently works BUT I would need to double check.

One thing we could do is have the Values property on JsonLdGraph start with IThing (or JsonLdObject) as the first entry of the generics (eg. Values<IThing, WebPage, Book, Movie>) then we could support both specific types and non-specific ones.

> Graph containing objects of multiple types => one type given that isn't found in the contents of the graph

✅ If the JSON graph was of a WebPage, we could deserialize to JsonLdGraph<WebPage, Book> and everything would work fine. A Values property type can have zero, one or many of any of the types at once.

> Graph containing objects of multiple types => one type found in the contents of the graph

❓ Similar to the second point, if the JSON graph was of a WebPage and Movie, I believe it could technically deserialize to JsonLdGraph<WebPage, Book> but only the WebPage would deserialize. The same suggestion about having the Values property use IThing or something as the first generic parameter could act as a catch all.

> Graph contains objects of multiple types => ?? to my earlier question, what type should the user provide here to indicate an array of JsonLdObjects or something else that's schema typeless

✅ In this scenario with JsonLdGraph, the user might not need to provide anything. They could just have JsonLdGraph without generics backed by OneOrMany<IThing> or something.

> This entire direction seems more straightforward since we already know the types. Given the serialization path goes through .ToString() and .ToHtmlEscapedString(), does it make sense to simply have a .ToGraphString() and .ToGraphHtmlEscapedString() where the objects are put in a graph property? It allows this advanced use case to be opt-in and non-default, to keep things simple for users.

Yeah, serialization is wildly easier no matter how we go about it. In this scenario with a JsonLdObject, we wouldn't need to do anything specific. As an object with our OneOrMany or Values property, everything will serialize correctly automatically - we shouldn't even need a .ToGraphString() etc.

> A GraphObject extending JsonLdObject with a non-virtual @graph property
> - If this approach was used we wouldn't need the separate serialization methods

Hey - you had the same thought as me - it really pays for me to keep re-reading as I keep re-writing.

> With an open question relating to the above: do any of those approaches also need an interface change incorporated, per your comments about the users' concept of a root interface/abstraction

The more I've thought through the idea of a JsonLdGraph while writing this, the more it seems like the best idea. All our objects would inherit from JsonLdObject but not from JsonLdGraph because our derived types can't themselves be graphs.

Whether we go super fancy and have generics for JsonLdGraph<T1, T2, etc> is a more minor implementation/user friendliness detail. Having a JsonLdGraph with a OneOrMany<T> would capture our requirements for this while being only a mild inconvenience to users.

In terms of specific interfaces needing to change, I don't think they would. If we were to do one of the other approaches with @graph being a property via JsonLdObject, things do get a lot more tricky and may require something like that.

Hopefully I've kinda answered/addressed your points!

Jan 25 '21 07:01 Turnerj
