Schema.NET
Schema.NET copied to clipboard
WIP: Add Graph property support
Getting the ball rolling for @graph
support. Will close #132 when merged.
Adding the property is easy - working out where to put the logic about conditionally serializing @graph
over other properties or how to deserialize it (as the deserialization target "should" be a JsonLdObject
, not any type inheriting it) will be tricky. Perhaps can only deserialize a @graph
if you choose JsonLdObject
(eg. SchemaSerializer.DeserializeObject<JsonLdObject>(myJson)
) though that seems a bit crappy.
I've marked this as minor
currently as in theory, this should be backwards compatible. That said, happy to have this marked as major
instead.
This next part is kinda a ramble of thoughts about this all:
JsonLdObject
seems like the best/only candidate because it is effectively type-less. As this example in the spec shows, the outer JSON references no type property with only a context. This should mean we don't serialize/deserialize a @graph
if the type is WebPage
.
One problem ends up being though is that people may expect IThing
to be our root interface and from a schema.org standpoint it is. To support @graph
though, people would need to consider JsonLdObject
directly. Maybe we should create a IJsonLdObject
interface and make IThing
extend it? It all depends if we (or others) consider our interfaces as the type we reference or our class implementations? (Maybe another thing for @nickevansuk to weigh in on re. OpenActive.NET as this would impact all inherited types)
Let's think deserialization:
- I have a blob of JSON which may be a real type or an object with a graph.
- I can't know ahead of time which is which - I have to deserialize it to find out
- If I attempt to deserialize to a
WebPage
and it is one, everything is fine. - If I attempt to deserialize to a
WebPage
but it is a@graph
... -- Is that an exception? -- Do we just ignore every other property in deserialization and spit back aWebPage
instance with the@graph
filled in?
Now serialization:
- I have an object of...
--
JsonLdObject
- so we should be fine serializing it either way --WebPageObject
- so if it has any values in@graph
, we ignore absolutely every other property onWebPage
?
minor
seems fine to me unless you count changing the serialization order as a breaking change.
We need to add some documentation to the README. It might be a good idea to also link to some documentation about @graph
as it's definately a more advanced usage scenario.
I suspect we also need to protect against using @graph
while also populating other properties. I wonder what the spec says about doing that. Perhaps we should create a new GraphObject
type that has the extra @graph
property and nothing else. Thoughts?
Hey all, I was thinking of taking a stab, or trying to help, with this feature, and it looks like some maintainer-level decisions are still being worked through on how best to handle it. You all may be waiting on the System.Text.Json move as well, which would make sense.
In case some feedback from a users's perspective (typically a scraping use case without much known about the json upfront) is helpful:
- The graphs I've seen in the wild nearly always contain an array of various types of objects, so a requirement to deserialize to a special graph object or even a
List<JsonLdObject>
doesn't feel crappy- it feels quite reasonable, given that@graph
explicitly represents a potentially multi-type outcome. - If I were to try to force deserialize to a specific type, but it was a graph, I'd expect an exception. But to look at a similar scenario in the library: when someone tries to deserialize a regular object with a different type than what it is, the library simply returns an empty object of the requested type. That strikes me as a best effort type approach in gray area cases, which if applied to this would seem like the latter of @Turnerj's suggestions make the most sense: ignoring what isn't a
WebPage
Happy to help once a direction is settled on. And thanks for your work on this library, it's nice.
Thanks for commenting @NickSpag - I'm happy for you to take a shot at this issue.
The graphs I've seen in the wild nearly always contain an array of various types of objects, so a requirement to deserialize to a special graph object or even a List<JsonLdObject> doesn't feel crappy- it feels quite reasonable, given that @graph explicitly represents a potentially multi-type outcome.
So the reason I think it is a bit crappy is that you may not know that the JSON-LD is a graph or not when deserializing. If you called SchemaSerializer.DeserializeObject<JsonLdObject>(myJson)
and your JSON isn't a graph, you are missing a lot of data. If you called SchemaSerializer.DeserializeObject<WebPage>(myJson)
and it is a graph, it could throw an exception.
We may need to have TryDeserializeObject
-type methods but that is likely a whole different discussion.
I suspect we also need to protect against using
@graph
while also populating other properties. I wonder what the spec says about doing that. Perhaps we should create a new GraphObject type that has the extra@graph
property and nothing else. Thoughts?
I think if we can prevent @graph
being able to be set via anything other than JsonLdObject
directly, that may be a good idea. That way we can kinda protect people from themselves. Having a GraphObject
does make that wildly easier for us as nothing would inherit it but I wonder how obvious it would be to users.
So the reason I think it is a bit crappy is that you may not know that the JSON-LD is a graph or not when deserializing.
Ah, okay I see your thinking there- I had mentally separated the two user paths in to known graph or known object.
One question to help my understanding, are there other scenarios in the library where an array of multiple different types is expected? If so, how are they handled? I know there's the OneOrMany<T>
for values, etc, but that might not be the right tool for this particular job- I'm a little mentally stuck on what Type someone expecting a graph array of multiple different-typed objects should provide to the deserialize methods to preserve as much data as possible.
But to expand on your first attempt @Turnerj at mapping scenarios, it looks like there are:
Deserialize
- Graph containing objects of one type => same object type given (plus array considerations)
- Straightforward
- Graph containing objects of one type => a different object type given
- Return exception/null/empty object?
- Graph containing objects of multiple types => one type given that isn't found in the contents of the graph
- Return exception/null/empty object?
- Graph containing objects of multiple types => one type found in the contents of the graph
- Ignore the other objects that aren't of that type, losing that data?
- Graph contains objects of multiple types => ?? to my earlier question, what type should the user provide here to indicate an array of JsonLdObjects or something else thats schema typeless
- Return array of the provided type to let users iterate through as they need?
Serialize
This entire direction seems more straightforward since we already know the types. Given the serialization path goes through .ToString()
and .ToHtmlEscapedString()
, does it make sense to simply have a .ToGraphString()
and .ToGraphHtmlEscapedString()
where the objects are put in a graph property? It allows this advanced use case to be opt-in and non-default, to keep things simple for users.
Approaches
Could include one, or a combo of, the following:
- Adding a non-virtual
@graph
property to JsonLdObject - Adding a virtual
@graph
property to JsonLdObject- This is referring to your second outcome in the deserialize a
WebPage
scenario: spit back a WebPage object with the graph filled in. It also might make the serialization story a bit more complicated though, as you highlighted.
- This is referring to your second outcome in the deserialize a
- A GraphObject extending JsonLdObject with a non-virtual
@graph
property- If this approach was used we wouldn't need the separate serialization methods
With an open question relating to the above: do any of those approaches also need an interface change incorporated, per your comments about the users's concept of a root interface/abstraction
How's that sound so far? Thoughts?
I've re-written this comment multiple times before posting it as throughout writing it, I realised various different things that made my previously written thoughts obsolete or incomplete. What you see below is what survived the cutting room floor of my mind...
One question to help my understanding, are there other scenarios in the library where an array of multiple different types is expected? If so, how are they handled? I know there's the
OneOrMany<T>
for values, etc, but that might not be the right tool for this particular job- I'm a little mentally stuck on what Type someone expecting a graph array of multiple different-typed objects should provide to the deserialize methods to preserve as much data as possible.
It really depends how we think about how we deserialize. We could use OneOrMany<T>
(like I do in this PR) with T
being either IThing
or JsonLdObject
(I'm not sure which is more appropriate). We don't need to specify a more specific T
as we can rely on inheritance from these base types.
I wrote a whole big thing how I think the main issue is about method signatures, specifically DeserializeObject
and while I don't think it really answers any part of your comment specifically, one bit that might be useful is a thought I had about a new type, likely extending from JsonLdObject
.
Perhaps we create a new JsonLdGraph
object that has a single property for the graph. In this form, it could use OneOrMany<IThing>
as it then acts as a catch-all for types with the user only needing to cast to their expected types. Not great but not terrible.
We could additionally have a JsonLdGraph<T>
and feed the generic type to our OneOrMany
property, potentially making it slightly easier for users. This could be extended to JsonLdGraph<T1, T2, etc>
and then instead of using OneOrMany<T>
use Values<T1, T2, etc>
. This brings more explicitly type safety/usage and I believe the deserialization at the moment silently hides data for Values
etc where the types can't be used.
So let's assume JsonLdGraph
and any number of generics to answer your specific deserialization points:
Graph containing objects of one type => same object type given (plus array considerations)
✅ If the JSON graph was of a WebPage
, we could deserialize to JsonLdGraph<WebPage>
and would automatically handle arrays.
Graph containing objects of one type => a different object type given
❓ If the JSON graph was of a WebPage
, I believe it could technically deserialize to JsonLdGraph<WebPage>
however there would be no items. This would be how the deserialization system currently works BUT would need to double check.
One thing we could do is have the Values
property on JsonLdGraph
start with IThing
(or JsonLdObject
) as the first entry of the generics (eg. Values<IThing, WebPage, Book, Movie>
) then we could support both specific types and non-specific ones.
Graph containing objects of multiple types => one type given that isn't found in the contents of the graph
✅ If the JSON graph was of a WebPage
, we could deserialize to JsonLdGraph<WebPage, Book>
and everything would work fine. A Values
property type can have zero, one or many of any of the types at once.
Graph containing objects of multiple types => one type found in the contents of the graph
❓ Similar to the second point, if the JSON graph was of a WebPage
and Movie
, I believe it could technically deserialize to JsonLdGraph<WebPage, Book>
but only the WebPage
would deserialize. The same suggestion about having the Values
property use IThing
or something as the first generic parameter could act as a catch all.
Graph contains objects of multiple types => ?? to my earlier question, what type should the user provide here to indicate an array of JsonLdObjects or something else thats schema typeless
✅ In this scenario with JsonLdGraph
, the user might not need to provide anything. They could just have JsonLdGraph
without generics backed by OneOrMany<IThing>
or something.
This entire direction seems more straightforward since we already know the types. Given the serialization path goes through
.ToString()
and.ToHtmlEscapedString()
, does it make sense to simply have a.ToGraphString()
and.ToGraphHtmlEscapedString()
where the objects are put in a graph property? It allows this advanced use case to be opt-in and non-default, to keep things simple for users.
Yeah, serialization is wildly easier no matter how we go about it. In this scenario with a JsonLdObject
, we wouldn't need to do anything specific. As an object with our OneOrMany
or Values
property, everything will serialize correctly automatically - we shouldn't even need a .ToGraphString()
etc.
A GraphObject extending JsonLdObject with a non-virtual @graph property If this approach was used we wouldn't need the separate serialization methods
Hey - you had the same thought as me - it really pays for me to keep re-reading as I keep re-writing.
With an open question relating to the above: do any of those approaches also need an interface change incorporated, per your comments about the users's concept of a root interface/abstraction
Now that I've thought through the idea of a JsonLdGraph
a bunch writing this, the more it seems like the best idea. All our objects would inherit from JsonLdObject
but not from JsonLdGraph
because our derived types can't themselves be graphs.
Whether we go super fancy and have generics for JsonLdGraph<T1, T2, etc>
is a more minor implementation/user friendliness detail. Having a JsonLdGraph
with a OneOrMany<T>
would capture our requirements for this with only being a more mild inconvenience to users.
In terms of specific interfaces needing changing, I don't think they would need to. If we were to do one of the other approaches with @graph
being a property via JsonLdObject
, things do get a lot more tricky and may require something like that.
Hopefully I've kinda answered/addressed your points!