jackson-dataformats-text icon indicating copy to clipboard operation
jackson-dataformats-text copied to clipboard

Try adding support for `JsonNode` and full resolution of Object Ids, without need for `@JsonIdentityInfo`

Open cowtowncoder opened this issue 6 years ago • 16 comments

(note: lots of background, and follow-up for #98)

So: handling of YAML Anchors, References, is problematic since: JSON has no directly equivalent notion. And although similar logical concept has been added by Jackson -- Object Id, via @JsonIdentityInfo -- it is both bit awkward to use and limited in scope (i.e. not handling all cases).

But it might be possible to implement full, no-annotation style resolution just for "Tree Model" (that is, JsonNode). Since there is already

JsonParser.canReadObjectId()

to indicate if format is capable of expressing Object Ids natively (returning true for YAML), it might be possible to combine this with another setting that essentially indicates that automatic resolution should be used. I am not 100% sure if this is indeed doable, especially with 2.x, but I think it is worth exploring. And with Jackson 3.0 it should be even more likely to be doable.

So. If and when I have time to investigate this, I'd like to see if this can be made.

The obvious corollary, then, would be that much of YAML processing would first use readTree() (and equivalents) to resolve all references, and then either use that or use mapper.treeToValue() to get to actual POJOs, ones that need not handle anchors/refs at all.

cowtowncoder avatar Feb 14 '19 18:02 cowtowncoder

Interesting (I'm still trying to process all the info from #120 and #121). A JSON array and a JSON object are both JsonNode, correct?

For reference, here is a (shortened) version of a YAML file i'm trying to parse:

vertices:
  - &v1
      name: V1
      description: First vertex
  - &v2
      name: V2
      description: Second vertex
edges:
  - [*v1, *v2]

nkiesel avatar Feb 14 '19 19:02 nkiesel

Should this not also get the 3.x label?

nkiesel avatar Feb 14 '19 19:02 nkiesel

@nkiesel wrt 3.x probably, but I have not yet made up my mind that it is impossible to do with 2.10.

And as to array, object, yes, JsonNode is the base type of ArrayNode, ObjectNode. YAML can be mapped quite easily to those, although there is no place to store Object Ids or Type Ids. Come to think of that, this might be problematic wrt latter.

I still can't read YAML fluently but would this be similar to what you'd expect to see in tree model (equivalent to this json)?

{
  "vertices" : [{
      { "name" : "V1", .... },
      { "name" : "V2", .... }
   }]
  "edges" : [ { ... v1 properties... } , { ... v2 properties... } ]
}

This also underscores one obvious challenge I think I mentioned in the other issue: handling of cyclic dependencies... JsonNode does not handle those well. Or, rather, callers might not. But on plus side, object model itself theoretically allows them; and actual values would not need to be copied either.

cowtowncoder avatar Feb 14 '19 23:02 cowtowncoder

Your JSON is close, you just have one extra pair of {} inside the vertices array and a missing pair of [] inside the edges:

{
  "vertices": [
        {"name":"V1","description":"First vertex"},
        {"name":"V2","description":"Second vertex"}
  ],
  "edges": [
    [ {"name":"V1","description":"First vertex"}, {"name":"V2","description":"Second vertex"} ]
  ]
}

nkiesel avatar Feb 15 '19 02:02 nkiesel

Attached are 2 little Python scripts (renamed to ".txt" to be able to attach them) I use for converting between JSON and YAML. One caveat is that j2y does not try to find and create references (arguably correct; just because 2 nodes are the same might not mean that they should be treated as identical) j2y.txt y2j.txt

nkiesel avatar Feb 15 '19 03:02 nkiesel

BTW: YAML only allows to reference "anchors" that were defined earlier and thus does not allow forward references. Thus, you cannot create cycles using YAML anchors and aliases.

nkiesel avatar Feb 15 '19 03:02 nkiesel

Just found this in the YAML 1.2 specification: "The alias refers to the most recent preceding node having the same anchor.". Thus, these anchors do not have to be unique in the YAML document. This could put a dent in your "object id" idea, no?

nkiesel avatar Feb 15 '19 03:02 nkiesel

Interesting: I did not realize that only backward references are allowed. This does simplify life a little bit; however, ability to reuse anchors would be quite problematic as Jackson assumes uniqueness of ids (within scope, usually type). But I wonder if reuse is commonly used....

cowtowncoder avatar Feb 18 '19 19:02 cowtowncoder

I guess I'm underestimating the problem but if we would only cover anchors in parsing, then it seems what would be needed to manage an updateable Map<String, JsonNode> and replace all references with their map value as the YAML is parsed.

Generating YAML with anchors is much harder because it would require to detect common JsonNode. Also, a given PoJo could be converted into multiple semantically equivalent YAML strings.

nkiesel avatar Feb 19 '19 01:02 nkiesel

The problem is not complex at logical level, but rather at implementation. Databind that deals with Java objects (including representations like JsonNode) are fully format-agnostic, and operate through limited set of shared abstractions, modelled based on original JSON use case. System is designed to support forward references as well (something not need with YAML, apparently), which is good except that it does complicate handling.

cowtowncoder avatar Feb 25 '19 18:02 cowtowncoder

Understood.

Regarding re-use of forward references: not in my use cases, I exclusively use them to have a more compact - and easier to understand - structure. But not sure if that is the case generally. I could envision a YAML file where someone uses these references to always point to the "current item of type ".

nkiesel avatar Feb 25 '19 20:02 nkiesel

Regarding re-use of forward references: not in my use cases, I exclusively use them to have a more compact - and easier to understand - structure.

Seconded!

jpassaro avatar Feb 25 '19 21:02 jpassaro

combine this with another setting that essentially indicates that automatic resolution should be used.

Just to emphasize the importance of being able to disable automatic resolution: The feature can be misused and leads to fun stuff like https://nvd.nist.gov/vuln/detail/CVE-2017-18640

bentmann avatar Jan 15 '20 14:01 bentmann

@bentmann Just to make sure: I have no intention of enabling anything that would resolve references to external resources, but only to document nodes within same input document. I am well aware of security issues that come from resolving to external resources (many xml issues for that, some yaml too as you rightly point out).

cowtowncoder avatar Jan 20 '20 00:01 cowtowncoder

It's not just about resolving external references but generally defending against expansion bombs, cf. https://en.wikipedia.org/wiki/Billion_laughs_attack#Variations

bentmann avatar Jan 20 '20 13:01 bentmann

Ah. Yes, there's that too, good point (I am familiar with that problem from xml context).

It would only be applicable if Object Id was resolved by expanding contents; my original thinking was (... I think) to retain actual identity and use loops. This would not expand size of content, although code using such structures (possibly cyclic graphs, if YAML [and more specifically, SnakeYAML] allows this -- I am not sure YAML does actually) would need to be aware that to avoid problems at level.

So ability to prevent expansion probably makes sense no matter what.

cowtowncoder avatar Jan 21 '20 18:01 cowtowncoder