jsonld-java
                                
Order of blank node identifiers during normalization
Hi, I am working on a project to sign LD documents for which normalisation is an important step. My goal is to use a standard normalisation algorithm across language implementations so that signature verification works.
At present, I am working on Python<>Java compatibility. I have observed some differences between the normalised output of PyLD and jsonld-java, specifically in the order and naming of blank nodes. I'm trying to understand whether jsonld-java follows the URDNA2015 normalisation algorithm, which is implemented by PyLD.
I'm including how I invoke the normalisation using both libraries. If needed, I can share the normalised outputs also.
PyLD Invocation
from pyld import jsonld

# normalize a document using the RDF Dataset Normalization Algorithm
# (URDNA2015), see: http://json-ld.github.io/normalization/spec/
normalized = jsonld.normalize(document, options={
    'algorithm': 'URDNA2015',
    'format': 'application/n-quads'
})
# - document is a dict containing a JSON-LD document.
# - normalized is a string that is a canonical representation of the document
#    that can be used for hashing, comparison, etc.
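Since the goal here is signing, the hashing step mentioned in the comment above can be sketched as follows. This is only an illustration: the choice of SHA-256 over the UTF-8 bytes, the `digest_normalized` helper name, and the sample quad are assumptions and are not prescribed by PyLD.

```python
import hashlib


def digest_normalized(normalized: str) -> str:
    """Return the hex SHA-256 digest of a normalized N-Quads string.

    Hypothetical helper: the digest algorithm is an assumption, chosen
    only to show why byte-identical normalized output matters.
    """
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Invented sample; real input would be the string returned by jsonld.normalize().
sample = '_:c14n0 <http://schema.org/name> "Alice" .\n'
print(digest_normalized(sample))
```

Any difference in blank node labels or quad order changes the digest, which is why both libraries need to produce exactly the same string.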
jsonld-java Invocation
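import com.github.jsonldjava.core.JsonLdConsts;
import com.github.jsonldjava.core.JsonLdOptions;
import com.github.jsonldjava.core.JsonLdProcessor;

final JsonLdOptions options = new JsonLdOptions();
options.format = JsonLdConsts.APPLICATION_NQUADS;
String normalized = (String) JsonLdProcessor.normalize(document, options);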
 
// - document is Map<String, Object> containing a JSON-LD document.
// - normalized is a string that is intended to be used for hashing
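To narrow down whether two normalized outputs differ only in quad order, only in blank node naming, or in both, a rough comparison like the one below can help. It is a sketch under stated assumptions: the `compare_nquads` helper and both sample strings are invented for illustration (they are not real PyLD or jsonld-java output), the regular expression is a simplified blank node pattern that would also match `_:x` text inside a literal, and the "same shape" check is only a necessary condition, not a graph isomorphism test.

```python
import re

# Simplified blank node label pattern (assumption: labels are alphanumeric).
BNODE = re.compile(r"_:[A-Za-z0-9]+")


def compare_nquads(a: str, b: str) -> dict:
    """Summarize how two N-Quads documents differ (hypothetical helper)."""
    lines_a = [ln for ln in a.splitlines() if ln.strip()]
    lines_b = [ln for ln in b.splitlines() if ln.strip()]

    def mask(line: str) -> str:
        # Replace every blank node label with the same placeholder.
        return BNODE.sub("_:X", line)

    return {
        "identical": lines_a == lines_b,
        "same_quads_different_order":
            lines_a != lines_b and sorted(lines_a) == sorted(lines_b),
        "same_shape_ignoring_labels":
            sorted(map(mask, lines_a)) == sorted(map(mask, lines_b)),
        "labels_a": sorted(set(BNODE.findall(a))),
        "labels_b": sorted(set(BNODE.findall(b))),
    }


# Invented outputs that differ only in which node received which label.
out_a = ('_:c14n0 <http://schema.org/knows> _:c14n1 .\n'
         '_:c14n1 <http://schema.org/name> "Bob" .\n')
out_b = ('_:c14n1 <http://schema.org/knows> _:c14n0 .\n'
         '_:c14n0 <http://schema.org/name> "Bob" .\n')
report = compare_nquads(out_a, out_b)
print(report)
```

If `same_shape_ignoring_labels` is true while `identical` is false, the two libraries are most likely assigning blank node labels differently rather than producing different quads.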
The jsonld-java implementation is likely to predate an algorithm with "2015" in its name because most of the core here was written in 2012/2013 and it has been in maintenance mode since then. Feel free to contribute a Pull Request to update the algorithm we use to get it to the more recent blank node normalization algorithm.
Note that Normalization is a different spec than JSON-LD; JSON-LD 1.0 makes some claims about blank node labeling, which is not the same as normalization. JSON-LD 1.1 relaxes the requirement for strict blank node labeling.
This code is invoking the normalize function of jsonld.js, which uses the non-normative RDF Dataset Normalization Spec. This would take the N-Quads output from a JSON-LD processor and return a normalized representation of those quads, with BNode labels generated and quads ordered according to that spec. It's not something you'd expect a general JSON-LD processor to implement, and it would best be done as a separate library. My own is at https://github.com/ruby-rdf/rdf-normalize, which is in the public domain; you're welcome to make use of it if it's useful.
Thank you @ansell and @gkellogg for the clarifications and references. The normalization spec you shared is the same one I am following. By any chance, would either of you know of a Java implementation?
@ansell could you share a reference to the normalization algorithm presently implemented in jsonld-java? Is it by any chance the URGNA2012 algorithm referenced in the spec?
For posterity if anyone is wondering, the Java implementation does not match the earlier URGNA2012 algorithm either. I will work on adding the URDNA2015 implementation here.
edit: typo
Hi @kochhar, do you need some help with the implementation?
We're working on it too, based on the work of https://github.com/boumba100/JsonldAndroid/.
We've already removed the android dependencies to add just the source related to the normalization algorithm.
The problem is that this library does not properly escape some characters when serializing normalized RDF datasets, so we are running the reference tests to find and fix all of the errors.
@davidlj95 Which characters are not currently being properly escaped according to the JSON specification?
@kochhar I have a feeling the implementation here predated the inclusion of URGNA2012 in the current draft of the Normalization document. Any pull requests to bring the implementation here up to the current draft would be appreciated.
@ansell in the work I mentioned, which we're using as a base for the URDNA2015 canonicalization algorithm, some characters (double quotes, backslashes...) were not being escaped properly. We detected that while running the 60th test, in the RDFDatasetUtils.toNQuad method.
I have neither tested nor checked whether the same happens in this library. The work is based on a copy-paste of this library's code from two years ago. I haven't diffed the work's code against this library's code to see whether the developer introduced the bug in their fork or it was already here.
I'll give it a look and answer soon (code is pretty different).
BTW, I can try to merge the tested (and fixed) URDNA2015 canonicalization algorithm into this library if I have some time.
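For readers unfamiliar with the escaping problem described above, here is a minimal sketch of how a literal value is typically escaped when written as N-Quads. It is not the code from RDFDatasetUtils.toNQuad or from the fork; the function names are hypothetical, and it assumes the canonical N-Triples/N-Quads convention in which only the backslash, double quote, line feed, and carriage return are escaped inside a literal.

```python
def escape_nquads_literal(value: str) -> str:
    """Escape a literal's lexical form for N-Quads output (hypothetical helper)."""
    # Backslashes go first so the escapes added below are not escaped again.
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def literal_to_nquads(value: str) -> str:
    """Wrap the escaped value in double quotes, as a plain string literal."""
    return '"' + escape_nquads_literal(value) + '"'


print(literal_to_nquads('say "hi" \\ bye'))
```

If a serializer skips any of these replacements, the normalized string (and therefore its hash) will differ from that of a conforming implementation, even when blank node labeling is correct.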
@davidlj95
Thanks for clarifying the encoding issues. I haven't noticed any encoding issues with this library, but I am not sure what the current state of testing is for it here.
It seems like there is interest in having the recent URDNA2015 algorithm implemented here, so the contribution would be greatly appreciated.
Last update was 3 years ago. What is the current support of URDNA2015 in the library?