osv.dev
Some URLs in the .json files cannot be deserialized into URI objects in Java
Describe the bug
Trying to deserialize the .json files into Java using Jackson fails on some files with errors like this:
Cannot deserialize value of type java.net.URI from String "https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106": not a valid textual representation, problem: Illegal character in fragment at index 50: https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106
There doesn't appear to be anything wrong with the URIs that it fails on. You can go to them in a browser.
To Reproduce
Steps to reproduce the behaviour:
- Use https://www.jsonschema2pojo.org to create Java classes from https://raw.githubusercontent.com/ossf/osv-schema/main/validation/schema.json.
- Create a little Gradle project and add the generated classes to it.
- Add these dependencies to the Gradle project:
  implementation 'com.fasterxml.jackson.core:jackson-annotations:2.16.1'
  implementation 'com.fasterxml.jackson.core:jackson-core:2.16.1'
  implementation 'com.fasterxml.jackson.core:jackson-databind:2.16.1'
- Write a little Java program that iterates over all of the .json files and tries to do this on each one:
new ObjectMapper().readValue(<the .json file>, Example.class);
It will fail with the above exception on a few of the files.
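The exception text comes from java.net.URI itself, so the failure can probably be reproduced without Jackson, Gradle, or the generated classes. A minimal sketch (UriRepro and parseError are illustrative names, not part of any project):

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriRepro {
    // Returns null if java.net.URI accepts the string, otherwise the parser's error message.
    static String parseError(String raw) {
        try {
            new URI(raw);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String raw = "https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106";
        String error = parseError(raw);
        // Prints: Illegal character in fragment at index 50: https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106
        System.out.println(error == null ? "accepted" : error);
    }
}
```

Index 50 is the second #. The first # (index 39) starts the fragment, and everything after it is then checked against the characters a fragment may contain, which do not include #.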
Expected behaviour
All of the .json files should deserialize into instances of Example. (That's the default class generated by jsonschema2pojo.)
It's because https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106 is not technically a valid URI (it contains two #, and # is not allowed inside a fragment), so Java's URI class rejects it when parsing that string. But browsers (and Python, it seems) are a bit more lenient and accept it.
Not sure what should be done here: the URI is the canonical path of what it's pointing to (if you go to that page and remove the first #, the server re-adds it), so it doesn't look like a data quality issue.
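On the consumer side, one possible workaround is to percent-encode every # after the first one before parsing, so the extra # becomes %23 inside the fragment. This is only a sketch: LenientUri is a hypothetical name, it is not something OSV.dev or Jackson provides, and it handles only the extra-# case, not other characters java.net.URI may reject. It also changes the string (the link becomes #changelog/%23opencast-106), and whether the Opencast docs site treats that the same as the original is not something this sketch checks.

```java
import java.net.URI;

class LenientUri {
    // Keeps the first '#' as the fragment delimiter and percent-encodes any later '#'
    // as "%23", so the result is a syntactically valid java.net.URI.
    static URI parse(String raw) {
        int first = raw.indexOf('#');
        if (first < 0) {
            return URI.create(raw); // no fragment, parse unchanged
        }
        String head = raw.substring(0, first + 1);
        String fragment = raw.substring(first + 1).replace("#", "%23");
        return URI.create(head + fragment); // throws IllegalArgumentException on other errors
    }

    public static void main(String[] args) {
        URI uri = parse("https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106");
        System.out.println(uri);               // https://docs.opencast.org/r/10.x/admin/#changelog/%23opencast-106
        System.out.println(uri.getFragment()); // changelog/#opencast-106 (decoded)
    }
}
```

getFragment() decodes %23 back to #, so code that only reads the fragment sees the original text, while toString() returns the encoded form.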
Thank you for the quick response!
Fortunately for our use case we didn't need the references, so we @JsonIgnore'd them, but this will probably bite someone else when it matters.
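An alternative to ignoring the references might be to keep each URL as the original string and convert it to java.net.URI only when needed. The sketch assumes a hypothetical Reference class standing in for the generated one (the generated field was a java.net.URI, which is where Jackson fails); with Jackson, a plain String field should bind without any custom deserializer.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical stand-in for the generated reference class. The URL is stored
// exactly as it appears in the JSON, so reading it never fails on syntax.
final class Reference {
    private final String type;
    private final String url;

    Reference(String type, String url) {
        this.type = type;
        this.url = url;
    }

    String type() { return type; }

    String url() { return url; }

    // Strict conversion on demand: empty if java.net.URI rejects the string.
    Optional<URI> toUri() {
        try {
            return Optional.of(new URI(url));
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Reference ref = new Reference("WEB", "https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106");
        System.out.println(ref.url());               // raw URL is still available
        System.out.println(ref.toUri().isPresent()); // false: strict parsing fails
    }
}
```

The trade-off is that every caller has to handle the empty case, but the data is preserved instead of being dropped at deserialization time.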
By the way, the documentation for this repo is the best I've ever seen. Just an absolute delight!
Thank you for that feedback. We had an absolutely brilliant contract technical writer (@hayleycd) working with us last year.
For the record (no pun intended), the problematic records are:
- GHSA-hcxx-mp6g-6gr9 (aka CVE-2018-16153)
  - https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106
- GHSA-mf4f-j588-5xm8 (aka CVE-2021-44228, but not aliased?)
  - https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106
  - https://docs.opencast.org/r/9.x/admin/#changelog/#opencast-910
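A rough way to look for more records like these might be to scan a local copy of the .json files for "url" values that java.net.URI rejects. This sketch uses a regular expression rather than a JSON parser (FindBadUrls and the directory argument are hypothetical). It does not decode JSON escapes, so a URL containing an escaped character such as \u0026 would be reported as a false positive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class FindBadUrls {
    // Rough pattern for "url": "..." pairs; not a real JSON parser.
    private static final Pattern URL_FIELD = Pattern.compile("\"url\"\\s*:\\s*\"([^\"]*)\"");

    // Returns every "url" value in the given JSON text that java.net.URI rejects.
    static List<String> unparseableUrls(String json) {
        List<String> bad = new ArrayList<>();
        Matcher m = URL_FIELD.matcher(json);
        while (m.find()) {
            String url = m.group(1);
            try {
                new URI(url);
            } catch (URISyntaxException e) {
                bad.add(url);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: FindBadUrls <directory of OSV .json files>");
            return;
        }
        try (Stream<Path> files = Files.walk(Paths.get(args[0]))) {
            files.filter(f -> f.toString().endsWith(".json")).forEach(f -> {
                try {
                    String json = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
                    for (String url : unparseableUrls(json)) {
                        System.out.println(f.getFileName() + "\t" + url);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```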
I note that https://regexr.com/39nr7 validates these URLs, and https://validator.w3.org/#validate_by_uri didn't seem to have a problem with them either (although I'm not convinced it retrieved the correct content).
I think this is one of those annoying cases of being liberal in what you accept and strict in what you send. If OSV.dev were to "correct" these URLs, I think they'd actually stop working as intended.
OSV.dev is written entirely in Python and Go, so based on @another-rex's statement that Python and Go don't have a problem with these URLs, I think we'll unfortunately just have to close this out as "working as intended".