
Some URLs in the .json files cannot be serialized into URI objects in Java

Status: Open. jimshowalter opened this issue 1 year ago • 4 comments

Describe the bug

Trying to deserialize the .json files into Java using Jackson fails on some files with errors like this:

Cannot deserialize value of type java.net.URI from String "https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106": not a valid textual representation, problem: Illegal character in fragment at index 50: https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106

There doesn't appear to be anything wrong with the URIs that it fails on. You can go to them in a browser.
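
The failure can be reproduced without Jackson at all. This is a minimal sketch, assuming Jackson's URI deserializer simply hands the string to java.net.URI (the "Illegal character in fragment at index 50" text matches the message format of URISyntaxException). The class and method names are hypothetical:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriRepro {
    /** Returns the index java.net.URI reports as illegal, or -1 if the string parses. */
    static int illegalIndex(String url) {
        try {
            new URI(url); // strict single-argument constructor (RFC 2396 grammar)
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        String url = "https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106";
        int index = illegalIndex(url);
        // The reported position points at the second '#' in the string.
        System.out.println(index + " -> '" + url.charAt(index) + "'");
    }
}
```

Index 50 is the second #: the parser treats everything after the first # as the fragment, and java.net.URI does not allow an unescaped # inside a fragment.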

To Reproduce

Steps to reproduce the behaviour:

  1. Use https://www.jsonschema2pojo.org to create Java classes from https://raw.githubusercontent.com/ossf/osv-schema/main/validation/schema.json.
  2. Create a little Gradle project and add the generated classes to it.
  3. Add these dependencies to the Gradle project:
       implementation 'com.fasterxml.jackson.core:jackson-annotations:2.16.1'
       implementation 'com.fasterxml.jackson.core:jackson-core:2.16.1'
       implementation 'com.fasterxml.jackson.core:jackson-databind:2.16.1'
  4. Write a little Java program that iterates over all of the .json files and tries to do this on each one: new ObjectMapper().readValue(<the .json file>, Example.class);

It will fail with the above exception on a few of the files.

Expected behaviour

All of the .json files should deserialize into instances of Example. (That's the default class generated by jsonschema2pojo.)

jimshowalter · Jan 30 '24 00:01

It's because https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106 is not technically a valid URI: it contains two # characters, and a fragment may not contain a second, unescaped #. So Java's URI class rejects it when parsing that string. Browsers (and Python, it seems) are more lenient and accept it.

Not sure what should be done here. The URI is the canonical path of what it points to (if you go to that page and remove the first #, it gets re-added by the server), so it doesn't look like a data quality issue.
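
Browsers and Python's urllib.parse.urlsplit treat everything after the first # as the fragment. A Java consumer that needs a URI object could mimic that by percent-encoding any later # as %23 before parsing. This is a hypothetical workaround sketch (the parseLenient helper is not part of OSV.dev, Jackson, or the JDK), and it only handles the extra #; any other illegal character would still throw:

```java
import java.net.URI;

public class LenientUri {
    /**
     * Hypothetical helper: splits at the FIRST '#', like browsers and Python's
     * urlsplit, then escapes any further '#' in the fragment as %23 so that
     * java.net.URI accepts it. Other illegal characters are not handled.
     */
    static URI parseLenient(String url) {
        int hash = url.indexOf('#');
        if (hash < 0) {
            return URI.create(url); // no fragment: parse strictly as-is
        }
        String base = url.substring(0, hash);
        String fragment = url.substring(hash + 1).replace("#", "%23");
        return URI.create(base + "#" + fragment);
    }

    public static void main(String[] args) {
        URI uri = parseLenient("https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106");
        System.out.println(uri.getRawFragment()); // escaped form, as stored in the URI
        System.out.println(uri.getFragment());    // decoded form, with the '#' restored
    }
}
```

The re-encoded string is not identical to the original (%23 instead of #), so keep the original string if the exact link needs to be displayed or opened in a browser. getFragment() still returns the decoded value changelog/#opencast-106.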

another-rex · Jan 30 '24 00:01

Thank you for the quick response!

Fortunately for our use case we didn't need the references, so we @JsonIgnore'd them, but this will probably bite someone else when it matters.
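
An alternative to @JsonIgnore that keeps the references is to declare the URL field as a plain String and parse it only when needed. jsonschema2pojo presumably generates java.net.URI here because the schema declares the field with "format": "uri" (an assumption; check the generated class). The record below is a hypothetical stand-in for the generated reference class, not its actual shape:

```java
import java.net.URI;
import java.util.Optional;

public class ReferenceExample {
    /**
     * Hypothetical stand-in for the generated reference class, with url kept as a
     * String instead of java.net.URI so deserialization never fails on it.
     */
    record Reference(String type, String url) {
        /** Strict parse on demand; empty when url is null or java.net.URI rejects it. */
        Optional<URI> toUri() {
            if (url == null) {
                return Optional.empty();
            }
            try {
                return Optional.of(URI.create(url));
            } catch (IllegalArgumentException e) { // URI.create wraps URISyntaxException
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        Reference ok = new Reference("WEB", "https://osv.dev/");
        Reference bad = new Reference("WEB",
                "https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106");
        System.out.println(ok.toUri().isPresent());
        System.out.println(bad.toUri().isPresent());
        System.out.println(bad.url()); // the original string is preserved either way
    }
}
```

The design choice here is to defer validation: every record still deserializes, and only code that actually needs a URI has to decide what to do with the few strings Java rejects.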

jimshowalter · Jan 30 '24 02:01

By the way, the documentation for this repo is the best I've ever seen. Just an absolute delight!

jimshowalter · Jan 30 '24 03:01

Thank you for that feedback. We had an absolutely brilliant contract technical writer (@hayleycd) working with us last year.

andrewpollock · Jan 31 '24 00:01

For the record (no pun intended), the problematic records are:

  • GHSA-hcxx-mp6g-6gr9 (aka CVE-2018-16153)
    • https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106
  • GHSA-mf4f-j588-5xm8 (aka CVE-2021-44228, but not aliased?)
    • https://docs.opencast.org/r/10.x/admin/#changelog/#opencast-106
    • https://docs.opencast.org/r/9.x/admin/#changelog/#opencast-910

I note that https://regexr.com/39nr7 validates these URLs, and https://validator.w3.org/#validate_by_uri didn't seem to have a problem with them either (although I'm not convinced it retrieved the correct content).

I think this is one of those annoying cases of being liberal with what you accept and strict about what you send. If OSV.dev were to "correct" these URLs, they'd actually stop working as intended, I think.

OSV.dev is all Python and Go, so based on @another-rex's statement that Python and Go don't have a problem with these URLs, I think we'll just have to close this out as, unfortunately, "working as intended".

andrewpollock · May 08 '24 04:05