jsonld.js

Report invalid data URL errors.

Open BigBlueHat opened this issue 11 months ago • 3 comments

Data URLs are frequently used in JSON payloads to store images or other binary (or non-JSON-friendly text) data as a string.

They work well when stored in JSON-LD terms defined with "@type": "@id" in the context (or, less meaningfully, as raw string values).

However, the strings are often extremely long, and it can be hard to detect errors within them, such as spaces, which may be introduced if the URL is constructed incorrectly or at some point not URI-encoded properly (ex: "+" getting turned into a space).

An invalid data URL will currently be treated as a relative URL by the parser:

{
  "image": "data:image/png;base64,qwyfouparst2308in arst829235"
}

The space character in the above (very fake) data URL makes the URL invalid. The parser will therefore drop the term or, when run in "safe mode", throw a "Relative object reference found." error.

It may be useful (at least as an option) to output warnings or errors when invalid data URLs are detected in a document.
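
A minimal sketch of the kind of check such an option might run. The helper name checkDataUrl and its messages are hypothetical and not part of jsonld.js; the base64 test is deliberately rough (it does not verify padding length and would reject percent-encoded payload characters).

```javascript
// Hypothetical helper, not part of jsonld.js: lists likely problems in a
// string that is meant to be a "data:" URL. Returns [] if none are found.
function checkDataUrl(value) {
  if (typeof value !== 'string' || !value.startsWith('data:')) {
    return ['not a data: URL'];
  }
  const problems = [];
  // Unencoded whitespace is never valid inside a URL.
  const ws = value.search(/\s/);
  if (ws !== -1) {
    problems.push(`whitespace at index ${ws}`);
  }
  // General shape: data:[<mediatype>][;base64],<data>
  const comma = value.indexOf(',');
  if (comma === -1) {
    problems.push('missing "," before the data part');
    return problems;
  }
  const header = value.slice('data:'.length, comma);
  const payload = value.slice(comma + 1);
  // Rough check: only base64 alphabet characters plus up to two "=".
  if (/;base64$/i.test(header) && !/^[A-Za-z0-9+/]*={0,2}$/.test(payload)) {
    problems.push('payload has characters outside the base64 alphabet');
  }
  return problems;
}

// The (very fake) data URL from the example above.
console.log(checkDataUrl('data:image/png;base64,qwyfouparst2308in arst829235'));
```

Reporting the index of the first whitespace character is the main point here, since the strings are too long to inspect by eye.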

BigBlueHat, Jan 30 '25 13:01

  • Garbage in, garbage out.
  • The checks in toRdf.js use isAbsolute from url.js. It's a very basic check. I assume correct enough, but could use some eyes.
  • "relative {graph,subject,predicate,object} reference" is confusing when it's different types of garbage input. I'm not sure how to best determine what kind of error it is. Some specific checks for, say, whitespace in URLs might help and are cheap. What other heuristic checks would help?
  • Those error names are unofficial "safe mode" ones and could be updated or changed as needed. I think at the time the idea was that if it's not an absolute URL, it's relative. But that's not really true since it could be a bad URL/IRI. If there's a good way to differentiate and have more correct errors, that would be a nice improvement.
  • Any URL with a space has the same issue, so this should likely be a general issue rather than specific "data:" handling:
{
  "image": "https://example.com/foo bar"
}
  • A related general debugging improvement that would help for many issues is optional path tracking to more easily narrow down which JSON is causing the errors. Things like that would help people at least know which data exactly is causing an issue. (This has been long planned and I started work but never finished.)
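
To illustrate the path tracking idea, here is a rough sketch that walks a parsed JSON document and records where suspicious values live. The function name findSuspectUrls and the path notation are assumptions for illustration, not jsonld.js code. It checks every string, so it will also flag ordinary text such as "Note: see below"; a real implementation would presumably only check values the context marks as IRIs.

```javascript
// Hypothetical sketch, not jsonld.js code: walk a parsed JSON document and
// record the path of every string that starts with a URL scheme but
// contains whitespace.
function findSuspectUrls(node, path = '$', found = []) {
  if (typeof node === 'string') {
    // Scheme syntax from RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
    if (/^[A-Za-z][A-Za-z0-9+.-]*:/.test(node) && /\s/.test(node)) {
      found.push({path, value: node});
    }
  } else if (Array.isArray(node)) {
    node.forEach((item, i) => findSuspectUrls(item, `${path}[${i}]`, found));
  } else if (node !== null && typeof node === 'object') {
    for (const [key, child] of Object.entries(node)) {
      findSuspectUrls(child, `${path}[${JSON.stringify(key)}]`, found);
    }
  }
  return found;
}

const doc = {
  image: 'https://example.com/foo bar',
  items: [
    {image: 'data:image/png;base64,qwyfouparst2308in arst829235'},
    {image: 'https://example.com/ok'}
  ]
};
for (const {path, value} of findSuspectUrls(doc)) {
  console.log(`${path}: ${value.slice(0, 40)}`);
}
```

With a path attached, an error could point at the exact value instead of only naming the kind of reference that failed.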

davidlehn, Jan 30 '25 18:01

  • A related general debugging improvement that would help for many issues is optional path tracking to more easily narrow down which JSON is causing the errors. Things like that would help people at least know which data exactly is causing an issue. (This has been long planned and I started work but never finished.)

💯

Do we have an issue for this yet? If not, can you make one?

I assume we should just close this one as "wontfix" then (given the rest of your comment), yeah?

BigBlueHat, Feb 26 '25 15:02

  • A related general debugging improvement that would help for many issues is optional path tracking to more easily narrow down which JSON is causing the errors. Things like that would help people at least know which data exactly is causing an issue. (This has been long planned and I started work but never finished.) [...] Do we have an issue for this yet? If not, can you make one?

I don't think there is one. Maybe my comment above is it.

I assume we should just close this one as "wontfix" then (given the rest of your comment), yeah?

I'm fine leaving it open. Or making something related, but I'm not sure what that would be. It's probably not a bad idea to allow more descriptive errors if/when possible. Current heuristics used here are weak. They could perhaps be improved somehow. If not at a spec level, then extra per-implementation details.
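
As one possible shape for more descriptive errors, a sketch that separates "has whitespace" from "is relative" before picking a message. The names classifyIri and describeIriProblem and the message strings are hypothetical, not existing jsonld.js or spec-level errors, and blank node identifiers are not handled.

```javascript
// Hypothetical classifier, not the jsonld.js implementation.
function classifyIri(value) {
  if (typeof value !== 'string') {
    return 'not-a-string';
  }
  // Check whitespace first so a bad absolute URL is not reported as relative.
  if (/\s/.test(value)) {
    return 'contains-whitespace';
  }
  // A leading scheme followed by ":" is treated as absolute.
  if (/^[A-Za-z][A-Za-z0-9+.-]*:/.test(value)) {
    return 'absolute';
  }
  return 'relative';
}

// Map each problem class to an illustrative message; null means no problem.
function describeIriProblem(value) {
  const messages = {
    'not-a-string': 'IRI value is not a string.',
    'contains-whitespace': 'Invalid IRI: contains whitespace.',
    relative: 'Relative reference found.'
  };
  return messages[classifyIri(value)] || null;
}

console.log(describeIriProblem('https://example.com/foo bar'));
console.log(describeIriProblem('foo/bar'));
```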

davidlehn, Feb 26 '25 17:02