jsonld.js

Emit only valid N-Quads from toRdf.

Open davidlehn opened this issue 6 years ago • 4 comments

  • Check for valid language format.
  • Check for valid subject, predicate, object, and datatype IRIs.
  • Drop invalid N-Quads.
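
A minimal sketch of what these three checks could look like, assuming quads shaped like RDF/JS terms (`termType`, `value`, `language`, `datatype`) and always carrying a `graph` term. All names here (`LANGTAG_RE`, `IRI_RE`, `isValidQuad`, `dropInvalidQuads`) are hypothetical and the patterns are simplified; this is not the code from the patch.

```javascript
// Illustrative sketch only: these names are hypothetical, not jsonld.js internals.

// Language tags: letters, then optional hyphen-separated alphanumeric subtags
// (the shape of the N-Quads LANGTAG production).
const LANGTAG_RE = /^[a-zA-Z]+(-[a-zA-Z0-9]+)*$/;

// Simplified IRI check: a scheme, then no characters that N-Quads forbids
// unescaped inside <...>. This is much looser than a full IRI grammar.
const IRI_RE = /^[a-zA-Z][a-zA-Z0-9+.-]*:[^\x00-\x20<>"{}|^`\\]*$/;

const isIri = term => term.termType === 'NamedNode' && IRI_RE.test(term.value);
const isBlank = term => term.termType === 'BlankNode';

function isValidObject(term) {
  if (term.termType !== 'Literal') {
    return isIri(term) || isBlank(term);
  }
  if (term.language && !LANGTAG_RE.test(term.language)) {
    return false;
  }
  // Assumption: a literal without an explicit datatype is accepted.
  return !term.datatype || isIri(term.datatype);
}

function isValidQuad({subject, predicate, object, graph}) {
  return (isIri(subject) || isBlank(subject)) &&
    isIri(predicate) &&
    isValidObject(object) &&
    (graph.termType === 'DefaultGraph' || isIri(graph) || isBlank(graph));
}

// Drop every quad that fails validation.
const dropInvalidQuads = quads => quads.filter(isValidQuad);
```

Keeping the per-term checks in small predicates makes it easy to swap in a stricter or looser IRI pattern later without touching the filtering logic.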

Unsure if this should be merged as is. I think there are two issues with this:

  • performance: In performance-critical code where the input data is already known to be valid, this is wasted work. One solution is to add a skipValidation option to toRdf that omits all validation checks.
  • error handling: I think silently dropping invalid data is a poor idea. There should be a callback or similar hook, invoked when bad data is found, that lets the user choose whether to drop it or report an error. This would certainly be useful for debugging, and I imagine it would be wanted in production too.
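
One possible shape for both ideas, as a rough sketch. The function `emitQuads` and the options `skipValidation`, `isValidQuad`, and `onInvalidQuad` are hypothetical names chosen for illustration; none of them exist in jsonld.js, and the default validity check is only a placeholder.

```javascript
// Hypothetical sketch: `emitQuads`, `skipValidation`, `isValidQuad`, and
// `onInvalidQuad` are illustrative names, not existing jsonld.js options.

// Placeholder check for the sketch: the predicate must look like "scheme:rest".
const defaultIsValidQuad = quad =>
  /^[a-zA-Z][a-zA-Z0-9+.-]*:\S*$/.test(quad.predicate.value);

function emitQuads(quads, options = {}) {
  const {
    skipValidation = false,
    isValidQuad = defaultIsValidQuad,
    onInvalidQuad = null
  } = options;

  // Fast path: the caller vouches for the input, so no checks are run.
  if (skipValidation) {
    return quads;
  }

  const valid = [];
  for (const quad of quads) {
    if (isValidQuad(quad)) {
      valid.push(quad);
    } else if (onInvalidQuad) {
      // The handler may log, collect, or throw to abort the conversion.
      onInvalidQuad(quad);
    }
    // With no handler, the invalid quad is dropped silently.
  }
  return valid;
}

// Usage: collect invalid quads instead of losing them silently.
const rejected = [];
const quads = [
  {predicate: {value: 'http://example.com/p'}},
  {predicate: {value: 'no scheme here'}}
];
const kept = emitQuads(quads, {onInvalidQuad: quad => rejected.push(quad)});
```

A handler that throws turns the same hook into a strict mode, so dropping, reporting, and failing can all be expressed without separate flags.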

davidlehn avatar Jan 04 '19 21:01 davidlehn

@davidlehn I wonder if my update in 804c15b93c9ff7b1c86e965a5feaa5ba0c77b3f0 (PR #354) to isAbsoluteRegex relates to this? Can that be used instead of IRIREF_RE?

gkellogg avatar Jan 14 '20 23:01 gkellogg

@gkellogg I think it depends on how strict the checks should be. I think the regexes I added were derived from a spec, but it's hard to check how correct they even are! They probably cover way more than is normally used. I'd rather see something that's easier to understand. I'm guessing test data could be constructed to fail that other regex, since it's just checking for a scheme and non-whitespace. Advice on what is appropriate here is welcome.
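
To illustrate the trade-off, here is a small comparison of a "scheme plus non-whitespace" style check against one that also rejects characters N-Quads does not allow unescaped in an IRI. Both patterns are approximations written for this example (`LOOSE_IRI_RE` and `NQUADS_IRI_RE` are made-up names); neither is copied from isAbsoluteRegex or IRIREF_RE.

```javascript
// Both patterns are approximations written for this example; neither is copied
// from jsonld.js.

// Loose: a scheme followed by any run of non-whitespace characters.
const LOOSE_IRI_RE = /^[A-Za-z][A-Za-z0-9+.-]*:\S*$/;

// Stricter: additionally rejects characters that the N-Quads IRIREF
// production does not allow unescaped.
const NQUADS_IRI_RE = /^[A-Za-z][A-Za-z0-9+.-]*:[^\x00-\x20<>"{}|^`\\]*$/;

const samples = [
  'http://example.com/ok',
  'http://example.com/{id}',
  'http://example.com/a<b',
  'relative/path'
];

// For each sample, record which of the two checks accepts it.
const report = samples.map(iri => ({
  iri,
  loose: LOOSE_IRI_RE.test(iri),
  strict: NQUADS_IRI_RE.test(iri)
}));

console.log(report);
```

Inputs such as `http://example.com/{id}` pass the loose check but would produce a line that is not valid N-Quads, which is the kind of test data that separates the two approaches.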

davidlehn avatar Jan 15 '20 02:01 davidlehn

It is certainly a balance, but maybe we can consolidate to just one expression.

gkellogg avatar Jan 15 '20 02:01 gkellogg

I agree that the "silently drop ..." wording in the spec is unfortunate. Recall that the spec doesn't have a normative way to show warnings, but I think we use such language elsewhere.

From an implementation perspective, it would be reasonable to have an option that raises an error when an attempt is made to emit an invalid triple (typically one with an invalid IRI), and maybe you can suggest some wording for a future version of the spec.

gkellogg avatar Nov 20 '21 18:11 gkellogg