Emit only valid N-Quads from toRdf.
- Check for valid language format.
- Check for valid subject, predicate, object, and datatype IRIs.
- Drop invalid N-Quads.
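
As a rough illustration of the checks listed above, here is a minimal sketch. It is not the code in this PR: the helper names, the simplified quad shape (`termType`, `value`, `language`, `datatype`), and the regexes are all assumptions. The character class is modeled on the `IRIREF` production of the N-Quads grammar and the language pattern on its `LANGTAG` production.

```javascript
// Hypothetical helpers for illustration only; not the PR's implementation.

// Characters disallowed inside an N-Quads IRIREF: controls, space, < > " { } | ^ ` \
const IRI_BODY_RE = /^[^\u0000-\u0020<>"{}|^`\\]*$/;
// An absolute IRI must start with a scheme.
const SCHEME_RE = /^[A-Za-z][A-Za-z0-9+.-]*:/;
// N-Quads LANGTAG, without the leading '@'.
const LANGTAG_RE = /^[a-zA-Z]+(-[a-zA-Z0-9]+)*$/;

function isValidIri(value) {
  return typeof value === 'string' &&
    SCHEME_RE.test(value) && IRI_BODY_RE.test(value);
}

function isValidQuad(quad) {
  const {subject, predicate, object} = quad;
  // Subjects may be IRIs or blank nodes; predicates must be IRIs.
  if (subject.termType !== 'BlankNode' && !isValidIri(subject.value)) {
    return false;
  }
  if (!isValidIri(predicate.value)) {
    return false;
  }
  if (object.termType === 'Literal') {
    // Literals need a valid datatype IRI and, if present, a valid language tag.
    if (!object.datatype || !isValidIri(object.datatype.value)) {
      return false;
    }
    return !object.language || LANGTAG_RE.test(object.language);
  }
  return object.termType === 'BlankNode' || isValidIri(object.value);
}

// Silently drop anything that would not serialize to valid N-Quads.
function dropInvalidQuads(quads) {
  return quads.filter(isValidQuad);
}
```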
Unsure if this should be merged as is. I think there are two issues with this:
- performance: In performance-critical code where the input data is already known to be valid, this is wasted work. One solution is to add a `skipValidation` option to `toRdf` that would just omit all validation checks.
- error handling: I think silently dropping invalid data is a poor idea. There should be a callback or similar when bad data is found that allows the user to choose to drop it or report errors. This would certainly be useful for debugging, but I imagine it would be desired in production too.
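
To make the two suggestions concrete, here is a rough sketch of how they could fit together. Everything in it is hypothetical: `emitQuads`, the `onInvalidQuad` callback name, and the way the validator is passed in are assumptions for discussion, and only `skipValidation` is the option name proposed above. Neither option exists in `toRdf` today.

```javascript
// Hypothetical sketch; `skipValidation` and `onInvalidQuad` are proposed
// option names, not existing toRdf options.
function emitQuads(quads, isValidQuad, options = {}) {
  const {skipValidation = false, onInvalidQuad = null} = options;
  if (skipValidation) {
    // The caller vouches for the input, so no per-quad checks are run.
    return quads;
  }
  const out = [];
  for (const quad of quads) {
    if (isValidQuad(quad)) {
      out.push(quad);
    } else if (onInvalidQuad) {
      // The handler can log, collect, or throw to turn this into a hard error.
      onInvalidQuad(quad);
    }
    // With no handler, the invalid quad is dropped silently.
  }
  return out;
}

// Usage with stand-in values instead of real quad objects.
const sampleQuads = ['good', 'bad'];
const sampleValidator = q => q !== 'bad';
const dropped = [];
const kept = emitQuads(sampleQuads, sampleValidator, {
  onInvalidQuad: q => dropped.push(q)
});
```

With this shape, a strict mode is just a handler that throws, so the default behavior can stay as it is in this PR.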
@davidlehn I wonder if my update in 804c15b93c9ff7b1c86e965a5feaa5ba0c77b3f0 (PR #354) to isAbsoluteRegex relates to this? Can that be used instead of IRIREF_RE?
@gkellogg I think it depends how strict the checks should be. I think those regexes I added were derived from a spec. Hard to check how correct they even are! Probably covers way more than is normally used. I'd rather see something that's easier to understand. I'm guessing test data could be constructed to fail that other regex. It's just checking for scheme and non-whitespace. Advice on what is appropriate here is welcome.
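
For what it's worth, here is a small sketch of the kind of test data I mean, reading "that other regex" as the scheme-plus-non-whitespace check. Both patterns below are written for illustration and are not copied from the library: `LOOSE_ABSOLUTE_RE` is my assumption of what a scheme-and-non-whitespace check looks like, and `STRICT_IRI_RE` uses the character exclusions from the N-Quads `IRIREF` production.

```javascript
// Illustrative patterns only; neither is copied from the library source.
// Loose: a scheme followed by any run of non-whitespace.
const LOOSE_ABSOLUTE_RE = /^[A-Za-z][A-Za-z0-9+.-]*:\S*$/;
// Stricter: a scheme followed only by characters allowed inside an IRIREF.
const STRICT_IRI_RE = /^[A-Za-z][A-Za-z0-9+.-]*:[^\u0000-\u0020<>"{}|^`\\]*$/;

function compare(iri) {
  return {
    loose: LOOSE_ABSOLUTE_RE.test(iri),
    strict: STRICT_IRI_RE.test(iri)
  };
}

const samples = [
  'http://example.com/ok',     // accepted by both
  'http://example.com/<bad>',  // accepted by the loose check only
  'http://example.com/a b',    // rejected by both (whitespace)
  'no-scheme'                  // rejected by both (no scheme)
];
for (const iri of samples) {
  console.log(iri, compare(iri));
}
```

The second sample is the interesting one: it passes a scheme-and-non-whitespace check but would produce an invalid N-Quads line if emitted inside `<...>`.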
It is certainly a balance, but maybe we can consolidate to just one expression.
I agree that the "silently drop ..." wording in the spec is unfortunate. Recall that the spec doesn't have a normative way to show warnings, but I think we use such language elsewhere.
From an implementation perspective, it would be reasonable to have an option that raises an error on any attempt to emit an invalid triple (typically one with an invalid IRI), and maybe you can suggest some wording for a future version of the spec.