rdflib.js
rdflib.js copied to clipboard
Fix rdfa parser crash; fix incorrect 'ok' flag in fetcher on HTTP 404
The rdfa parser (which i have been noticing when tabulator looks at dokieli documents) has been crashing as it tried to set a read-only 'baseURI' field. I just cut out those lines, not understanding how the parser works .. It needs some tests! Pinging @csarven as that is your baby.
Meanwhile also fixing for speed a fetch problem returning ok-true when status=404 @dmitrizagidulin could you give that a look over, please?
Investigating the tests..
@timbl 👍 from my end, on the Fetcher stuff. (tests are fixed, now pass)
Just an update on the origin of where the baseURI issue was introduced: It looks like at v0.16.0. Somewhere in #192 (re: fetcher.js, util.js).
I think the constructor (what it used to be) is not working as it used to. Suspicious of commit https://github.com/linkeddata/rdflib.js/commit/236d6db6a71482d51368a23766eaf1a17d96f287 .
At the moment, removeHash needs to use options.base (instead of options.baseURI). It also looks like htmlOptions is a no go because toRDFNodeObject is not seeing this.htmlOptions. Lastly, current in process is at some point not seeing the base URI.
Ping @dmitrizagidulin .. I think we shouldn't touch the inner bits of rdfaparser until we see why the global variables are not being referenced properly.
So do you feel we should have a separate PR for the fetcher fix, then, which we can rush through, while we take more time to understand rdfa? would make sense
I can't comment on the fetcher just yet. The issue was introduced as part of a set of conversion change to use require().
@dmitrizagidulin As requested, sample RDFa: https://dokie.li/ , http://csarven.ca/dokieli-rww . Using foreContentType: text/html returns:
{ ok: false,
error: 'Error trying to parse <http://csarven.ca/dokieli-rww> as RDFa:\nTypeError: Cannot read property \'indexOf\' of undefined:\nTypeError: Cannot read property \'indexOf\' of undefined\n at removeHash (/var/www/rdflib.js/lib/rdfaparser.js:295:27)\n at RDFaProcessor.process (/var/www/rdflib.js/lib/rdfaparser.js:305:34)\n at parseRDFaDOM (/var/www/rdflib.js/lib/rdfaparser.js:924:9)\n at XHTMLHandler.parse (/var/www/rdflib.js/lib/fetcher.js:196:11)\n at HTMLHandler.parse (/var/www/rdflib.js/lib/fetcher.js:343:31)\n at /var/www/rdflib.js/lib/fetcher.js:1475:24\n at <anonymous>\n at process._tickCallback (internal/process/next_tick.js:169:7)',
status: 'parse_error' }
The output shows some triples which are coming from whatever is within head but that's not accurate and IIRC that is assembled elsewhere in rdflib.js not from green-turtle. So, that can be put aside for now. Commit prior to https://github.com/linkeddata/rdflib.js/commit/236d6db6a71482d51368a23766eaf1a17d96f287 shows the correct output.
Just to add more fun to the mix - this brings back old nightmares:
There is an issue with base URI setting in Green Turtle when there is a request from the current URL to a different URL. See eg https://github.com/simplerdf/simplerdf/issues/19 , https://github.com/linkeddata/dokieli/issues/132 , https://github.com/rdf-ext/rdf-parser-dom/issues/2 . The proposed fix (which was integrated into rdflib.js awhile back) works only for the current document, so on the command-line, it was working fine before https://github.com/linkeddata/rdflib.js/commit/236d6db6a71482d51368a23766eaf1a17d96f287 . But, not with say making a request to a different URL with XHR. This is only an issue if the document assumes 'this' for URIs eg about="", resource="#foo" - the base URI ends up being whatever URI triggered the request on that document.
I agree that we should split fetcher related stuff with RDFa stuff here. Second, the regression issue with RDFa should be resolved. Although it is not the complete working version for base URI related stuff in RDFa, it at least works with the current document and keeps things on par.
@timbl @csarven we (@dmitrizagidulin @kjetilk) are looking at this; is this PR still useful? Should we fix the conflicting files and merge this?
If it is about brute merging, then don't, because we don't entirely know what it will break further or fix - maybe not even an issue anymore? There may be related issues eg https://github.com/solid/solid-panes/issues/40 , that should be looked at altogether. I'd suggest to leave the PR open - at least from my end because I will have to re-debug to refresh myself on how stuff worked..
If this is nor already the case, we should clean up the rdfa parser so that it takes a base URI as a parameter and uses it though the whole process. No reference to any context to devine other URIs. That is hgow all parser must work: (text, baseURI)