RDFa in HTML parsing broken

Open • timbl opened this issue 10 months ago • 1 comment

The code seems to assume an RDFaProcessor.trim function, which I guess used to exist but was deprecated... (who deprecated things?)

Solution in rdfaparser: switch it for the native string trim, like foo.node.value.trim()

Failed 200: Fetch of <file:///Users/timbl_1/Content/DesignIssues/Overview.html> failed: Error trying to parse <file:///Users/timbl_1/Content/DesignIssues/Overview.html> as RDFa:
TypeError: RDFaProcessor.trim is not a function:
TypeError: RDFaProcessor.trim is not a function
    at RDFaProcessor.process (/usr/local/lib/node_modules/rabel/node_modules/rdflib/lib/rdfaparser.js:379:37)
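
A minimal sketch of the swap suggested above, assuming the value being trimmed is a plain string on an attribute-like node. The helper name `attributeValue` and the sample `attr` object are hypothetical and are not taken from rdfaparser.js; only `String.prototype.trim` is a real built-in.

```javascript
// Hypothetical helper, not part of rdflib.js: returns a node's value
// with leading and trailing whitespace removed.
function attributeValue (node) {
  // Before (throws "RDFaProcessor.trim is not a function" when no such
  // function is defined):
  //   return RDFaProcessor.trim(node.value)

  // After: rely on the built-in String.prototype.trim.
  return node.value.trim()
}

// Hypothetical stand-in for a DOM attribute node; only `value` matters here.
const attr = { value: '  http://purl.org/dc/terms/title \n' }

console.log(attributeValue(attr)) // prints http://purl.org/dc/terms/title
```

The native method strips whitespace and line terminators at both ends, so it should cover the common cases a hand-written trim helper handles, though whether it matches the original helper exactly has not been checked here.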

timbl avatar Jan 26 '25 11:01 timbl

You seem to have introduced static trim here:

https://github.com/linkeddata/rdflib.js/commit/c74e9112c4a6339388d1b83f64cf6df5a6ab8897

https://github.com/linkeddata/rdflib.js/blob/main/src/rdfaparser.js#L910

Perhaps that needs a revisit?


Aside: The RDFa in https://www.w3.org/DesignIssues/Overview.html seems fine to me. It parses fine with both rdfa-streaming-parser.js and http://rdf.greggkellogg.net/distiller?command=serialize&url=https:%2F%2Fwww.w3.org%2FDesignIssues%2FOverview.html&raw


Aside: While RDFa processors still support xmlns for prefix mappings from RDFa 1.0, it is deprecated in RDFa 1.1. Consider changing:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dct="http://purl.org/dc/terms/"
      xmlns:sioc="http://rdfs.org/sioc/ns#"
      xmlns:foaf="http://xmlns.com/foaf/0.1/">
...
<body xml:lang="en" bgcolor="#FFFFFF" lang="en" text="#000000">

to:

<!DOCTYPE html>
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
...
<body prefix="dct: http://purl.org/dc/terms/ sioc: http://rdfs.org/sioc/ns# foaf: http://xmlns.com/foaf/0.1/">

(Move lang and xml:lang to <html>. bgcolor and text are not applied, so remove them from <body>)

csarven avatar Jan 26 '25 12:01 csarven