sparql.anything
How to get comments from DOCX?
Discussed in https://github.com/SPARQL-Anything/sparql.anything/discussions/430
Originally posted by kvistgaard on November 18, 2023: From what I've tried so far, it seems comments are not accessible. Since they are what I mostly need to get from MS Word documents, I'm hoping that there is a way (I saw such an option for spreadsheets) or that it can be implemented.
@luigi-asprino , any updates on that?
Commit 7912bb9 implements an extension that extracts comments from documents. Each comment is represented as a container with three slots holding the id, the author, and the text of the comment. Comment containers are attached to the paragraph the comment refers to.
See this docx and its RDF counterpart
@luigi-asprino excellent, I'll give it a try very soon. At first glance, it's not obvious how a comment is linked to the text it comments on, or how a thread is represented (for example, commentY isResponseTo commentX).
7912bb9 implements the extension to extract comment documents.
Now I see that it's for 1.0. I've been trying with the latest release 0.9.0. When will it be released?
You can try it out with the pre-release v1.0-DEV.4 that has just been created.
https://github.com/SPARQL-Anything/sparql.anything/releases/tag/v1.0-DEV.4
Thanks. Just tested it. Works great. Excellent work. Do you have any thoughts on the threads?
I am reopening this issue to try to make comment threads clearer.
At the moment, comments in the same thread are attached as subsequent slots of the container for the paragraph.
Suppose you have a paragraph "Paragraph1" with two comments ("This is a comment" and "This is a reply").
This results in two slots, rdf:_2 and rdf:_3, that reference the comments:
<http://www.example.org/document/paragraph/2>
    rdf:type xyz:Paragraph ;
    rdf:_1 "Paragraph1" ;
    rdf:_2 <http://www.example.org/document/Comment_0> ;
    rdf:_3 <http://www.example.org/document/Comment_1> .

<http://www.example.org/document/Comment_1>
    rdf:type xyz:Comment ;
    rdf:_1 <http://www.example.org/document/Comment_1/Author> ;
    rdf:_2 <http://www.example.org/document/Comment_1/CommentText> ;
    rdf:_3 <http://www.example.org/document/Comment_1/CommentId> .

<http://www.example.org/document/Comment_1/CommentId>
    rdf:type xyz:CommentId ;
    rdf:_1 "1" .

<http://www.example.org/document/Comment_1/CommentText>
    rdf:type xyz:CommentText ;
    rdf:_1 "This is a reply" .

<http://www.example.org/document/Comment_1/Author>
    rdf:type xyz:CommentAuthor ;
    rdf:_1 "Luigi Asprino" .

<http://www.example.org/document/Comment_0>
    rdf:type xyz:Comment ;
    rdf:_1 <http://www.example.org/document/Comment_0/Author> ;
    rdf:_2 <http://www.example.org/document/Comment_0/CommentText> ;
    rdf:_3 <http://www.example.org/document/Comment_0/CommentId> .

<http://www.example.org/document/Comment_0/CommentId>
    rdf:type xyz:CommentId ;
    rdf:_1 "0" .

<http://www.example.org/document/Comment_0/CommentText>
    rdf:type xyz:CommentText ;
    rdf:_1 "This is a comment" .

<http://www.example.org/document/Comment_0/Author>
    rdf:type xyz:CommentAuthor ;
    rdf:_1 "Luigi Asprino" .
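To make the shape above concrete, here is a minimal sketch of a query that lists each commented paragraph together with the id, author, and text of its comments. The file name `document.docx` is a placeholder, and the query assumes that comment extraction is active in the build being used (this thread does not say whether an option must be set). The sub-containers are matched by type rather than by slot number, so the query does not depend on the order of the three slots.

```sparql
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?paragraphText ?id ?author ?text
WHERE {
  # "document.docx" is a hypothetical location; replace it with your own file.
  SERVICE <x-sparql-anything:location=document.docx> {
    # The paragraph container: slot 1 holds its text, later slots hold comments.
    ?paragraph rdf:_1 ?paragraphText ;
               ?commentSlot ?comment .

    # Each comment is a container whose slots point to typed sub-containers.
    ?comment a xyz:Comment ;
             ?authorSlot ?authorNode ;
             ?textSlot   ?textNode ;
             ?idSlot     ?idNode .

    ?authorNode a xyz:CommentAuthor ; rdf:_1 ?author .
    ?textNode   a xyz:CommentText   ; rdf:_1 ?text .
    ?idNode     a xyz:CommentId     ; rdf:_1 ?id .
  }
}
```

For the example document, this should return one row per comment, both with the paragraph text "Paragraph1".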
A possible solution would be adding the thread comment number as a slot of the comment.
<http://www.example.org/document/Comment_1>
    rdf:type xyz:Comment ;
    rdf:_1 <http://www.example.org/document/Comment_1/Author> ;
    rdf:_2 <http://www.example.org/document/Comment_1/CommentText> ;
    rdf:_3 <http://www.example.org/document/Comment_1/CommentId> ;
    rdf:_4 <http://www.example.org/document/Comment_1/ThreadCommentNumber> .

<http://www.example.org/document/Comment_1/ThreadCommentNumber>
    rdf:type xyz:ThreadCommentNumber ;
    rdf:_1 "2"^^xsd:int .

<http://www.example.org/document/Comment_0>
    rdf:type xyz:Comment ;
    rdf:_1 <http://www.example.org/document/Comment_0/Author> ;
    rdf:_2 <http://www.example.org/document/Comment_0/CommentText> ;
    rdf:_3 <http://www.example.org/document/Comment_0/CommentId> ;
    rdf:_4 <http://www.example.org/document/Comment_0/ThreadCommentNumber> .

<http://www.example.org/document/Comment_0/ThreadCommentNumber>
    rdf:type xyz:ThreadCommentNumber ;
    rdf:_1 "1"^^xsd:int .
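If the proposed slot were adopted, replies could be told apart from thread-opening comments with a simple filter. The sketch below assumes the proposed `xyz:ThreadCommentNumber` shape exactly as written above, which is a suggestion in this thread and not a confirmed feature; `document.docx` is again a placeholder.

```sparql
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?comment ?position
WHERE {
  # Hypothetical file location.
  SERVICE <x-sparql-anything:location=document.docx> {
    # Assumes the proposed (not confirmed) ThreadCommentNumber sub-container.
    ?comment a xyz:Comment ;
             ?slot ?numberNode .
    ?numberNode a xyz:ThreadCommentNumber ;
                rdf:_1 ?position .
  }
  # Position 1 opens the thread; anything greater is a reply.
  FILTER (?position > 1)
}
```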
I was imagining something more in the style of sioc:has_reply + sioc:Thread but I guess what you suggest would work equally well.
The relationship between comments and their replies is implicit in the order of the comments. Therefore, sioc:has_reply and sioc:Thread can be materialised with a SPARQL CONSTRUCT query if necessary. This is in line with the SPARQL Anything philosophy of using the minimum number of operations to transform data into RDF and leaving further transformation to the user.
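As an illustration of that idea, here is a sketch of a CONSTRUCT query that derives `sioc:has_reply` links from slot order. It rests on a strong assumption that is not stated in this thread: each paragraph carries a single thread, so a comment in slot n+1 is treated as a reply to the comment in slot n. The slot index is recovered from the property IRI with standard SPARQL string functions, and `document.docx` is a placeholder.

```sparql
PREFIX xyz:  <http://sparql.xyz/facade-x/data/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>

CONSTRUCT {
  ?parent sioc:has_reply ?reply .
}
WHERE {
  # Hypothetical file location.
  SERVICE <x-sparql-anything:location=document.docx> {
    # Two comments attached to the same paragraph container.
    ?paragraph ?parentSlot ?parent ;
               ?replySlot  ?reply .
    ?parent a xyz:Comment .
    ?reply  a xyz:Comment .
  }
  # Turn rdf:_2, rdf:_3, ... into the integers 2, 3, ...
  BIND (xsd:integer(STRAFTER(STR(?parentSlot), "#_")) AS ?parentIndex)
  BIND (xsd:integer(STRAFTER(STR(?replySlot),  "#_")) AS ?replyIndex)
  # Assumption: adjacent comment slots belong to the same thread.
  FILTER (?replyIndex = ?parentIndex + 1)
}
```

For the two-comment example above, this would link the comment "This is a comment" to "This is a reply". A paragraph with several independent threads would need extra information, such as the thread comment number discussed earlier, to avoid linking unrelated comments.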
@luigi-asprino Currently, the document part (paragraph, heading) that a comment is made on is nicely linked to the comment. Is there a way to also extract the highlighted portion of that part's text, that is, the exact span the comment refers to?