CriticMarkup Support?

Open ickc opened this issue 9 years ago • 18 comments

I searched for CriticMarkup and found scattered discussion, for example in #2374 and #2814.

I didn't read through everything, but the main difficulty seems to be related to the pandoc AST.

I created a script, ickc/pandoc-criticmarkup, for using CriticMarkup in pandoc markdown sources. It manipulates the markdown before pandoc processes it (so it is not a filter). It can accept or reject the changes in the source, and it can also produce HTML or PDF (via LaTeX) output by turning the markup into raw HTML/LaTeX in the pandoc markdown.

Could functionality like this make it into official pandoc? On the one hand, it seems to violate the pandoc philosophy of acting on the AST. On the other hand, CriticMarkup is really about tracking changes during the editing phase, not in the output, so the two can be orthogonal. I'm not sure what it should be called, though, since it is not a filter (and the current "official" ways to work with pandoc are templates, filters, and YAML metadata).

ickc · Apr 24 '16 01:04

Well, my original objection to including CriticMarkup in pandoc was precisely that it is a source-file transformation rather than an AST transformation. So it makes more sense to make it a separate tool that could be used as a preprocessor in front of pandoc. It seems that this is what you have created.

References

  1. https://github.com/jgm/pandoc/issues/2374
  2. https://github.com/jgm/pandoc/issues/2814
  3. https://github.com/ickc/pandoc-criticmarkup
  4. https://github.com/jgm/pandoc/issues/2873

jgm · Apr 24 '16 04:04

I see. I'm going to put it in the wiki/pandoc extras/Preprocessors then.

ickc · Apr 24 '16 04:04

On second thought, there could be a way to work with CriticMarkup at the AST level.

First, philosophically, rather than thinking of CriticMarkup as some kind of pandoc-diff, we should shift our perspective and treat it as markup, as its name suggests.

Then, at the AST level, we can define five new elements corresponding to the five CriticMarkup constructs. (Note that deletion is almost the same as strikethrough, except that spaces are allowed at both ends; I'm not sure whether pandoc currently allows this in the native format.)

For example, this is the mapping I used for TeX and HTML output (the color and soul packages are needed in LaTeX):

| CriticMarkup | HTML | LaTeX |
|---|---|---|
| `{--[text]--}` | `<del>[text]</del>` | `\st{[text]}` |
| `{++[text]++}` | `<ins>[text]</ins>` | `\underline{[text]}` |
| `{~~[text1]~>[text2]~~}` | `<del>[text1]</del><ins>[text2]</ins>` | `\st{[text1]}\underline{[text2]}` |
| `{==[text]==}` | `<mark>[text]</mark>` | `\hl{[text]}` |
| `{>>[text]<<}` | `<aside>[text]</aside>` | `\marginpar{[text]}` |
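A minimal Python sketch of the HTML column of this mapping, for illustration only. It is not the code of ickc/pandoc-criticmarkup, it assumes CriticMarkup ranges are not nested, and it does no HTML escaping; the name `critic_to_html` is hypothetical.

```python
import re

# Each CriticMarkup construct paired with its HTML replacement.
# Non-greedy bodies mean nested CriticMarkup is not supported.
_RULES = [
    # {~~old~>new~~} -> <del>old</del><ins>new</ins>
    (re.compile(r"\{~~(.*?)~>(.*?)~~\}", re.DOTALL), r"<del>\1</del><ins>\2</ins>"),
    # {--text--} -> <del>text</del>
    (re.compile(r"\{--(.*?)--\}", re.DOTALL), r"<del>\1</del>"),
    # {++text++} -> <ins>text</ins>
    (re.compile(r"\{\+\+(.*?)\+\+\}", re.DOTALL), r"<ins>\1</ins>"),
    # {==text==} -> <mark>text</mark>
    (re.compile(r"\{==(.*?)==\}", re.DOTALL), r"<mark>\1</mark>"),
    # {>>text<<} -> <aside>text</aside>
    (re.compile(r"\{>>(.*?)<<\}", re.DOTALL), r"<aside>\1</aside>"),
]


def critic_to_html(source: str) -> str:
    """Rewrite the five CriticMarkup constructs as raw HTML tags."""
    for pattern, replacement in _RULES:
        source = pattern.sub(replacement, source)
    return source


print(critic_to_html("a {--b--} c {~~x~>y~~}"))  # a <del>b</del> c <del>x</del><ins>y</ins>
```

Since pandoc markdown passes raw HTML through, the result could then be handed to pandoc for HTML output; the LaTeX column would follow the same pattern with `\st{}`, `\underline{}`, `\hl{}` and `\marginpar{}`.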

Once it is treated as markup and the new elements are defined, accepting and rejecting changes is just a matter of a filter that transforms these elements.
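To illustrate what such a filter might look like, here is a Python sketch over a toy inline model. The element classes are hypothetical stand-ins for the five proposed elements, not pandoc's actual AST types, and the handling of highlights (keep the text) and comments (drop them) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical inline elements standing in for the five proposed
# CriticMarkup elements; these are not pandoc's real AST types.
@dataclass
class Str:
    text: str

@dataclass
class Insertion:
    text: str

@dataclass
class Deletion:
    text: str

@dataclass
class Substitution:
    old: str
    new: str

@dataclass
class Highlight:
    text: str

@dataclass
class Comment:
    text: str


Inline = Union[Str, Insertion, Deletion, Substitution, Highlight, Comment]


def resolve(inlines: List[Inline], mode: str) -> List[Inline]:
    """Accept or reject every change, returning only plain Str nodes."""
    if mode not in ("accept", "reject"):
        raise ValueError(f"unknown mode: {mode}")
    accept = mode == "accept"
    out: List[Inline] = []
    for node in inlines:
        if isinstance(node, Insertion):
            if accept:
                out.append(Str(node.text))
        elif isinstance(node, Deletion):
            if not accept:
                out.append(Str(node.text))
        elif isinstance(node, Substitution):
            out.append(Str(node.new if accept else node.old))
        elif isinstance(node, Highlight):
            out.append(Str(node.text))  # keep the text, drop the highlight
        elif isinstance(node, Comment):
            continue  # comments are dropped in both modes
        else:
            out.append(node)
    return out


def plain(inlines: List[Inline]) -> str:
    """Concatenate the text of Str nodes."""
    return "".join(n.text for n in inlines if isinstance(n, Str))
```

The source file is never touched: the same parsed elements can be rendered as markup, or resolved with `resolve(..., "accept")` or `resolve(..., "reject")`.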

Note: there will be problems when CriticMarkup crosses the boundary of other markup, but the official CriticMarkup spec also discourages this, using the example `*...{...*...}`.

Edit:

To be clear, the method suggested above never changes the source .md file. The output mode could be set on the command line, with a default, e.g. `--criticmarkup=markup|accept|reject`. The default could be `markup` if we want to emphasize that it is primarily markup that can alternatively be accepted or rejected. Alternatively, the default could be `accept` if we want to use it primarily to track changes.
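A sketch of the proposed "three modes, one default" option using Python's `argparse`. The `--criticmarkup` flag here is only the option proposed above, and the wrapper program name `critic-wrapper` and function `build_parser` are hypothetical.

```python
import argparse


def build_parser(default: str = "markup") -> argparse.ArgumentParser:
    """Hypothetical wrapper CLI exposing the three proposed CriticMarkup modes."""
    parser = argparse.ArgumentParser(prog="critic-wrapper")
    parser.add_argument(
        "--criticmarkup",
        choices=["markup", "accept", "reject"],
        default=default,  # "markup" or "accept", depending on the intended emphasis
        help="render CriticMarkup as markup, or accept/reject all changes",
    )
    return parser


args = build_parser().parse_args(["--criticmarkup=reject"])
print(args.criticmarkup)  # reject
```

Switching the project-wide default is then a one-argument change (`build_parser(default="accept")`), which is the whole design question raised above.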

If one really wants to accept/reject at the source level, a preprocessor like mine can be used. Alternatively, pandoc could operate at the AST level, much like the trick mentioned in another thread of running `-f markdown -t markdown`, except in a special mode that only touches the five CriticMarkup AST elements, via the `--criticmarkup=accept|reject` option.
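For the source-level route, a preprocessor only needs to rewrite each construct in the text. A minimal Python sketch, assuming non-nested CriticMarkup and using regular expressions; `resolve_source` is a hypothetical name and this is not ickc's script.

```python
import re

_SUB = re.compile(r"\{~~(.*?)~>(.*?)~~\}", re.DOTALL)
_DEL = re.compile(r"\{--(.*?)--\}", re.DOTALL)
_INS = re.compile(r"\{\+\+(.*?)\+\+\}", re.DOTALL)
_MARK = re.compile(r"\{==(.*?)==\}", re.DOTALL)
_COMMENT = re.compile(r"\{>>(.*?)<<\}", re.DOTALL)


def resolve_source(text: str, accept: bool) -> str:
    """Accept or reject every change directly in the markdown source."""
    text = _SUB.sub(lambda m: m.group(2) if accept else m.group(1), text)
    text = _DEL.sub(lambda m: "" if accept else m.group(1), text)
    text = _INS.sub(lambda m: m.group(1) if accept else "", text)
    text = _MARK.sub(lambda m: m.group(1), text)  # keep highlighted text
    text = _COMMENT.sub("", text)                 # drop comments
    return text


print(resolve_source("Keep {--this--}{++that++}.", accept=True))  # Keep that.
```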

ickc · Apr 25 '16 02:04

See the earlier thread for the rationale for NOT treating it with new elements at the AST level.

(To see why this won't work, consider that you might want to delete a section from a code block. In a pandoc code block there is no structure -- it's just a string -- so there'd be nowhere to put the critic markup annotations in the AST.)

jgm · Apr 25 '16 03:04

Ok, I see.

ickc · Apr 25 '16 04:04

But I still think treating CriticMarkup purely as markup would make everything work. As markup, it shouldn't appear in a code block. Admittedly, people might want to use it in a code block too, as a way of checking diffs (in that case they should use a preprocessor). So treating CriticMarkup as markup in pandoc would be partial support for the whole CriticMarkup concept. If abbreviations can be provided as partial support, this could be provided as partial support too.

On the other hand, some people certainly do use CriticMarkup as markup for emphasis, i.e. they are their own critics and show it to the public. (Many news sites do this when they make changes after publishing. I've even read a news site that uses many of these elements, normally used for annotation and critique, to help readers digest a long post.)

ickc · Apr 25 '16 04:04

You might look at the pandoc-discuss discussion, and contribute there if you like.

https://groups.google.com/d/msg/pandoc-discuss/STbm1W4ASiU/bYfBOroTkhoJ

jgm · Apr 25 '16 04:04

To me, the idea of a CriticMarkup preprocessor in front of pandoc is too complicated. We already have the super useful `--track-changes` option for docx input. It is very useful, if only for storing commented text as plain text in a git repo. Along these lines, it would be cool to have a pandoc filter that translates the rather wordy `[my comment]{.comment-start id="0" author="Me"}[]{.comment-end id="0"}` into `{>>my comment<<}`. The date and author information is not important, as version control systems like git handle that.
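A minimal text-level sketch of that translation in Python, operating on the markdown that `--track-changes=all` produces (a real implementation would more likely be a pandoc filter working on the Span elements). It assumes the comment text contains no `]` and the attributes contain no `}`; `docx_comments_to_critic` is a hypothetical name.

```python
import re

# [comment text]{.comment-start ...attributes...}
_START = re.compile(r"\[([^\]]*)\]\{\.comment-start[^}]*\}")
# []{.comment-end ...attributes...}
_END = re.compile(r"\[\]\{\.comment-end[^}]*\}")


def docx_comments_to_critic(md: str) -> str:
    """Turn comment spans into CriticMarkup comments, dropping id/author/date."""
    md = _START.sub(lambda m: "{>>" + m.group(1) + "<<}", md)
    return _END.sub("", md)


print(docx_comments_to_critic(
    '[my comment]{.comment-start id="0" author="Me"}[]{.comment-end id="0"}'
))  # {>>my comment<<}
```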

On the philosophical side, one can view critiques, comments, and deletions as markup for your colleagues (a reduced form of authorship). Who cares whether the document is eventually published without any comments or stays a draft forever? In that sense, comments are a text-style element and part of an evolving document, stored in a separate version control system.

ttxtea · Dec 10 '21 08:12

I wonder whether, now that filters have lpeg support, one could write CM in a Code/CodeBlock, parse the CM with lpeg, and parse the content with pandoc.read(). Is nesting CM constructs supposed to be allowed? If not, parsing will be easier. The only downside is that you would need to add a class, e.g. `.cm`, to those code elements.
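This approach would use lpeg from a Lua filter; as a swapped-in illustration of the same parsing step, here is a Python regex tokenizer that assumes CM constructs do not nest (the easier case). `tokenize` and its token tuples are hypothetical; in a real filter, each `"str"` run would be what gets passed to `pandoc.read()`.

```python
import re
from typing import List, Tuple

# One alternation over the five constructs. Non-greedy bodies mean nested
# CriticMarkup is not supported, which keeps the parser simple.
_TOKEN = re.compile(
    r"\{~~(?P<old>.*?)~>(?P<new>.*?)~~\}"
    r"|\{--(?P<deletion>.*?)--\}"
    r"|\{\+\+(?P<insertion>.*?)\+\+\}"
    r"|\{==(?P<highlight>.*?)==\}"
    r"|\{>>(?P<comment>.*?)<<\}",
    re.DOTALL,
)


def tokenize(text: str) -> List[Tuple[str, ...]]:
    """Split text into ("str", ...) runs and one tuple per CM construct."""
    tokens: List[Tuple[str, ...]] = []
    pos = 0
    for m in _TOKEN.finditer(text):
        if m.start() > pos:
            tokens.append(("str", text[pos:m.start()]))
        if m.group("old") is not None:
            tokens.append(("substitution", m.group("old"), m.group("new")))
        else:
            # lastgroup is the name of the single group that matched
            tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    if pos < len(text):
        tokens.append(("str", text[pos:]))
    return tokens
```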

bpj · Feb 23 '22 08:02

You could do that, but it seems quite ugly. I am inclined to have another look at supporting critic markup internally -- as noted above, with some limitations. @ttxtea's idea of translating between critic markup and Word annotations is also appealing, although there's a question how we'd reconstruct the author, id, and date attributes when converting back from markdown.

jgm · Feb 23 '22 17:02

I'll reopen this for further consideration.

jgm · Feb 23 '22 17:02

If one does not want to add complexity to CriticMarkup, one could simply leave out the author, id, and date info. If one translates

- `{>>my comment<<}` to `[my comment]{.comment-start}[]{.comment-end}`
- `{--delete this--}` to `[delete this]{.deletion}`
- `{++add this++}` to `[add this]{.insertion}`

and then converts the result to docx, you get a viable document. The comments will render with "(no author)" and "(no date)". However, the missing `id=` attribute will cause all of the comment text to be repeated in every comment. One way around that is to count the comments in the text and add `id="0"`, `id="1"`, ... to each comment-start/comment-end pair. In fact, each comment just needs a unique number or hash; the numbers need not be consecutive.
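A Python sketch of this translation, giving each comment its own id as suggested. It works on the markdown text rather than the AST and assumes no nesting; `critic_to_docx_spans` is a hypothetical name. Author and date are still left out, so Word would still show "(no author)".

```python
import itertools
import re

_COMMENT = re.compile(r"\{>>(.*?)<<\}", re.DOTALL)
_DEL = re.compile(r"\{--(.*?)--\}", re.DOTALL)
_INS = re.compile(r"\{\+\+(.*?)\+\+\}", re.DOTALL)


def critic_to_docx_spans(md: str) -> str:
    """Rewrite CriticMarkup as insertion/deletion/comment spans, numbering
    comments so every comment-start/comment-end pair gets a unique id."""
    counter = itertools.count()

    def comment(m):
        cid = next(counter)
        return (f'[{m.group(1)}]{{.comment-start id="{cid}"}}'
                f'[]{{.comment-end id="{cid}"}}')

    md = _COMMENT.sub(comment, md)
    md = _DEL.sub(lambda m: f"[{m.group(1)}]{{.deletion}}", md)
    md = _INS.sub(lambda m: f"[{m.group(1)}]{{.insertion}}", md)
    return md
```

Consecutive integers are just the simplest way to get unique ids; any unique value per comment would serve.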

CriticMarkup is really simple and carries no information about who made a comment. If one uses git, the commit has all the information about who made it. From a writing perspective, maybe comments should be taken seriously regardless of author and date (but that's just an opinionated opinion ;-)).

ttxtea · Feb 23 '22 19:02

These are great ideas. One could also add the author id as a parameter to be set through the YAML metadata, like any other authorship information.

tillgrallert · Mar 01 '22 08:03

@tillgrallert CriticMarkup content is (at least in every workflow I've bumped into) more commonly authored by one or more parties other than the original author. Where it is used at all, it is also not uncommon for more than one party to be involved. This is simply not metadata that is stored in CM the way it is in a Word file with comments/change tracking turned on. Unless the YAML metadata you are proposing has some sort of indexing mechanism, I can't see how it would allow lossless round trips between formats.

For my own use (say, in an advanced filter) I could reconstruct the CM author information from Git history, using a pickaxe search or a blame operation to tell the filter who authored a given comment. But short of source-content-tracking hacks like that, I can't see how raw CM markup in a text or markdown source could provide the metadata to docx, or preserve it when coming from docx.

alerque · Mar 01 '22 09:03

These are very fair points! My suggestion is definitely lossy and probably no better than leaving the author id empty for the conversion from .md to .docx. I was just thinking about quick and easy ways to appease users who dislike the '(no author)' string in the resulting Word file.

tillgrallert · Mar 01 '22 10:03

I am happy to see this discussion, as I use CM and would like it to be reflected in the final document. Looking forward to this.

rkrug · Mar 21 '22 13:03

Perhaps it is useful to conceptually distinguish comments from deletions and additions. Some take comments to the level of a chat-style conversation, like `{>> ttxtea: oh @originalauthor, your latest version of this paragraph reads much more understandable.<<}`. All of this conversation style is possible without real author tags. But perhaps a `[my comment]{.comment-start author="ttxtea"}[]{.comment-end}` in docx could translate to `{>>ttxtea: my comment<<}`.
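A sketch of the docx-to-CM direction that keeps the author as a prefix, as proposed here. It is text-level Python rather than a pandoc filter, assumes the comment text has no `]` and attribute values have no `}`, and `comments_with_author` is a hypothetical name.

```python
import re

# [comment text]{.comment-start ...attributes...}; group 2 holds the attributes
_START = re.compile(r"\[([^\]]*)\]\{\.comment-start([^}]*)\}")
_AUTHOR = re.compile(r'author="([^"]*)"')
_END = re.compile(r"\[\]\{\.comment-end[^}]*\}")


def comments_with_author(md: str) -> str:
    """Carry the author attribute into the CriticMarkup comment as a prefix."""
    def repl(m):
        author = _AUTHOR.search(m.group(2))
        prefix = f"{author.group(1)}: " if author else ""
        return "{>>" + prefix + m.group(1) + "<<}"

    return _END.sub("", _START.sub(repl, md))


print(comments_with_author(
    '[my comment]{.comment-start author="ttxtea"}[]{.comment-end}'
))  # {>>ttxtea: my comment<<}
```

Going back from CM to docx would then mean splitting off the `name: ` prefix, which only works if everyone follows that convention.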

With an addition, on the other hand, you don't want to add conversation; you just stick with the text content.

ttxtea · Mar 21 '22 15:03

> @ttxtea's idea of translating between critic markup and Word annotations is also appealing, although there's a question how we'd reconstruct the author, id, and date attributes when converting back from markdown.

@jgm Re: the above ☝️ Perhaps the extended syntax proposed in https://github.com/CriticMarkup/CriticMarkup-toolkit/issues/50 fits the bill? It's already implemented in the Obsidian editor with the corresponding Commentator plugin.

From the latest release notes:

> Ranges (suggestions + comments) can now contain arbitrary metadata information, this is stored as a JSON encoded string and separated from the main contents by the @@ symbol, example: `{++{"author":"Fevol"}@@This is a suggestion++}`

See the linked issue for more details.
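One way a converter could read the proposed `@@` metadata, sketched in Python. It assumes the metadata is a JSON object placed before the first `@@` (a JSON string value that itself contains `@@` would break this), and `split_metadata` is a hypothetical name. A reader could then use the `author` key to fill the docx `author` attribute that plain CM cannot carry.

```python
import json
from typing import Any, Dict, Tuple


def split_metadata(body: str) -> Tuple[Dict[str, Any], str]:
    """Split a range body like '{"author":"Fevol"}@@text' into (metadata, text).

    Bodies without a JSON-object prefix are returned unchanged with empty metadata.
    """
    if "@@" in body:
        head, text = body.split("@@", 1)
        try:
            meta = json.loads(head)
        except json.JSONDecodeError:
            meta = None  # the text merely contains "@@"
        if isinstance(meta, dict):
            return meta, text
    return {}, body


meta, text = split_metadata('{"author":"Fevol"}@@This is a suggestion')
print(meta["author"], "|", text)  # Fevol | This is a suggestion
```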

FeralFlora · Mar 08 '24 11:03