replace_keyword skips over cites

Open holmescharles opened this issue 4 years ago • 3 comments

I want to replace keywords everywhere they occur. See my example:

# This is the heading {#sec:alpha}

This is ref 1 = @sec:alpha

This is ref 2 = {@sec:alpha}

This is ref 3 = [@sec:alpha]

This is ref 4 = [{@sec:alpha}]

Then I run 'pandoc test.md -o test.pdf --filter ../filters/headxref.py'. In brief, the filter finds all those tags, associates each with its header title, and runs "replace_keyword" on the document, replacing, e.g., "@sec:alpha" with "This is the heading".

This yields the following in the output:

[image: screenshot of the rendered output]

So why is it skipping Cite blocks?

holmescharles avatar Jun 04 '20 16:06 holmescharles

With .replace_keyword(), Panflute walks over all elements and replaces the Str() elements where .text exactly matches your input. This means that:

  1. If you have a text "abcde", then replacing "bcd" will not change anything.
  2. Other attributes of the element are not replaced (such as the url attribute of Link() objects or, in your case, the .id attribute of Citation()).

For instance, in ref1, @sec:alpha is interpreted by pandoc as a citation object:

[Cite
    [Citation
        {citationId = "sec:alpha",
        citationPrefix = [],
        citationSuffix = [],
        citationMode = AuthorInText,
        citationNoteNum = 1,
        citationHash = 0}]
    [Str "@sec:alpha"]
]

The filter modifies the contents of the Str object (but not the citationId!).

Now, in ref4:

[Cite
    [Citation
        {citationId = "sec:alpha",
        citationPrefix = [Str "{"],
        citationSuffix = [Str "}"],
        citationMode = NormalCitation,
        citationNoteNum = 1,
        citationHash = 0}]
    [Str "[{@sec:alpha}]"]
]

You see that the Str object is actually equal to "[{@sec:alpha}]", which is not an exact match for "@sec:alpha", so nothing changes.

Extending the replace_keyword() function to match substrings is not that difficult though, and it would involve changing just two lines:

https://github.com/sergiocorreia/panflute/blob/43582ccbf53bb2fc370ffd471080c5c34f28fd22/panflute/tools.py#L465 https://github.com/sergiocorreia/panflute/blob/43582ccbf53bb2fc370ffd471080c5c34f28fd22/panflute/tools.py#L473

If there is demand, we can allow partial matches, or even better, maybe regexes? (but that of course will be slow on large documents)

sergiocorreia avatar Nov 10 '20 01:11 sergiocorreia

Follow-up question: It seems that replacing the Str text would still leave the citation object, which would still be picked up by a citation or cross-reference filter (e.g., citeproc or crossref). Is there a trivial way to replace the whole citation with a Str?

holmescharles avatar Nov 10 '20 18:11 holmescharles

Not with .replace_keyword(), but you can set up a filter that looks for Cite elements and then replaces the element as needed if its contents match the keyword. A bit more cumbersome of course.

Longer term, it might be useful to have a more powerful replace_keyword, if there is enough demand for it.

sergiocorreia avatar Nov 10 '20 19:11 sergiocorreia