
Support of figure label in JATS

Open ehapmgs opened this issue 5 years ago • 7 comments

I am trying to convert JATS to DOCX and I have noticed the labels of the figures are missing.

For example, the output for the following JATS is missing the content of the <label> element:

<fig fig-type="figure">
<label>Figure 1</label>
<caption>
<p>Some caption</p>
</caption>
<graphic/>
</fig>

Is it possible to add support for that?

pandoc version: 2.13

ehapmgs, Mar 21 '21 16:03

I think it would make sense to add the content of <label> to the figure caption. This might be unwanted when the target format labels figures automatically, so we may want to hide this behind an extension.

The same should be done for tables.
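Until something like this exists in the reader, the idea can be approximated by preprocessing the JATS before it reaches pandoc. The following is only a minimal sketch using Python's standard library, not anything pandoc does itself; the function name `merge_labels_into_captions` and the "label, period, caption" joining style are assumptions made for illustration, and any markup inside <label> is flattened to plain text.

```python
import xml.etree.ElementTree as ET


def merge_labels_into_captions(jats: str) -> str:
    """Prepend each <label> to the first caption paragraph of its <fig> or <table-wrap>.

    Hypothetical helper: a workaround sketch, not part of pandoc.
    """
    root = ET.fromstring(jats)
    # Materialize the traversal first, because the tree is modified below.
    for parent in list(root.iter()):
        if parent.tag not in ("fig", "table-wrap"):
            continue
        label = parent.find("label")
        para = parent.find("caption/p")
        if label is None or para is None:
            continue
        # Flatten any inline markup inside <label> to plain text.
        label_text = "".join(label.itertext()).strip()
        para.text = f"{label_text}. {para.text or ''}"
        parent.remove(label)
    return ET.tostring(root, encoding="unicode")


sample = """<fig fig-type="figure">
<label>Figure 1</label>
<caption>
<p>Some caption</p>
</caption>
<graphic/>
</fig>"""

merged = merge_labels_into_captions(sample)
print(merged)
```

The result can then be converted as usual. As noted above, this may be unwanted when the target format numbers figures automatically, since the label would then appear twice.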

tarleb, Aug 23 '24 19:08

I don't think we should add the label. This is generally added automatically in most formats that support captions.

jgm, Aug 23 '24 21:08

Ok, that's true. So if we were to handle the element, then a "label" attribute would probably be the better choice.

tarleb, Aug 24 '24 07:08

Using a label attribute would be a bit dangerous, because label is an HTML attribute name, so it wouldn't get sanitized to data-label. Could use data-label I suppose. But I'm not convinced we should handle the element at all.
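To make the concern concrete, here is a rough sketch of the kind of renaming rule described: names that are already valid HTML attributes pass through unchanged, everything else gets a data- prefix. This is an illustration only, not pandoc's actual code; `KNOWN_HTML_ATTRIBUTES` is a deliberately tiny, hypothetical subset of the real attribute list, and `html_attribute_name` is an invented name.

```python
# Hypothetical, heavily abbreviated subset of HTML attribute names.
# "label" is a genuine HTML attribute (used on <option>, <optgroup>, <track>).
KNOWN_HTML_ATTRIBUTES = {"class", "id", "label", "lang", "title"}


def html_attribute_name(name: str) -> str:
    """Return the name an HTML writer might emit for a custom attribute (sketch)."""
    if name in KNOWN_HTML_ATTRIBUTES or name.startswith("data-"):
        return name  # passes through untouched
    return "data-" + name  # unknown names are prefixed


names = [html_attribute_name(n) for n in ("label", "caption-label", "data-label")]
print(names)
```

Under this rule a plain label attribute would end up verbatim on whatever HTML element carries it, whereas a name that is not an HTML attribute would be prefixed.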

jgm, Aug 24 '24 07:08

Or maybe caption-label.

My personal interest here is to have label support in the writer: I have a filter that generates these labels (primarily for HTML), and it would be nice if there was a way to have the JATS writer use that information in a semantically correct way. I could of course write another filter to generate and patch the XML semi-manually, but I'd like to avoid that if possible.

Reader support for labels would certainly be useful when converting to HTML.
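For reference, the "patch the XML semi-manually" fallback mentioned above could look roughly like the sketch below. It is written as a standalone post-processing step in Python rather than as a pandoc filter, and everything specific in it is an assumption for illustration: the function name `add_labels`, the idea of keying labels on the figure's id attribute, and the sample input.

```python
import xml.etree.ElementTree as ET


def add_labels(jats: str, labels: dict) -> str:
    """Insert a <label> into each <fig> whose id appears in `labels` (sketch)."""
    root = ET.fromstring(jats)
    # Materialize the traversal first, because figures are modified below.
    for fig in list(root.iter("fig")):
        text = labels.get(fig.get("id"))
        if text is None or fig.find("label") is not None:
            continue  # nothing to add, or a label already exists
        label = ET.Element("label")
        label.text = text
        # Place the label directly before <caption> when there is one.
        caption = fig.find("caption")
        index = list(fig).index(caption) if caption is not None else 0
        fig.insert(index, label)
    return ET.tostring(root, encoding="unicode")


patched = add_labels(
    '<fig id="fig1"><caption><p>Some caption</p></caption><graphic/></fig>',
    {"fig1": "Figure 1"},
)
print(patched)
```

Having the writer emit the element directly from an attribute would make this extra pass unnecessary.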

tarleb, Aug 24 '24 07:08

How do these labels differ from captions and id attributes? Since they seem to be elements can they contain styled text?

I have a similar use case with glosses — translations/classifications/etymologies attached to words (called lemmas) in texts. I "encode" them as a span inside a span:

[parole['word' f.sg. \< [parabola]{.smallcaps} 'parable']{.gloss}]{.lemma}

For HTML I use CSS to underline the lemma (preferably with a dotted underline) and present the gloss in one of several ways: as a styled pop-up that appears when hovering over the lemma; as a margin note, with a filter that prepends the lemma in bold to the gloss; or just displayed in parentheses after the lemma, or not at all for mobile. For LaTeX I use a filter that turns the gloss into a margin note, again with the lemma prepended in bold (using the marginnote package rather than \marginpar to avoid memory issues!). The main fragility is that the CSS and the filters alike depend on the gloss span being the last child of the lemma span.

Perhaps something similar could be done in the caption for this issue.

(Less relevant to this issue is that I also have a filter which will locate spans with the .lemma class and fetch the gloss from a table in metadata — typically loaded with --metadata-file — keyed on the stringified content of the lemma span — provided the last child isn't a gloss span already. It can even keep a sentinel variable which will be true if the same lemma has already been encountered in the current section, in which case it will have the span stripped instead of having a gloss attached!)

bpj, Aug 24 '24 11:08

> How do these labels differ from captions and id attributes? Since they seem to be elements can they contain styled text?

My understanding of labels is that they usually contain the element name and the number, for example "Table 1" or "Fig. 4". They are generally presented as part of the caption. HTML doesn't have separate markup for these labels; they are just part of the caption.

In JATS the label can contain markup.
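As a small illustration of both points (labels being "name plus number", and possibly carrying markup), the sketch below flattens a JATS <label> to plain text and splits it into its two parts. The helper name `split_label`, the sample label, and the regular expression are assumptions for illustration; labels such as "Figure S1" do not fit the pattern and are returned unsplit.

```python
import re
import xml.etree.ElementTree as ET


def split_label(label_xml: str):
    """Flatten a <label> element and split it into (name, number) (sketch).

    Returns (text, None) when the label does not end in a plain number.
    """
    label = ET.fromstring(label_xml)
    # itertext() walks through nested inline markup such as <bold> or <italic>.
    text = " ".join("".join(label.itertext()).split())
    match = re.fullmatch(r"(?P<name>.+?)\s+(?P<number>\d+)\.?", text)
    if match is None:
        return text, None
    return match.group("name"), int(match.group("number"))


parts = split_label("<label><bold>Fig.</bold> <italic>4</italic></label>")
print(parts)
```

Flattening discards the styling, so a reader or writer that wants to preserve it would need to carry the label as inline content rather than as a plain string.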

tarleb, Aug 25 '24 11:08