Allow inline formatting in locators?

Open badumont opened this issue 3 years ago • 22 comments

With citeproc 0.3.0.9, when compiling the following MWE, the smallcaps in the "section" locator are stripped from the output:

---
suppress-bibliography: true
references:
- type: book
  id: CaesarGallic
  author:
  - literal: Julius Caesar
  title: Bellum Gallicum
---

[@CaesarGallic, {section XI, [iv]{.smallcaps}, 3}, p. 59]

Output (with acta-philosophica.csl):

pandoc -t plain --citeproc --csl=acta-philosophica.csl test.md
[1]

[1] JULIUS CAESAR, Bellum Gallicum, secs. XI, iv, 3, p. 59

However, if I set the output format to native, I can see that iv is wrapped in a SmallCaps object in the value of the citationSuffix property. It is only set to a plain string in the content of the Cite object.

Now, if I modify the body of my markdown file like this:

^[@CaesarGallic [section XI, [iv]{.smallcaps}, 3], p. 59.]

The formatting is retained:

pandoc -t plain --citeproc --csl=acta-philosophica.csl test.md
[1]

[1] JULIUS CAESAR, Bellum Gallicum, sec. XI, IV, 3, p. 59.

It can also be seen that the locator label is plural in the first case and singular in the second.

badumont avatar May 17 '21 14:05 badumont

You can see why this happens from the types:

-- | The part of a citation corresponding to a single work,
-- possibly including a label, locator, prefix and suffix.
data CitationItem a =     
  CitationItem   
  { citationItemId             :: ItemId
  , citationItemLabel          :: Maybe Text
  , citationItemLocator        :: Maybe Text
  , citationItemType           :: CitationItemType
  , citationItemPrefix         :: Maybe a
  , citationItemSuffix         :: Maybe a
  } deriving (Show, Eq, Ord)

Locator is a plain string, whereas prefix and suffix can be formatted. This representation makes it much easier for us to manipulate locators. I don't know, actually, whether the CSL spec says that formatting should be allowed on locators -- I had thought not, but I may be wrong. @denismaier @bdarcus do you know?
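
To make the distinction concrete, here is a minimal sketch of what a formatted locator could look like. It is not the actual citeproc definition: `Fmt`, `CitationItem'`, the shortened field names, and `caesar` are all hypothetical, and the record is cut down to the fields that matter here. The only change from the type above is that the locator shares the type parameter `a` with prefix and suffix.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- Hypothetical stand-in for formatted inline content
-- (not pandoc's Inline type, just enough to show nesting).
data Fmt
  = Plain Text
  | SmallCaps [Fmt]
  | Emph [Fmt]
  deriving (Show, Eq, Ord)

-- Simplified variant of CitationItem in which the locator
-- uses the type parameter instead of plain Text.
data CitationItem' a = CitationItem'
  { itemId      :: Text
  , itemLabel   :: Maybe Text
  , itemLocator :: Maybe a   -- was: Maybe Text
  , itemPrefix  :: Maybe a
  , itemSuffix  :: Maybe a
  } deriving (Show, Eq, Ord)

-- The locator from the MWE, with the small caps preserved.
caesar :: CitationItem' [Fmt]
caesar = CitationItem'
  { itemId      = "CaesarGallic"
  , itemLabel   = Just "section"
  , itemLocator = Just [Plain "XI, ", SmallCaps [Plain "iv"], Plain ", 3"]
  , itemPrefix  = Nothing
  , itemSuffix  = Just [Plain ", p. 59"]
  }
```

With a `Maybe Text` locator there is nowhere to record the `SmallCaps` node, which is consistent with the formatting being flattened in the first example.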

jgm avatar May 17 '21 16:05 jgm

Thank you for your answer. I can't find anything about this in the specification. In any case, I don't want to argue for one solution or the other, only to point out the inconsistency across citation modes.

If you choose not to permit formatting inside locators, could it be noted in the manual?

badumont avatar May 17 '21 16:05 badumont

It's not an inconsistency across citation modes. The difference is that in your first case you've explicitly marked something as a locator using {},

[@CaesarGallic, {section XI, [iv]{.smallcaps}, 3}, p. 59]

(oddly excluding the page?) while in the second

[section XI, [iv]{.smallcaps}, 3], p. 59.]

you haven't done this. Since pandoc's heuristics for locators don't detect this as one, and you don't use the {}, it is treated as a suffix (thus permitting formatting). I suspect that if you use the {} syntax around the whole locator in this case, you'll see the same thing as in the first case.

jgm avatar May 17 '21 16:05 jgm

I haven't checked, but am pretty sure we're silent on that question ATM.

bdarcus avatar May 17 '21 16:05 bdarcus

I could change it to allow formatted content, but conceptually this seems like something that should have a solution at the style level (some styles will want to format roman-numeral locators with small caps, others with large caps, etc.).

jgm avatar May 17 '21 16:05 jgm

> ... this seems like something that should have a solution at the style level

Yeah, I can see that. I just don't recall it coming up.

Thoughts on this @bwiernik?

bdarcus avatar May 17 '21 16:05 bdarcus

Sorry, I thought that in @baz [chap. 1], the brackets were intended to enclose the locator, like the curly braces in normal citation mode. I understand now.

I excluded the page because CSL only supports one locator, so I had to format it myself.

In this case one can set the whole locator in small caps, so it is not such a big problem. It would be one if some part had to be put in italics (like prooem.). Since CSL handles the locator as a monolithic unit, this can't be supported by the style.

badumont avatar May 17 '21 16:05 badumont

It would also be useful to print folios like f. 35v.

badumont avatar May 17 '21 16:05 badumont

In your case the best workaround is probably to manually format the locators. You just need to block pandoc from treating them as locators; I think you could do that using something like

[@CaesarGallic, {}section XI, [iv]{.smallcaps}, 3, p. 59]

(untested)

jgm avatar May 17 '21 18:05 jgm

I'll have to check in the specs, but Zotero allows formatting for locators.

denismaier avatar May 18 '21 08:05 denismaier

Ok, in the spec locator is currently listed under standard variables, and this is what the spec says:

locator a cite-specific pinpointer within the item (e.g. a page number within a book, or a volume in a multi-volume work). Must be accompanied in the input data by a label indicating the locator type (see the Locators term list), which determines which term is rendered by cs:label when the “locator” variable is selected.

In my understanding that means that the locator should not be treated differently than any other variable, the label mechanism aside.

denismaier avatar May 18 '21 08:05 denismaier

OK, resolved then to change this to allow formatted locators.

jgm avatar May 18 '21 15:05 jgm

Tricky aspects of this: currently we substitute the "and" term for "&" in locators (Eval.hs, l. 1440). We'd need a way to do this that works with any kind of formatted type. [EDIT: This should be easy using mapText.]

formatPageRange (l. 1394) also does some string manipulation on the locator. [This will be trickier.]

What makes this harder in citeproc is that citeproc is polymorphic on the output format -- it could be any structured type that instantiates a certain class (CiteprocOutput), so ALL we can use are the methods defined for that class. We may need to add new methods to allow these operations.

[EDIT: Changes to pandoc would also be needed:
parseLocator in T.P.Citeproc.Locator would be modified to return [Inline] instead of Text for the locator.]
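
A rough sketch of the mapText idea, under stated assumptions: `Output` below is a hypothetical, cut-down stand-in for the CiteprocOutput class with a single method, and `Fmt` and `replaceAmpersand` are illustrative names, not citeproc code. The point is only that a text-mapping method lets the "&" substitution be written once for any formatted type.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical, cut-down stand-in for the output class:
-- the only operation assumed is mapping over text leaves.
class Output a where
  mapText :: (Text -> Text) -> a -> a

-- Toy formatted type used for illustration.
data Fmt
  = Plain Text
  | SmallCaps [Fmt]
  deriving (Show, Eq)

instance Output Fmt where
  mapText f (Plain t)      = Plain (f t)
  mapText f (SmallCaps xs) = SmallCaps (map (mapText f) xs)

instance Output a => Output [a] where
  mapText f = map (mapText f)

-- Plain Text is also an instance, so the existing
-- representation keeps working.
instance Output Text where
  mapText f = f

-- Replace "&" with the locale's "and" term in every text leaf,
-- leaving the formatting structure untouched.
replaceAmpersand :: Output a => Text -> a -> a
replaceAmpersand andTerm = mapText (T.replace "&" andTerm)
```

For example, `replaceAmpersand "and" [Plain "XI & ", SmallCaps [Plain "iv & v"]]` rewrites both leaves and keeps the `SmallCaps` wrapper. A leaf-by-leaf map like this cannot see a pattern that spans two differently formatted nodes, which may be part of why page-range manipulation is the harder case.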

jgm avatar May 18 '21 16:05 jgm

I don't see why we would need any specific support at the style level--I think we should just be able to apply the standard inline text formatting to locator contents, as appears to have been implemented.

bwiernik avatar May 28 '21 20:05 bwiernik

> What I meant is that a style might want to specify, for example, that all locator labels are small caps. That can't be done currently.

Oh I see. This should be addressable with the same new syntax that will be needed to style multiple locators. https://github.com/citation-style-language/schema/issues/342

bwiernik avatar May 28 '21 20:05 bwiernik

Actually I think you're right that this can be handled in the regular way.

jgm avatar May 28 '21 20:05 jgm

The locator's contents could be handled in the regular way. Formatting of locator labels would need to be done in a style, which would require something like the <locator> structure I linked to.

bwiernik avatar May 28 '21 20:05 bwiernik

Ah, okay.

jgm avatar May 28 '21 21:05 jgm

I don't think it is worth opening a new issue, so I'll add it here: the same problem arises with name variables, especially when citing works written by kings, emperors, popes or bishops in languages where the ordinal suffix should be in superscript (such as "Justinien Ier").

Again, the specifications are silent about it, but Zotero does parse HTML-like markup in name fields.

badumont avatar Feb 23 '22 22:02 badumont

From my perspective, markup should be allowed in any variable.

bwiernik avatar Feb 23 '22 22:02 bwiernik

Fortunately you can still do "Iᵉʳ". The locale file uses unicode superscripted characters:

    <term name="ordinal-01" gender-form="feminine" match="whole-number">ʳᵉ</term>
    <term name="ordinal-01" gender-form="masculine" match="whole-number">ᵉʳ</term>

jgm avatar Feb 23 '22 23:02 jgm

Btw, the issue for names is #63.

jgm avatar Feb 23 '22 23:02 jgm