
Citations

Open jgm opened this issue 2 years ago • 34 comments

We need a syntax for citations that can be plugged into citeproc-lua or sent to pandoc for processing.

Pandoc's citation syntax seems a good basis. One thing we might change would be the syntax for author-in-text citations, which is currently a bit tricky to parse, because it requires lookahead.

Perhaps instead of

@foo [p. 15]

we should have something like

[+@foo, p. 15]

jgm avatar Jul 31 '22 17:07 jgm
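
To illustrate the parsing argument above, here is a minimal sketch of how the bracketed form could be read. It assumes a citation key made of letters, digits, `_`, `:` and `-`, and a single key per citation. The type and function names (`SimpleCitation`, `parseBracketedCitation`) are hypothetical and not part of djot or pandoc. The point is only that the citation mode is decided from the characters immediately after `[`, rather than by looking past the key for a following bracket.

```typescript
// Hypothetical types and names, for illustration only.
type CitationMode = "normal" | "author-in-text";

interface SimpleCitation {
  mode: CitationMode;
  key: string;
  suffix: string;
}

// Parse a citation starting at input[start], which is expected to be "[".
// Returns null when the text at that position is not a citation.
function parseBracketedCitation(input: string, start = 0): SimpleCitation | null {
  let pos = start;
  if (input.charAt(pos) !== "[") return null;
  pos++;

  // The mode is known as soon as the character after "[" has been read.
  let mode: CitationMode = "normal";
  if (input.charAt(pos) === "+") {
    mode = "author-in-text";
    pos++;
  }

  if (input.charAt(pos) !== "@") return null;
  pos++;

  // Assumed key alphabet; a real grammar may allow more characters.
  const keyStart = pos;
  while (pos < input.length && /[A-Za-z0-9_:-]/.test(input.charAt(pos))) pos++;
  if (pos === keyStart) return null;
  const key = input.slice(keyStart, pos);

  const close = input.indexOf("]", pos);
  if (close === -1) return null;

  // Drop the optional separating comma and surrounding whitespace.
  const suffix = input.slice(pos, close).replace(/^\s*,?\s*/, "").trim();
  return { mode, key, suffix };
}
```

With this sketch, `parseBracketedCitation("[+@foo, p. 15]")` yields the mode `"author-in-text"`, the key `"foo"` and the suffix `"p. 15"`, while `[@foo, p. 15]` yields the same key and suffix with mode `"normal"`.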

I like the idea of djot having a simple unambiguous syntax for this that is less tricky to parse. It not only makes djot simpler and faster, but it also makes it easier for any future alternative implementations of djot to parse as well.

uvtc avatar Aug 01 '22 01:08 uvtc

@jgm , why do you suggest adding that + sign in there? Why not [@foo, p. 15] instead?

The [+@foo, p. 15] syntax suggests to me that it's one example of a more general syntax, as in

[+@foo ... ]  for citations
[+&foo ... ]  for ... maybe something else
[+*foo ... ]
[+_foo ... ]

uvtc avatar Aug 05 '22 05:08 uvtc

[@foo, p. 15] is fine for a regular citation which might render as (Foo 2000, p. 15). I'm talking about syntax for an author-in-text citation, which would render as Foo (2000, p. 15).

jgm avatar Aug 05 '22 05:08 jgm

Ah. I'm not very familiar with citations. Thanks.

uvtc avatar Aug 05 '22 05:08 uvtc

This syntax seems very natural to me. Same goes for using [+@foo, p. 15] for author-in-text citations. No need to reinvent the wheel, and the citeproc syntax is familiar for many.

Djot will be a perfect fit for academic writing---a natural continuation of Pandoc Markdown, which many (including me) are using in academia today. Thus, having a well-defined citation syntax seems very important to me. What will it take to implement this? I would be happy to help if I can!

kmaasrud avatar Nov 10 '22 13:11 kmaasrud

Org-Mode, another markup language, added citation support in 9.5.

In that release they added the following syntax to markup a citation:

According to [cite: common prefix;@Key123 page 13; @Key982 chap 1; common suffix] ...

Which would render as (Key123 2000, p. 13; Key982 2009, chap. 1), for example. They also allow you to specify a style of citation:

[cite/t/c: ...]
      ^ ^
      | |
      | Variant
      Style (Here, "t" means in text) ala: Foo (...)

The blog post from a contributor to Org-Mode lays it all out better than I could ever do in a GH issue: https://blog.tecosaur.com/tmio/2021-07-31-citations.html

Crucially, this kind of syntax would allow people to set different styles on each citation, which it seems is not (easily) accomplished in the discussed syntax proposal.

NotAFedoraUser avatar Jan 14 '23 02:01 NotAFedoraUser

According to [cite: common prefix;@Key123 page 13; @Key982 chap 1; common suffix] ...

That looks similar to what @jgm is proposing and the current syntax used by pandoc-citeproc, just with an English-defined syntax (using the word cite), which we would like to avoid.

I'm still in favour of encapsulating a citation fully in square brackets for easy parsing, and I think the choice of @ for simple cites and +@ for author-in-text should be enough customization.

kmaasrud avatar Jan 14 '23 10:01 kmaasrud

The org-mode syntax (which draws on and extends the pandoc syntax) gets more flexibility (different styles) at the price of verbosity and English-language keywords. So each has its drawbacks and its advantages.

jgm avatar Jan 14 '23 18:01 jgm

From the current proposal this:

In [+@Smith2014 page 21-23] he talks about...

Turns into this:

In Smith (2014, pp. 21--23) he talks about...

Whereas with syntax à la Org-Mode:

In [cite/t:@Smith2014 page 21-23] he talks about...

While Org-Mode's syntax is more long-winded, it is more flexible, allowing for more styles of citation: [cite/a:], [cite/n:], or [cite/t:]. I suppose one could accomplish the same with a modification of the current proposal, to include something like the following:

[-@Key] /* nocite (For inclusion in the printed bibliography) */
[+@Key] /* in text cite (Smith (pp. 21-23)) */
[/@Key] /* author name citation (Smith) */

Perhaps this makes more sense for djot?

NotAFedoraUser avatar Jan 14 '23 18:01 NotAFedoraUser

[-@Key] /* nocite (For inclusion in the printed bibliography) */

@NotAFedoraUser that is very clever! Along with +@, I think that should be sufficient for most use-cases. However, the [<some-punctuation>@<key>] scheme leaves room for a lot of flexibility down the road---if more citation variants are requested.

kmaasrud avatar Jan 14 '23 21:01 kmaasrud
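
The `[<some-punctuation>@<key>]` scheme can be sketched as a lookup table from marker to variant. The marker set below only collects the examples floated in this thread (`+`, `-`, `/`); the variant names and the function `citationVariant` are hypothetical, and nothing here is settled djot syntax.

```typescript
// Hypothetical marker table collecting the variants suggested in this thread.
const CITATION_MARKERS = new Map<string, string>([
  ["", "normal"],          // [@Key]
  ["+", "author-in-text"], // [+@Key]
  ["-", "nocite"],         // [-@Key]
  ["/", "author-only"],    // [/@Key]
]);

// Return the variant name for a citation, or null if the text is not a
// citation or uses a marker that is not in the table.
function citationVariant(citation: string): string | null {
  // "[", then at most one punctuation marker, then "@" and a non-empty key.
  const m = /^\[([^\sA-Za-z0-9@\[\]]?)@[^\s\];,]+/.exec(citation);
  if (m === null) return null;
  const marker = m[1] ?? "";
  return CITATION_MARKERS.get(marker) ?? null;
}
```

Adding a variant later would then mean adding one table entry, which is the flexibility the scheme is meant to leave open.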

@jgm :

at the price of verbosity and English-language keywords

I hope both will be avoided!

bpj avatar Jan 14 '23 21:01 bpj

Just came across djot; cool!

@jgm - I just thought I'd remind you about one wrinkle we stumbled on in org-cite development, which is the question of whether a local variant is a property of the citation as a whole (where we came down with org-cite), or the individual citation-reference (as it is in pandoc).

E.g. what happens if you have more than one reference in a citation with your proposed examples (the first example being where the author lists differ, and the second where they don't)?

[@foo, p. 15;+@bar]
[@foo1, p. 15;+@foo2]

bdarcus avatar Feb 08 '23 14:02 bdarcus

@bdarcus the proposal floated above was to use + for author-in-text citations. The thought was that it would go at the beginning of the citation list, thus

[+@foo, p. 15; @bar]

which would be equivalent to pandoc's

@foo [p. 15; @bar]

I hadn't envisioned allowing it to be put on subsequent items, and I'm not sure what sense that would make. Maybe I haven't grasped your thought here.

jgm avatar Feb 08 '23 15:02 jgm
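
A minimal sketch of that reading, where the `+` marker belongs to the citation as a whole and the items are separated by `;`. The names (`Citation`, `CitationItem`, `parseCitationList`) are hypothetical, prefixes before a key are not handled, and rejecting a `+` on a later item is an assumption made here for illustration, not a decision taken in this thread.

```typescript
// Hypothetical data model: the mode is stored once, on the citation.
interface CitationItem {
  key: string;
  suffix: string;
}

interface Citation {
  mode: "normal" | "author-in-text";
  items: CitationItem[];
}

function parseCitationList(citation: string): Citation | null {
  if (!citation.startsWith("[") || !citation.endsWith("]")) return null;
  let body = citation.slice(1, -1);

  // The marker is only recognized at the very start of the list.
  let mode: Citation["mode"] = "normal";
  if (body.startsWith("+")) {
    mode = "author-in-text";
    body = body.slice(1);
  }

  const items: CitationItem[] = [];
  for (const part of body.split(";")) {
    // Each item must be "@key" with an optional ", suffix".
    // Assumption: an item that starts with "+" is rejected.
    const m = /^\s*@([^\s,;]+)\s*,?\s*(.*)$/.exec(part);
    if (m === null) return null;
    items.push({ key: m[1] ?? "", suffix: (m[2] ?? "").trim() });
  }
  return { mode, items };
}
```

Under these assumptions `[+@foo, p. 15; @bar]` parses to one author-in-text citation with two items, and `[@foo, p. 15;+@bar]` is rejected.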

@jgm - in that case, I think I misunderstood, and it's a property of the citation as a whole, which I think is right.

bdarcus avatar Feb 08 '23 15:02 bdarcus

One other difference between org-cite (and biblatex) and pandoc: it has two levels of affixes; one for the citation, and another for the citation-references.

It's useful when you have a multi-cite, and a style may sort the references within the citation.

[cite:see ;@doe22;@doe20, ch. 2]

So presumably in djot, it could just be:

[see ;@doe22;@doe20, ch. 2]

bdarcus avatar Feb 14 '23 12:02 bdarcus

Yes, I think that would be a good approach. However, citeproc doesn't currently support two levels of affixes, so I don't know what we'd do with this.

jgm avatar Feb 14 '23 17:02 jgm

Maybe a simple heuristic to flatten them (like merging each with the affix of the nearest reference?), and later add support to citeproc as time and interest allow?

You may already have to do something similar when dealing with org-cite?

bdarcus avatar Feb 14 '23 18:02 bdarcus
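
One possible reading of the flattening heuristic, as a sketch: the citation-level prefix is merged into the first reference's prefix and the citation-level suffix into the last reference's suffix. The types and the function `flattenAffixes` are hypothetical, and this is not how citeproc or pandoc handle org-cite input as far as this thread establishes.

```typescript
// Hypothetical two-level model: affixes on the citation and on each reference.
interface Reference {
  key: string;
  prefix?: string;
  suffix?: string;
}

interface TwoLevelCitation {
  prefix?: string;
  suffix?: string;
  references: Reference[];
}

// Flatten to one level by attaching each citation-level affix to the
// nearest reference. The input is not modified.
function flattenAffixes(citation: TwoLevelCitation): Reference[] {
  const refs: Reference[] = citation.references.map((r) => ({ ...r }));
  const first = refs[0];
  const last = refs[refs.length - 1];
  if (first !== undefined && citation.prefix) {
    first.prefix = [citation.prefix, first.prefix].filter(Boolean).join(" ");
  }
  if (last !== undefined && citation.suffix) {
    last.suffix = [last.suffix, citation.suffix].filter(Boolean).join(" ");
  }
  return refs;
}
```

For `[see ;@doe22;@doe20, ch. 2]` this attaches "see" to doe22. The trade-off is the one noted earlier in the thread: if a style sorts the references, the flattened prefix travels with its reference instead of staying at the front of the citation.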

Is this issue pretty much resolved; just needs to be implemented?

And maybe also relies on #35?

I've been working on a project I have been planning from the beginning to integrate with this once it's available.

https://github.com/bdarcus/csl-next

ATM, I have my own AST, which is basically the new style input template model enhanced with rendered data (current example bibliography reference below), but I'm hoping it should be pretty easy to integrate with djot; both for document processing as a whole, and also to allow djot markup within field strings.

  [
    [ { contributors: "author", procValue: "Doe, Jane" } ],
    {
      date: "issued",
      format: "year",
      wrap: "parentheses",
      procValue: "2023b"
    },
    [ { title: "title", procValue: "The Title" } ],
    undefined,
    undefined
  ]

bdarcus avatar May 19 '23 16:05 bdarcus

I wouldn't call it resolved! There are still a lot of choice points.

jgm avatar May 19 '23 22:05 jgm

About the citation model/syntax itself, or other related issues?

bdarcus avatar May 19 '23 22:05 bdarcus

the former

jgm avatar May 20 '23 00:05 jgm

the former

So what are those outstanding questions?

I suppose one, that you may or may not have been thinking about, is locators: string + string parsing (as with the pandoc syntax and most current other examples), vs more structured.

For the project I'm working on, I just merged this, which actually isn't too bad in YAML:

suffix: [see, page: 23, section: V]

But I guess the pandoc optional brackets are basically the same.

I guess another, that came up with org-cite, is where to allow markup within the citation?

bdarcus avatar May 22 '23 23:05 bdarcus

There are lots of questions. Do we want to support a huge range of variants like org? If so, how do we do that without English language keywords? How are prefixes and suffixes handled? How are locators handled? Do we use localized locator labels as in pandoc? How are locators distinguished from other suffix content? I don't have a lot of time right now to work on this, but this should give some idea.

jgm avatar May 23 '23 15:05 jgm

Note: I edited this a bit much later to add something I missed earlier on affixes.

Since I'm thinking about and working on this area ATM, my thoughts:

Do we want to support a huge range of variants like org?

This is indeed the big question, since it's hard to reverse later.

My impulse is to say no, and just have two styles/commands; what in the academic literature on this are called:

  1. integral: AKA citet, textcite, narrative citations.
  2. non-integral: AKA citep, parenthetical citations.

These notions are very general, more so than in the TeX world, and for that reason should go fairly far.

EDIT: Implementing the citation model now; here's how I'm dealing with this for now.

pub enum CitationModeType {
    /// Places the author inline in the text; also known as "narrative" or "in text" citations.
    Integral,
    /// Places the author in the citation and/or bibliography or reference entry.
    #[default]
    NonIntegral,
}

But I could also see:

If so, how do we do that without English language keywords?

Do something like org-cite, but use single characters. But that has its own trade-offs.

How are prefixes and suffixes handled?

I think you're referring to this above?

https://github.com/jgm/djot/issues/32#issuecomment-1430181965

In any case, yes, this is another decision point: affixes only on individual citation references (as in pandoc), or also for the citation as a whole (as in org-cite and biblatex).

Per my comment there, I'd prefer the latter, because the cost is low, and the benefit in terms of flexibility for users high.

How are locators handled? Do we use localized locator labels as in pandoc? How are locators distinguished from other suffix content?

In my in-progress project (where I'm now focusing on a Rust implementation and just haven't done the citation part yet), here are the TypeScript definitions for locators.

export type Locator = Record<LocatorTerms, string> | string;

type LocatorTerms =
  | "book"
  | "chapter"
  | "column"
  | "figure"
  | "folio"
  | "number"
  | "line"
  | "note"
  | "opus"
  | "page"
  | "paragraph"
  | "part"
  | "section"
  | "sub-verbo"
  | "verse"
  | "volume";

In YAML:

suffix: [see, page: 23, section: V]

But that's a format more for machines; not humans. E.g. it's what the djot markup might be converted into.

This is another tricky area; my impulse is just to do what you've done in pandoc.

Do you see any glaring problems with that?

bdarcus avatar Jun 04 '23 23:06 bdarcus
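
As a rough sketch of label-based locator detection in the spirit of the pandoc approach discussed above: a suffix is treated as a locator when it starts with a known label followed by a value, and anything after that is kept as free suffix text. The label table is illustrative only (real locator labels are localized and far more numerous), and the names `LOCATOR_LABELS`, `ParsedSuffix` and `parseSuffix` are hypothetical.

```typescript
// Illustrative label table only; not pandoc's actual list of locator terms.
const LOCATOR_LABELS = new Map<string, string>([
  ["p.", "page"], ["pp.", "page"], ["page", "page"], ["pages", "page"],
  ["ch.", "chapter"], ["chap.", "chapter"], ["chapter", "chapter"],
  ["sec.", "section"], ["section", "section"],
  ["vol.", "volume"], ["volume", "volume"],
]);

interface ParsedSuffix {
  locator?: { term: string; value: string };
  rest: string;
}

// Split a citation suffix into a structured locator and remaining text.
function parseSuffix(suffix: string): ParsedSuffix {
  // First token is a candidate label, second token is the locator value.
  const m = /^\s*(\S+)\s+([^\s,;]+)\s*,?\s*(.*)$/.exec(suffix);
  if (m !== null) {
    const term = LOCATOR_LABELS.get((m[1] ?? "").toLowerCase());
    if (term !== undefined) {
      return { locator: { term, value: m[2] ?? "" }, rest: (m[3] ?? "").trim() };
    }
  }
  // No known label: the whole suffix is ordinary text.
  return { rest: suffix.trim() };
}
```

Under this sketch `p. 15` becomes a `page` locator with value `15`, `page 21-23` becomes a `page` locator with value `21-23`, and a suffix such as `see also` is left untouched as plain suffix text.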

The pandoc way has worked pretty well. There are occasional requests for more expressive power, but it seems enough for most users.

jgm avatar Jun 05 '23 19:06 jgm

[...] but it seems enough for most users.

Based on my personal experience of academic writing, I concur. The less complexity, the better; that'll keep it simpler for implementors.

kmaasrud avatar Jun 06 '23 07:06 kmaasrud