Lowercase "translated and commented ..." after full stop
Consider this entry:
@book{aristotle:makin:2006,
Author = {Aristotle},
Commentator = {Stephen Makin},
Introduction = {Stephen Makin},
Location = {Oxford},
Publisher = {Clarendon Press},
Series = {Clarendon Aristotle Series},
Subtitle = {Book Θ},
Title = {Metaphysics},
Translator = {Stephen Makin},
Year = {2006}}
Note that the subtitle ends with a Greek letter "Θ".
Run this through \usepackage[backend=biber,style=authortitle-comp,abbreviate=false]{biblatex}
and you will get the following output:
Aristotle. Metaphysics. Book Θ. translated and commented, with an introduction, by Stephen Makin. Clarendon Aristotle Series. Oxford: Clarendon Press, 2006.
Notice the lowercase "t" after the dot after the subtitle. That does not seem right. If I write "Theta" instead of "Θ", everything is fine.
Thanks for your good work!
The following MWE reproduces the problem and shows another very interesting effect:
\documentclass{article}
\usepackage{fontspec}
\usepackage[style=verbose]{biblatex}
\usepackage{filecontents}
\begin{filecontents*}{\jobname.bib}
@book{a,
Author = {Aristotle},
Title = {A},
Translator = {Stephen Makin},
Year = {2006}}
@book{b,
Author = {Aristotle},
Title = {Akela},
Translator = {Stephen Makin},
Year = {2006}}
@book{c,
Author = {Aristotle},
Title = {Θ},
Translator = {Stephen Makin},
Year = {2006}}
\end{filecontents*}
\addbibresource{\jobname.bib}
\begin{document}
\cite{a}
\cite{b}
\cite{c}
\printbibliography
\end{document}
This gives in the text
Aristotle. A. trans. by Stephen Makin. 2006
Aristotle. Akela. Trans. by Stephen Makin. 2006
Aristotle. Θ. trans. by Stephen Makin. 2006
and in the references
Aristotle. A. Trans. by Stephen Makin. 2006.
— Akela. Trans. by Stephen Makin. 2006.
— Θ. trans. by Stephen Makin. 2006.
Here you can see that the entry with only "A" as title has the issue only in the citation, the one with "Akela" has no problem at all, and the one with "Θ" has the issue in the citation as well as in the bibliography. Very mysterious.
Doing some more research into this, I found that adding \@ to the titles seems to relieve us of all our troubles.
Is this something that could (and should) be done automatically?
See the example below, where everything is now OK except for the output of e (which I left unchanged on purpose to show the effect of \@).
\documentclass{article}
\usepackage{fontspec}
\usepackage[style=verbose]{biblatex}
\usepackage{filecontents}
\begin{filecontents*}{\jobname.bib}
@book{a,
Author = {Aristotle},
Title = {A\@},
Translator = {Stephen Makin},
Year = {2006}}
@book{b,
Author = {Aristotle},
Title = {Akela},
Translator = {Stephen Makin},
Year = {2006}}
@book{c,
Author = {Aristotle},
Title = {Θ\@},
Translator = {Stephen Makin},
Year = {2006}}
@collection{d,
Editor = {Aristotle},
Title = {D},
Translator = {Stephen Makin},
Year = {2006}}
@book{e,
Editor = {Aristotle},
Title = {E},
Translator = {Stephen Makin},
Year = {2006}}
\end{filecontents*}
\addbibresource{\jobname.bib}
\DeclareFieldFormat[collection]{title}{\mkbibemph{#1}\@}
\begin{document}
\cite{a}
\cite{b}
\cite{c}
\cite{d}
\cite{e}
\printbibliography
\end{document}
This now gives in the text
Aristotle. A. Trans. by Stephen Makin. 2006
Aristotle. Akela. Trans. by Stephen Makin. 2006
Aristotle. Θ. Trans. by Stephen Makin. 2006
Aristotle, ed. D. Trans. by Stephen Makin. 2006
Aristotle, ed. E. trans. by Stephen Makin. 2006
and in the references
Aristotle. A. Trans. by Stephen Makin. 2006.
— Akela. Trans. by Stephen Makin. 2006.
— ed. D. Trans. by Stephen Makin. 2006.
— ed. E. Trans. by Stephen Makin. 2006.
— Θ. Trans. by Stephen Makin. 2006.
@josephwright, @aboruvka - this is an interesting suggestion - \@ is clearly correct here and perhaps should be default except that this does depend somewhat on the assumptions of the default styles.
I found that adding \@ to \blx@qp@period also gives the desired effect.
\makeatletter
\csdef{blx@qp@period}{\spacefactor\@m\blx@postpunct}
\makeatother
Then there would be no need to insert the \@ into any field formats.
The question "Spurious punctuation in xelatex + biblatex + Cyrillic or Greek" on TeX.SX seems to be at least related to the issue here.
I'm quite confident the problem here is actually the same as in https://github.com/plk/biblatex/issues/368. I don't think the underlying problem is solved by sprinkling \@s in field format definitions.
Can you please try this with the current 3.4 DEV version (and biber 2.5 DEV version if you use biber) - I believe this might be fixed.
Doesn't seem to be resolved. I've got this with @moewew's MWE:
Aristotle. A. trans. by Stephen Makin. 2006
Aristotle. Akela. Trans. by Stephen Makin. 2006
Aristotle. Θ. trans. by Stephen Makin. 2006
Aristotle. A. Trans. by Stephen Makin. 2006.
— Akela. Trans. by Stephen Makin. 2006.
— Θ. Trans. by Stephen Makin. 2006.
But if I make the main language Russian or Greek with for example
\newfontfamily\greekfont{CMU Serif}
\usepackage{polyglossia}
\setdefaultlanguage{greek}
the capitalization is mysteriously correct.
Capitalisation works fine in the bibliography now. In the citations it doesn't work and, as the MWE above shows, it doesn't (and didn't) work for Latin letters either. I guess that \blx@setazcodes is not called in citations, leaving capital letters with their standard space factor codes.
I can't claim to have understood the issue fully, but I found that only \blx@setfrcodes sets the sfcodes for capital letters, \blx@setencodes doesn't. From what I can see, \blx@setfrcodes is used if \frenchspacing is active, \blx@setencodes if \nonfrenchspacing; they are called in \blx@setsfcodes.
In the bibliography \frenchspacing is turned on by default, \blx@setfrcodes is called and everything is fine. But in the citations/text we use \blx@setencodes if that is not overridden by either an explicit \frenchspacing or babel/polyglossia's language settings.
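The "A" versus "Akela" difference can be reproduced without biblatex. The following is a minimal sketch, assuming a stock article class with the kernel's default \nonfrenchspacing codes. The values in the comments follow from TeX's space factor rules; that biblatex relies on the space factor to recognise the preceding period is an assumption based on the discussion here.

```latex
\documentclass{article}
\begin{document}
\showthe\sfcode`\A          % 999: the kernel default for capital letters
A.\showthe\spacefactor      % 1000: after a capital, the period cannot raise it further
Akela.\showthe\spacefactor  % 3000: the period's own \sfcode is used
A\@.\showthe\spacefactor    % 3000: \@ sets the space factor to 1000 before the period
\end{document}
```

After "A." the space factor is 1000, the same value as after an ordinary letter, so the period leaves no trace. An uppercase Θ under XeLaTeX or LuaLaTeX presumably behaves like "A" here, which would explain entry c.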
One could probably just move the \blx@setazcodes bit directly into \blx@setsfcodes:
\def\blx@setsfcodes{%
\let\blx@setsfcodes\relax
\let\frenchspacing\blx@setfrcodes
\let\nonfrenchspacing\blx@setencodes
\ifnum\sfcode`\.>2000
\blx@setencodes
\else
\blx@setfrcodes
\fi
\ifnum\sfcode`\A=\@m
\else
\blx@setazcodes
\fi
\@setquotesfcodes
\sfcode`\(=\z@
\sfcode`\)=\z@
\sfcode`\[=\z@
\sfcode`\]=\z@
\sfcode`\<=\z@
\sfcode`\>=\z@}
\def\blx@setencodes{%
\sfcode`\,=1250
\sfcode`\;=1500
\sfcode`\:=2000
\sfcode`\.=3000
\sfcode`\!=3001
\sfcode`\?=3002
}
\def\blx@setfrcodes{%
\sfcode`\,=\blx@sf@comma
\sfcode`\;=\blx@sf@semicolon
\sfcode`\:=\blx@sf@colon
\sfcode`\.=\blx@sf@period
\sfcode`\!=\blx@sf@exclam
\sfcode`\?=\blx@sf@question
}
Alternatively,
\ifnum\sfcode`\A=\@m
\else
\blx@setazcodes
\fi
can be replicated in both \blx@setencodes and \blx@setfrcodes.
I'm pretty sure PL set things up as-is deliberately. It's safe to alter the \sfcode values for capital letters when \frenchspacing is active as there is 'nothing to do': spacing after all periods is the same. However, if you alter the \sfcode values and \nonfrenchspacing is active then you'll suddenly get larger spaces after abbreviated capital letters. One might I guess arrange that \frenchspacing applies within a citation, but I'd need to check the grouping is correct.
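The spacing side effect described above can be seen in a small test document. This is only a sketch, assuming the standard article class; the two \sfcode assignments stand in for what a blanket change to the capital letters would do and are not taken from biblatex.

```latex
\documentclass{article}
\begin{document}
\nonfrenchspacing
% Kernel default: capitals have \sfcode 999, so the spaces after
% the initials are ordinary interword spaces.
D. E. Knuth\par
% Capitals raised to 1000 (local to this group): each period now
% yields a space factor of 3000, so the spaces after the initials
% are set like sentence-ending spaces.
{\sfcode`\D=1000 \sfcode`\E=1000
 D. E. Knuth\par}
\end{document}
```

Comparing the two output lines shows the wider gaps after "D." and "E." in the second one.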
@josephwright - do you think you will have time to look at this?
I would have thought that \frenchspacing would be appropriate in citations - I can't imagine why one would want anything else.
@plk 'Inside' a citation \frenchspacing is probably fine. The issue comes when the citation ends. If we start with the simple example
\documentclass{article}
\begin{document}
{\nonfrenchspacing Hello world.}\showthe\spacefactor
{\frenchspacing Hello world.}\showthe\spacefactor
\end{document}
you'll see that the space factor after each group is the one set within the group, not the one in force outside it: \spacefactor is not restored when the group ends. Most of the time punctuation is going to be outside a citation, but if the citation ends in a . or if we have
`\autocite{key}.`
and 'auto-magic' moving of citations (which puts the . inside the citation group) then we'll be wrong in a \nonfrenchspacing document. I guess I might be able to come up with a solution to the latter example, but the general 'what if the last char is a ./!/?' problem isn't fixable. That's probably as likely to happen 'in the wild' as the problem we are asked about here.
Worth noting here is that there is a link to the entire business of being able to 'dump' formatted bibliographies. Currently biblatex does not keep the formatted data 'around' but passes it to TeX for typesetting as it is generated. That means that we can't tell that a piece of punctuation is the last generated by a bibliography driver (so that we could reset the \sfcode values) unless it comes right at the end of the driver. Arranging that everything is 'held' in a structured form would avoid that, but would be an entire rewrite and would be pretty tricky when you start imagining all of the possible forms of the driver. (And that's before the fact that reading such data back in the document body looks very complex; although possibly addressable, that is at least in part why traditional BibTeX simply can't do some of this stuff.) Even if everything is 'held', you still need to track very carefully where punctuation comes from so that the \sfcode values can be manipulated. It really looks very tricky (on the border of what can be done at the TeX level).
Is there anything else we need to do here?
Thanks for all your work. As to plk's question: Well, I don't know. moewew's MWE still has in one instance a non-capitalised "trans."
Aristotle. A. Trans. by Stephen Makin. 2006
Aristotle. Akela. Trans. by Stephen Makin. 2006
Aristotle. Θ. Trans. by Stephen Makin. 2006
Aristotle, ed. D. Trans. by Stephen Makin. 2006
Aristotle, ed. E. trans. by Stephen Makin. 2006 <-- HERE
References
Aristotle. A. Trans. by Stephen Makin. 2006.
— Akela. Trans. by Stephen Makin. 2006.
— ed. D. Trans. by Stephen Makin. 2006.
— ed. E. Trans. by Stephen Makin. 2006.
— Θ. Trans. by Stephen Makin. 2006.
@josephwright - any comment?
@moewew, @josephwright - do we have any potential solution to this?
As noted earlier, addressing this would be pretty much a rewrite of the entire package.
This came up again in https://github.com/plk/biblatex/issues/776. Could we remedy some symptoms of this by increasing the \spacefactor from 999 to 1000 at the end of printing a field?
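A rough sketch of what such a bump could look like, outside of biblatex. The macro name \liftcapitalsf is hypothetical and not part of biblatex; the sketch assumes capital letters still carry the kernel's default \sfcode of 999.

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper (not a biblatex macro): if the last character
% was a capital letter (space factor 999), raise the space factor
% to 1000 so that a following period is registered normally.
\newcommand*{\liftcapitalsf}{%
  \ifhmode                 % \spacefactor is only valid in horizontal mode
    \ifnum\spacefactor=999
      \spacefactor\@m
    \fi
  \fi}
\makeatother
\begin{document}
A\liftcapitalsf.\showthe\spacefactor      % 3000
Akela\liftcapitalsf.\showthe\spacefactor  % 3000
\end{document}
```

In biblatex itself something like this would have to run at the end of each printed field, for instance appended in a \DeclareFieldFormat definition in the same position as the \@ in the example above. Whether that covers all cases is untested here.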