LaTeXML
author identification for unknown documentclass
With the article document class, LaTeXML is pretty good at identifying authors:
\documentclass{article}
% \documentclass{abc}
\begin{document}
\author{Doe, Jane $^1$, Mustermann, Erika $^2$, Dupont, R. $^3$ and Novak, Jan $^3$ $^*$}
\end{document}
LaTeXML output:
<creator role="author">
<personname>Doe, Jane <Math mode="inline" tex="{}^{1}" text="^1" xml:id="m1">
<XMath>
<XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
<XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
</XMApp>
</XMath>
</Math>, Mustermann, Erika <Math mode="inline" tex="{}^{2}" text="^2" xml:id="m2">
<XMath>
<XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
<XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
</XMApp>
</XMath>
</Math>, Dupont, R. <Math mode="inline" tex="{}^{3}" text="^3" xml:id="m3">
<XMath>
<XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
<XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
</XMApp>
</XMath>
</Math> and Novak, Jan <Math mode="inline" tex="{}^{3}" text="^3" xml:id="m4">
<XMath>
<XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
<XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
</XMApp>
</XMath>
</Math> <Math mode="inline" tex="{}^{*}" text="^*" xml:id="m5">
<XMath>
<XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
<XMTok fontsize="70%" meaning="times" role="MULOP">*</XMTok>
</XMApp>
</XMath>
</Math></personname>
</creator>
For an unknown document class, e.g. abc, the author list is broken up at every comma:
<creator role="author">
<personname>Doe</personname>
</creator>
<creator before=" " role="author">
<personname> Jane <Math mode="inline" tex="{}^{1}" text="^1" xml:id="m1">
<XMath>
<XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
<XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
</XMApp>
</XMath>
</Math></personname>
</creator>
<creator before=" " role="author">
<personname> Mustermann</personname>
</creator>
<creator before=" " role="author">
<personname> Erika <Math mode="inline" tex="{}^{2}" text="^2" xml:id="m2">
<XMath>
<XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
<XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
</XMApp>
</XMath>
</Math></personname>
</creator>
<creator before=" " role="author">
<personname> Dupont</personname>
</creator>
<creator before=" " role="author">
<personname> R. <Math mode="inline" tex="{}^{3}" text="^3" xml:id="m3">
<XMath>
<XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
<XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
</XMApp>
</XMath>
</Math> and Novak</personname>
</creator>
<creator before=" " role="author">
<personname> Jan <Math mode="inline" tex="{}^{3}" text="^3" xml:id="m4">
<XMath>
<XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
<XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
</XMApp>
</XMath>
</Math> <Math mode="inline" tex="{}^{*}" text="^*" xml:id="m5">
<XMath>
<XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
<XMTok fontsize="70%" meaning="times" role="MULOP">*</XMTok>
</XMApp>
</XMath>
</Math></personname>
</creator>
Would it be possible to apply the default logic if the documentclass isn't known?
How can one do anything at all without a “known” doc class? How much should be assumed or defaulted in this case?
DG: Edited Chris's reply for brevity (removed email threaded reply context)
Sent to github in error!
Apologies.
Context: LaTeXML has a fairly recent "default class" fallback, used whenever an unsupported class is requested. That is extremely helpful for arXiv, and beyond. The fallback in question is called OmniBus.cls.ltxml, and it is actually based on the article support.
However, it also loads a range of other dependencies, aiming to provide a wide "safety net" for truly unknown cases. So far, arXiv has been the primary beneficiary of that decision.
In this case, I believe the secondary load responsible for the change in comma treatment is the following line in inst_support.sty.ltxml:
https://github.com/brucemiller/LaTeXML/blob/1ad25908a6d9ceba99ca61629f7e4e5a9bbf9cbe/lib/LaTeXML/Package/inst_support.sty.ltxml#L43-L44
Ok, now that both situations are clear, note that neither variation is correct:
- The article.cls interpretation creates a single <ltx:personname>.
- The fallback interpretation splits on every comma, making 7 <ltx:personname> elements (and creators).
- A human reader can tell that there are actually 4 person names/creators.
We could try to do better, but it can also be hard to guess reliably.
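To illustrate what "doing better" might look like for this particular input, here is a minimal Python sketch of one possible heuristic. It is not LaTeXML code and does not reflect how article.cls.ltxml or inst_support.sty.ltxml work. It assumes that every author is followed by at least one `$^...$` affiliation mark, which holds for the example above but not in general. The names `MARKS` and `split_authors` are made up for the illustration.

```python
import re

# A run of one or more "$^...$" affiliation marks, e.g. "$^3$ $^*$".
MARKS = re.compile(r"(?:\$\^[^$]*\$\s*)+")


def split_authors(author_field):
    """Heuristically split an author field into (name, marks) pairs.

    Assumption (hypothetical, not LaTeXML behaviour): each author name is
    immediately followed by at least one "$^...$" mark, so a run of marks
    closes one author entry.
    """
    authors = []
    pos = 0
    for match in MARKS.finditer(author_field):
        # Text between the previous marks and these marks is one name,
        # possibly preceded by a separator such as "," or "and".
        raw = author_field[pos:match.start()]
        name = re.sub(r"^\s*(?:,|and\b)\s*", "", raw).strip()
        marks = re.findall(r"\$\^([^$]*)\$", match.group())
        if name:
            authors.append((name, marks))
        pos = match.end()
    return authors


example = (r"Doe, Jane $^1$, Mustermann, Erika $^2$, "
           r"Dupont, R. $^3$ and Novak, Jan $^3$ $^*$")
for name, marks in split_authors(example):
    print(name, marks)
```

On the example this yields four entries, each keeping its internal "Last, First" comma. It also shows why guessing is fragile: an author list without affiliation marks, or with marks placed before the names, would defeat the heuristic entirely.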