
author identification for unknown documentclass

Open nschloe opened this issue 2 years ago • 3 comments

With the article document class, LaTeXML is pretty good at identifying authors:

\documentclass{article}
% \documentclass{abc}

\begin{document}

\author{Doe, Jane $^1$, Mustermann, Erika $^2$, Dupont, R. $^3$ and Novak, Jan $^3$ $^*$}

\end{document}

LaTeXML output:

  <creator role="author">
    <personname>Doe, Jane <Math mode="inline" tex="{}^{1}" text="^1" xml:id="m1">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
          </XMApp>
        </XMath>
      </Math>, Mustermann, Erika <Math mode="inline" tex="{}^{2}" text="^2" xml:id="m2">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
          </XMApp>
        </XMath>
      </Math>, Dupont, R. <Math mode="inline" tex="{}^{3}" text="^3" xml:id="m3">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
          </XMApp>
        </XMath>
      </Math> and Novak, Jan <Math mode="inline" tex="{}^{3}" text="^3" xml:id="m4">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
          </XMApp>
        </XMath>
      </Math> <Math mode="inline" tex="{}^{*}" text="^*" xml:id="m5">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="times" role="MULOP">*</XMTok>
          </XMApp>
        </XMath>
      </Math></personname>
  </creator>

For an unknown document class, e.g., abc, the author field is split at every comma and it's all messed up:

  <creator role="author">
    <personname>Doe</personname>
  </creator>
  <creator before="  " role="author">
    <personname> Jane <Math mode="inline" tex="{}^{1}" text="^1" xml:id="m1">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="1" role="NUMBER">1</XMTok>
          </XMApp>
        </XMath>
      </Math></personname>
  </creator>
  <creator before="  " role="author">
    <personname> Mustermann</personname>
  </creator>
  <creator before="  " role="author">
    <personname> Erika <Math mode="inline" tex="{}^{2}" text="^2" xml:id="m2">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="2" role="NUMBER">2</XMTok>
          </XMApp>
        </XMath>
      </Math></personname>
  </creator>
  <creator before="  " role="author">
    <personname> Dupont</personname>
  </creator>
  <creator before="  " role="author">
    <personname> R. <Math mode="inline" tex="{}^{3}" text="^3" xml:id="m3">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
          </XMApp>
        </XMath>
      </Math> and Novak</personname>
  </creator>
  <creator before="  " role="author">
    <personname> Jan <Math mode="inline" tex="{}^{3}" text="^3" xml:id="m4">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="3" role="NUMBER">3</XMTok>
          </XMApp>
        </XMath>
      </Math> <Math mode="inline" tex="{}^{*}" text="^*" xml:id="m5">
        <XMath>
          <XMApp role="FLOATSUPERSCRIPT" scriptpos="1">
            <XMTok fontsize="70%" meaning="times" role="MULOP">*</XMTok>
          </XMApp>
        </XMath>
      </Math></personname>
  </creator>

Would it be possible to apply the default logic if the documentclass isn't known?

nschloe avatar Oct 30 '23 13:10 nschloe

How can one do anything at all without a “known” doc class? How much should be assumed or defaulted in this case?

DG: Edited Chris's reply for brevity (removed email threaded reply context)

car222222 avatar Oct 30 '23 13:10 car222222

Sent to github in error!

Apologies.

car222222 avatar Oct 30 '23 13:10 car222222

Context: LaTeXML has a fairly recent "default class" that is used whenever an unsupported class is requested. That is extremely helpful on arXiv, and elsewhere too. The fallback in question is called OmniBus.cls.ltxml, and it is based on the article support.

However, it also loads a range of other dependencies, aiming to provide a wide "safety net" for truly unknown cases. So far, arXiv has been the primary beneficiary of that decision.

In this case, I believe the secondary load responsible for the change in comma treatment is the following line in inst_support.sty.ltxml:

https://github.com/brucemiller/LaTeXML/blob/1ad25908a6d9ceba99ca61629f7e4e5a9bbf9cbe/lib/LaTeXML/Package/inst_support.sty.ltxml#L43-L44

OK, now that both situations are clear, note that neither variation is correct:

  • The article.cls interpretation creates a single <ltx:personname>.
  • The fallback interpretation splits on every comma, making 7 <ltx:personname> elements (and creators)
  • A human reader can tell that there are actually 4 person names/creators.

We could try to do better, but it can also be hard to guess reliably.
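
To make the trade-off concrete, here is a minimal Python sketch (not LaTeXML code, and not how either binding is implemented). It works on the raw author string from the report. The first function mimics splitting on every comma; the second is a hypothetical heuristic that assumes every author is followed by at least one superscript marker such as `$^1$`. The function names are made up for illustration.

```python
import re

AUTHOR_FIELD = (
    "Doe, Jane $^1$, Mustermann, Erika $^2$, "
    "Dupont, R. $^3$ and Novak, Jan $^3$ $^*$"
)

# A superscript affiliation marker as written in the report, e.g. $^1$ or $^*$.
MARKER = r"\$\^[^$]*\$"
# One author: the shortest run of text followed by one or more markers.
AUTHOR = re.compile(rf"(.+?)((?:\s*{MARKER})+)")


def split_on_every_comma(field):
    """Mimic the fallback behaviour: every comma starts a new creator."""
    return [part.strip() for part in field.split(",")]


def split_on_markers(field):
    """Hypothetical heuristic: a run of markers closes the current author.

    Assumption: every author carries at least one marker. An author
    without a marker would be merged into the following one.
    """
    authors = []
    for name, markers in AUTHOR.findall(field):
        # Drop the separator (a comma or the word "and") left over
        # from the previous author.
        name = re.sub(r"^\s*(?:,|and\b)\s*", "", name).strip()
        authors.append((name, re.findall(r"\$\^([^$]*)\$", markers)))
    return authors


if __name__ == "__main__":
    print(len(split_on_every_comma(AUTHOR_FIELD)), "pieces when splitting on commas")
    for name, marks in split_on_markers(AUTHOR_FIELD):
        print(name, marks)
```

On this particular input the marker heuristic recovers the four names a human reader sees, but it depends entirely on the markers being present, which illustrates why a general rule is hard to guess reliably.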

dginev avatar Oct 30 '23 14:10 dginev