LaTeXML
JATS: Use Commonmeta to generate JATS metadata
This is a rough feature idea.
SUMMARY: Use a JSON-compatible file format (JSON, YAML, TOML, even JSOML :sweat_smile:) that conforms to https://commonmeta.org or similar in parallel to TeX files to generate JATS XML.
USER SCENARIO:
Technically competent authors who keep article metadata separate from their .tex files, for example:
https://github.com/sabinomaggi/ten-years-challenge-pulsed-drive/blob/submission/article/metadata.yaml
Simple scripts can convert such files into .tex files to be \input.
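As a rough illustration of such a script, here is a minimal Python sketch. It reads JSON rather than YAML only so that it needs nothing outside the standard library. The field names (`title`, `authors`, `name`, `date`) and the example values are assumptions made for illustration; they are not taken from the Commonmeta schema or from the linked metadata.yaml.

```python
import json

# Characters that are special in TeX, mapped to safe replacements.
TEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def tex_escape(text):
    """Escape TeX special characters one character at a time."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


def metadata_to_tex(meta):
    """Render a metadata dict as \\title, \\author and \\date lines.

    The keys used here are hypothetical; adapt them to the real schema.
    """
    lines = [r"\title{%s}" % tex_escape(meta["title"])]
    names = [tex_escape(a["name"]) for a in meta.get("authors", [])]
    if names:
        lines.append(r"\author{%s}" % r" \and ".join(names))
    if "date" in meta:
        lines.append(r"\date{%s}" % tex_escape(meta["date"]))
    return "\n".join(lines) + "\n"


# Made-up example input, standing in for a real metadata file.
EXAMPLE = json.loads("""
{"title": "Example & Title",
 "authors": [{"name": "A. Author"}, {"name": "B. Author"}],
 "date": "2024-01-01"}
""")

if __name__ == "__main__":
    # Redirect this output to a file such as metadata.tex, then \input it.
    print(metadata_to_tex(EXAMPLE), end="")
```

The generated file would be pulled into the preamble with `\input`, so the main .tex file never has to contain the metadata itself.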
This approach might be a good solution for #1648, #2351, #2352, #2353.
This suggestion sounds a little experimental to the usual LaTeX ecosystem, which as you know is where latexml places its main focus. Having structured inputs is lovely, and life would be easier if all of a document's frontmatter/backmatter was already in a standard structured format before the conversion starts. But in LaTeX it generally isn't, and conventions can vary wildly between different class files for \documentclass.
I can certainly imagine a new commonmeta.sty package with its own commonmeta.sty.ltxml support for latexml, which deals with its own kind of metadata.
The current organizational thinking for latexml is to start such experimental extensions outside of the core code, to avoid eclipsing the "usual" LaTeX needs.
> start such experimental extensions outside of the core code
Yeah, it makes sense to me to keep the core code (in Perl) separate from any code implementing the initial rough feature idea here.
What I'm thinking now is just hashing out some terminology and extra documentation. For instance, for the medium term, LaTeXML could output JATS XML that lacks important metadata by design. Then documentation can describe various ways that this limited JATS can be enhanced to have all kinds of fancy metadata.
In particular, now I am thinking a quick cheap recommendation is for an author to literally hand-code the XML that goes into the article meta and then use a trivial XSLT to merge the LaTeXML output JATS with the hand-coded XML with article metadata. Then additional documentation can suggest other approaches with 3rd party software that are fancier and more DRY (like using JSON/YAML etc...).
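To make the merge step concrete, here is a minimal sketch of the same idea. It is written in Python with `xml.etree.ElementTree` rather than as the XSLT mentioned above, purely to keep the example self-contained. The function name `merge_article_meta` and the sample documents are hypothetical, and the sketch assumes the LaTeXML JATS output has an `<article>` root with an optional `<front>` child.

```python
import xml.etree.ElementTree as ET


def merge_article_meta(jats_xml, meta_xml):
    """Return jats_xml with its <article-meta> replaced by meta_xml.

    jats_xml: string holding a JATS <article> document.
    meta_xml: string holding a hand-coded <article-meta> element.
    """
    article = ET.fromstring(jats_xml)
    new_meta = ET.fromstring(meta_xml)
    if new_meta.tag != "article-meta":
        raise ValueError("expected an <article-meta> root element")

    # Create <front> as the first child of <article> if it is missing.
    front = article.find("front")
    if front is None:
        front = ET.Element("front")
        article.insert(0, front)

    # Drop whatever article metadata the converter produced.
    for old in front.findall("article-meta"):
        front.remove(old)

    # JATS puts <article-meta> right after the optional <journal-meta>.
    children = list(front)
    journal_meta = front.find("journal-meta")
    position = children.index(journal_meta) + 1 if journal_meta is not None else 0
    front.insert(position, new_meta)

    return ET.tostring(article, encoding="unicode")


# Made-up inputs standing in for LaTeXML output and a hand-coded file.
JATS = (
    "<article><front><article-meta><title-group>"
    "<article-title>Draft</article-title></title-group></article-meta></front>"
    "<body><p>Text</p></body></article>"
)
META = (
    "<article-meta><title-group>"
    "<article-title>Final Title</article-title></title-group></article-meta>"
)

if __name__ == "__main__":
    print(merge_article_meta(JATS, META))
```

An XSLT version would do the same thing with an identity transform plus one template that swaps in the external `<article-meta>`.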
What do you think, @dginev?
BTW, what does "backmatter" mean?
I think we try to go for the usual meanings. Frontmatter are the pieces before the main content - various metadata such as title, date, author, affiliation, as well as larger overview lists, such as a table of contents, table of figures...
Backmatter is the same idea at the end of a doc - appendixes, bibliography, glossary, index... I saw a mention of BibTeX at the front page of commonmeta and assumed there are some provisions for the back as well, but on a closer look it seems not.
I suppose that improving the hyperxmp binding could be a good answer here. hyperxmp lets authors add metadata in the tex files for inclusion in the generated PDF, and that metadata overlaps with the JATS metadata. LaTeXML already understands a few hyperxmp commands, so it should be easy to fill the remaining gaps and then use the data in the JATS stylesheet. (I feel like I mentioned this already ages ago, in EPUB context, but the GitHub search doesn't find it.)
I'm really not a fan of XMP myself (the old thread I had a comment in was here https://github.com/brucemiller/LaTeXML/issues/1440#issuecomment-905898209)
LaTeXML should of course be able to support hyperxmp for people who want to emit JATS using it, that's a useful idea.
But it is perfectly OK to have a separate package (or several) possibly targeting different metadata sets, with different macro dialects. Each can have a .ltxml binding into the XML schema, and then a shared path into JATS.
I think the easy point of agreement is that the latexml XML schema should be expressive enough to carry through any JATS-endorsed metadata. But the LaTeX macro dialect should be open-ended, similarly to how we support all kinds of variations for \author and friends.
I'm not very familiar with the latexml XML schema nor .ltxml bindings. But I like the idea of any mechanism that is extremely flexible for authors to tweak.
It is probably worth mentioning the highly unusual environment I am looking at for LaTeXML: a dialect of JATS XML that is part of Baseprint Document Format (BDF) https://baseprints.singlesource.pub/bdf/.
Long-term stable environments:
A) arXiv AutoTeX
B) BDF-to-HTML-to-PDF pipelines [***]
Short-term unstable idiosyncratic environments:
C) Makefile/justfile/whatever that generates AutoTeX-compatible files from original "true" source files
D) tools that generate BDF from author source files
LaTeXML could maybe be a component of some cases of D).
I can't speak to what arXiv wants to do, but for B), it does not matter how the source metadata is stored because B) starts with the JATS XML and does not use the original source. For D) I suspect most authors DO NOT want metadata inside LaTeX. For C) I imagine there is a wide variety of preferences.
I'll avoid speculating what makes sense for A), but for the rest I don't see any particular reason article metadata needs to originate from inside a LaTeX file.
[***] Of course, this type of pipeline is under development so it CURRENTLY is not really stable, but that's the long-term goal! :sweat_smile:
It is probably the case that most metadata that JATS is interested in has a corresponding markup in some journal class file, though certainly not all of it in any single class file. And obviously not in article or book. Certainly there are `\email` and `\orcid` macros in several classes. I think there's a two-part solution here.
Firstly, we should make sure that all the JATS relevant metadata is consistently encoded into LaTeXML's XML by any class bindings that define markup for that data. And then make sure that the JATS stylesheet recognizes and converts those metadata into the appropriate JATS elements.
The second part would be to make provision for such metadata to be supplied outside of the document itself, if desired. You can use `--preload=[args]whatever` trickery to make it look like `\usepackage[args]{whatever}` was in the document source. It shouldn't be too hard to write a parser for whatever metadata format you're interested in (YAML, JSON, XMP, Commonmeta...) and there are already (perhaps quirky) tools for inserting that data into the generated document.
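For illustration, a small Python helper that assembles such a command line might look like the sketch below. The package name `commonmeta` is the hypothetical package imagined earlier in this thread, and the `file=metadata.json` option is invented for the example; neither exists today. Only the `--preload=[args]whatever` shape comes from the comment above. The sketch builds the argument list and does not run anything.

```python
import shlex


def latexml_command(tex_file, package, options, dest):
    """Build a latexml argument list that preloads a package.

    package and options are placeholders: a binding for the package
    would have to exist for the command to do anything useful.
    """
    if options:
        preload = "--preload=[%s]%s" % (",".join(options), package)
    else:
        preload = "--preload=%s" % package
    return ["latexml", preload, "--destination=%s" % dest, tex_file]


if __name__ == "__main__":
    # Hypothetical package and option names, for illustration only.
    cmd = latexml_command("paper.tex", "commonmeta",
                          ["file=metadata.json"], "paper.xml")
    # shlex.join quotes the arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Building a list instead of a single string keeps the square brackets out of the shell's way if the list is later passed to `subprocess.run`.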