LaTeXML
A first attempt at documenting latexml.sty and lxRDFa.sty
I’ve been working on this in my spare time, and the latest email thread has caused me to finally get this out the door. This issue consists of six files: latexml.dtx, lxRDFa.dtx, ltxdoc.cls.ltxml, doc.sty.ltxml, script.js, and style.css. They constitute a first attempt at documenting the LaTeXML style files, mostly by copying the existing comments. (I suppose this would be better as a pull request, but then I’d have to figure out where it fits into the file hierarchy. And I’m not that good at Git.)
The first two are standard(ish) dtx files (the next section will assume you’ve never seen such a file), which compile to their .sty(.ltxml) and dvi (or pdf or xml) counterparts for documentation. The two .ltxml files are my first pass at bindings for ltxdoc.cls and doc.sty, so that the two dtx files can be converted to xml. They are only complete enough for these two dtx files to produce meaningful output. The script and style files adjust the resulting html in a few ways.
Several of the files have % NOTE where something is incomplete.
How dtx files work (if you’ve never used them before)
Running
(pdf)latex latexml.dtx
makeindex -s gind.ist latexml.idx
makeindex -s gglo.ist -o latexml.gls latexml.glo
(pdf)latex latexml.dtx
will create three files: latexml.sty, latexml.sty.ltxml and latexml.dvi (or pdf) (along with the usual supporting files). The former two differ from their counterparts only in comments and blank lines (except as noted below). This can be seen by using the commands
diff latexml.sty latexml.orig.sty | egrep -v '^> %' | egrep -v '^>\s*$' | egrep -v '^---$' | egrep -v '^[0-9][0-9]?a[0-9][0-9]'
and
diff latexml.sty.ltxml latexml.orig.ltxml | egrep -v '^> #' | egrep -v '^>\s*$' | egrep -v '^[0-9][0-9]?[0-9]?a[0-9]'
The latter file latexml.dvi (or pdf) is a rough first draft at documenting the package latexml.sty.
How does this happen? If you look through latexml.dtx, on the first pass it is a regular tex file. Consequently, many of the lines are commented out (and everything after \end{document} is ignored). Two parts that aren’t commented out are
\generate{
\file{\jobname.sty}{\from{\jobname.dtx}{package}}
\file{\jobname.sty.ltxml}{\from{\jobname.dtx}{perl}}
}
and
\DocInput{\jobname.dtx}.
The generate command generates files. Within \generate, the command
\file{to_file}{\from{from_file}{some_label}}
(in its context) causes from_file (in this case, the same file) to be read again, for a second pass. This time, having input docstrip.tex means that:
- commented lines are ignored
- lines beginning with %<some_label> are copied over (removing that prefix)
- lines between %<*other_label> and %</other_label> are ignored
- all other lines are copied over.
Copied lines are written to to_file.
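The four rules above can be sketched in a few lines of code. This is only an illustration in Python of the simplified rules as stated here, not docstrip itself: real docstrip also keeps lines starting with %%, and supports guard expressions and the %<+label> and %<-label> forms, none of which are handled. The function name strip_for_label is made up for this example.

```python
import re

# Matches docstrip guards: %<*name>, %</name>, and %<name>rest-of-line
GUARD = re.compile(r"^%<([*/]?)([A-Za-z0-9_]+)>(.*)$")

def strip_for_label(lines, label):
    """Return the lines that would be written to to_file for `label`."""
    kept = []
    skipping = None  # name of the other-label block being skipped, if any
    for line in lines:
        m = GUARD.match(line)
        if m:
            kind, name, rest = m.groups()
            if kind == "*":
                # %<*name> opens a block; skip it unless it is ours
                if skipping is None and name != label:
                    skipping = name
            elif kind == "/":
                # %</name> closes the block we were skipping
                if skipping == name:
                    skipping = None
            elif skipping is None and name == label:
                kept.append(rest)  # %<label>code  ->  code
            continue
        if skipping is not None:
            continue  # inside a block that belongs to another label
        if line.startswith("%"):
            continue  # ordinary commented line
        kept.append(line)
    return kept
```

Calling strip_for_label(lines, "package") and strip_for_label(lines, "perl") on the lines of one dtx file would then give the contents of the .sty and .sty.ltxml files respectively, under these simplified rules.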
The \DocInput command generates the output dvi (or pdf). It works like TeX’s \input for another pass through the document, except that it ignores all % in the file. This means that all those % \iffalse and % \fi team up to take out a bunch of the file. What remains becomes (using several custom commands from the ltxdoc class and doc package) the output dvi (or pdf).
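To make the interplay of % \iffalse and % \fi concrete, here is a rough Python model of what the documentation pass gets to see. It is a sketch under strong assumptions: every % is simply deleted (real TeX would still honour \%), and only \iffalse is counted when looking for the matching \fi, whereas TeX counts every \if... conditional. The name docinput_view is invented for this example.

```python
import re

# \iffalse and \fi as whole control words (so \file is not mistaken for \fi)
IFFALSE = re.compile(r"\\iffalse(?![A-Za-z])")
FI = re.compile(r"\\fi(?![A-Za-z])")

def docinput_view(lines):
    """Return the lines that survive once % is ignored and \\iffalse...\\fi is skipped."""
    visible = []
    depth = 0  # number of currently open \iffalse blocks
    for line in lines:
        text = line.replace("%", "")  # model: % becomes an ignored character
        if IFFALSE.search(text):
            depth += 1
            continue
        if depth > 0:
            if FI.search(text):
                depth -= 1
            continue  # everything inside \iffalse ... \fi is skipped
        visible.append(text)
    return visible
```

In this model the driver code wrapped in % \iffalse ... % \fi disappears, while lines such as "% \section{Usage}" come through as ordinary LaTeX for ltxdoc to typeset.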
LaTeXML specific questions
It would be possible to undo this and simply create latexml.tex to document the package. This would have the advantage of not needing docstrip and DocInput. This would have the disadvantage that the documentation would be separated from the packages.
It would also be possible to combine latexml.dtx and lxRDFa.dtx into a single dtx. The advantage is this would cut down the number of files we have. The disadvantage is these files don’t really have much to do with each other.
latexml.pdf, latexml.html, lxRDFa.pdf, and lxRDFa.html should be in the manual somewhere. My best guess would be Section 2.1: Using LaTeXML -> Conversion.
The dtx files don’t currently do any official installation (moving *.sty, *.sty.ltxml and *.pdf to locations where TeX and texdoc can find them). This could be done by the dtx files on their own (I think). But it seems simpler to leave that up to latexml’s installation.
I’ve realized a slight problem with \marginpar and \marginnote. The current approach assumes some text is in the margin because it’s less important and should be deemphasized. But another reason to put something in the margin is to make it more noticeable, akin to subsection headings. This is the approach taken by doc.sty. I’ve mirrored doc.sty in the binding, and then used css to adjust where the margin notes appear.
Differences from the existing files
I’ve added \NeedsTeXFormat{LaTeX2e} to the style files. This wasn’t there before, but we are using several non-TeX commands.
I’ve added the functions \includejsfile and \includecssfile. These can currently be accomplished from a xsl style file or the latexmlpost command line, but I think this would be a better approach. It looks like these commands don’t necessarily need to be in the preamble, but you would know better than I. If they do, we can make that change, or we can just take out the functions.
I’ve also added
DefEnvironment('{lxVerbatimJavascript}', '<ltx:resource type="text/javascript">#1</ltx:resource>');
DefEnvironment('{lxVerbatimCss}', '<ltx:resource type="text/css">#1</ltx:resource>');
(But matching what is used by listings and not using DefEnvironment.) I think I’ve gotten the kinks worked out, but that was a bit tricky.
This is pretty neat! I'm aware of dtx, but haven't used it myself. It's the natural thing to use for distributing latexml.sty separately, say to CTAN, which should happen some day. And I like the fact you've included the .ltxml binding as well, and that the documentation can be integrated directly next to the definitions themselves (I'm getting to dislike POD more & more).
The big catch is that it apparently would require a TeX installation to even make LaTeXML, a dependency we've tried hard to avoid! But perhaps a simple Perl script could mimic it? I'll have to look further into that... and your other questions & enhancements. Thanks!!!
I’d overlooked the goal of not depending on TeX, which does make some sense. But I agree that it wouldn’t be too difficult to have a Perl script split a single file into the needed tex, sty, and sty.ltxml. It looks like a dtx file is what CTAN expects, but they don’t appear to require it. We might be able to simply give them the combined file and the Perl script.
Speaking of integrating documentation, is there a reason you don’t integrate the pods into their modules and just have everything at the end?
And by complete coincidence, the first module I opened was LaTeXML::Core. If I understand correctly, it doesn’t appear in the manual.pdf or the manual webpages because LaTeXML::Core doesn’t appear in LaTeXML/doc/manual/genpods (line 55 or so). Is that intentional?
Let’s try this one. Instead of a dtx, I’ve put everything into a perl module. The included make.pl will extract the various files:
- perl make.pl --ltxml will strip the comments from the perl module, leaving just the .sty.ltxml files.
- perl make.pl --sty will keep only the pod comments in =begin text tags, putting them into the .sty files.
- perl make.pl --tex will keep only the pod comments that aren’t in =begin text tags, which document the module.
- perl make.pl --all will do all three. (Other options are --man, --version, and --help.)
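The three-way split described above can be sketched as follows. This is a Python illustration of the idea only, not the code in make.pl: it assumes pod blocks start at a line beginning with = followed by a letter and end at =cut, and that the sty content sits between =begin text and =end text inside such a block. The function name split_module and the macro \lxExample in the usage note are hypothetical.

```python
import re

def split_module(lines):
    """Split a Perl module's lines into (code, sty, doc) streams."""
    code, sty, doc = [], [], []
    in_pod = False   # inside any pod block
    in_text = False  # inside =begin text ... =end text
    for line in lines:
        if not in_pod:
            if re.match(r"=[A-Za-z]", line):
                in_pod = True  # a pod directive opens a block
            else:
                code.append(line)  # plain Perl: the .sty.ltxml stream
                continue
        if line.startswith("=cut"):
            in_pod = False
            in_text = False
        elif line.startswith("=begin text"):
            in_text = True
        elif line.startswith("=end text"):
            in_text = False
        elif in_text:
            sty.append(line)  # pod inside =begin text: the .sty stream
        else:
            doc.append(line)  # remaining pod: the documentation stream
    return code, sty, doc
```

For a module whose pod contains a =begin text block holding \newcommand{\lxExample}{}, the first stream would carry the Perl binding, the second the TeX definition, and the third the pod that a formatter such as Pod::Text could turn into documentation.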
The sty and tex options aren’t the best code. For sty, I couldn’t figure out a way to only grab comments in certain tags, so I had make.pl do it manually (which also means it could be buggy). For tex, it currently processes the comments with Pod::Text. But really, I think we want to use genpods. But I’m not sure how we would want to fit it in with genpods (and can’t find Pod::LaTeX anyway). In fact, we’d probably want two passes with the --tex flag: one to generate the documentation for the manual, and one to generate a pdf we could give to CTAN.
Interesting. I'll have to look more closely at what you've done. On the one hand, dtx seems the Right thing for .sty, with the eventual possibility of putting these files on CTAN; and the perl equivalent of docstrip doesn't seem that hard.
But on the other hand, the thing that always bugged me about both POD and docstrip was that the usual mode of using them has the "real" documentation either all at the front or end, rather than interspersed with the code (I'm not really all that interested in the full documentation mode where all the code is printed as well).
That said, it looks like you've come up with a way of interleaving sty, perl & doc that isn't so heavy with the boilerplate. I'll have to explore! Thanks!!
A quick comment -- it would also be totally fine from the LaTeX / CTAN perspective to distribute a .sty file as is and have a separate .tex file which is the typeset documentation. Docstrip is the de facto standard for documenting source code but in this case it sounds like a simpler approach would be more useful.
Although if you're now happy with a perl-based extraction approach that's no longer an issue :)
CTAN would be a useful distribution (although maybe not that useful, since the TeX side of the package doesn’t do all that much to begin with). But I agree that distribution can wait until the commands are more finalized.
But in the meantime, I think it would be helpful to include the package documentation in the manual (maybe Section 2.1: Using LaTeXML -> Conversion). There seem to be recurring questions that end up being answered with “this is solved in the package that you didn’t know existed”.
I would like to give this issue a soft "bump" and state that it would be really nice if we could get latexml.sty in particular up to CTAN, and a little later - texlive.
Now that arXiv has experimental HTML via latexml, I have started getting requests to advise how to include HTML-native content that still works with the PDF workflow (such as .gif images or just some alternative directives that improve the latexml conversion).
Having access to latexml.sty available from texlive would simplify the distribution problem on the arXiv side. Alternatively, they would need to keep a custom file around, which I am unsure they are open to doing nowadays.
I've updated the Perl modules to match the latest sty and ltxml files. I've also updated make.pl so that it can create the sty and sty.ltxml files, and also tex files for inputting into the manual (this step requires CPAN's Pod::LaTeX). There's also a wrapper tex file to get a pdf out of the input files. If you call perl make.pl --all, it will create all of that.
If you open up make.pl to update the variable $GIT_LATEXML to point to your LaTeXML directory, then perl make.pl --check will print a diff of these derived files and the existing versions. The diff is simplified a bit: it tries not to print the difference when the difference is only a comment in the existing version. This helps to show that the only code difference is that these sty files have \NeedsTeXFormat and \ProvidesPackage at the top, and this latexml.sty has \newcommand{\lxRequireResource}[2][]{}.
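The idea behind the simplified diff can be illustrated like this. It is a Python sketch, not what make.pl --check actually runs, and it is slightly broader than described above: it drops comment-only and blank lines from both files before diffing, rather than only from the existing version. The names code_only and simplified_diff are invented for the example.

```python
import difflib

def code_only(lines, comment_prefix):
    """Drop blank lines and lines that contain only a comment."""
    return [
        line.rstrip()
        for line in lines
        if line.strip() and not line.lstrip().startswith(comment_prefix)
    ]

def simplified_diff(generated, existing, comment_prefix="%"):
    """Unified diff of the two files with comments and blank lines ignored."""
    return list(
        difflib.unified_diff(
            code_only(existing, comment_prefix),
            code_only(generated, comment_prefix),
            fromfile="existing",
            tofile="generated",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
```

Use comment_prefix="%" for the sty files and comment_prefix="#" for the sty.ltxml files. An empty result means the two versions differ only in comments and blank lines.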
I tried to copy the existing comments into the modules. I've also introduced a few comments where I wasn't quite sure what the code was doing. It would be a good idea to review those comments and just leave the ones that we want to keep.
I don't want to create a pull request out of this because I'm not sure where you would want to put this into the build (I'm also still working on learning git enough to avoid what happened with my last pull request ;) ). But you can use the code as you see fit. (In particular, you probably want to use code snippets from make.pl instead of copying the entire file and issuing perl make.pl --all.)
@teepeemm it is indeed unfortunate that we all run the risk of our PRs getting closed without getting accepted (still happens to me often!), but they at least make the changes transparent -- downloading and examining the contents of that ZIP file is something that will take even longer for us to get around to.
Improving the way in which we both prioritize and generally do extended documentation coverage is an unsolved problem in latexml day-to-day work...