engrafo
Improve picking main .tex file from a directory
Some submissions such as 0706.2986 fail to render, because engrafo currently cannot pick the right .tex file to use as the main .tex file. Its current criteria are contained in src/converter/io.js:
// Pick a main .tex file from a directory
async function pickLatexFile(dir) {
  if (dir.endsWith(".tex")) {
    return dir;
  }
  const files = await fs.readdir(dir);
  if (files.includes("ms.tex")) {
    return path.join(dir, "ms.tex");
  }
  if (files.includes("main.tex")) {
    return path.join(dir, "main.tex");
  }
  const texPaths = files.filter(f => f.endsWith(".tex"));
  if (texPaths.length === 0) {
    throw new Error("No .tex files found");
  }
  if (texPaths.length === 1) {
    return path.join(dir, texPaths[0]);
  }
  let docCandidates = [];
  for (let p of texPaths) {
    let data = await fs.readFile(path.join(dir, p));
    if (data && data.includes("\\documentclass")) {
      docCandidates.push(p);
    }
  }
  if (docCandidates.length === 0) {
    throw new Error("No .tex files with \\documentclass or \\documentstyle found");
  }
  if (docCandidates.length === 1) {
    return path.join(dir, docCandidates[0]);
  }
  let bblCandidates = [];
  for (let p of docCandidates) {
    let bbl = p.replace(".tex", ".bbl");
    if (await fs.pathExists(path.join(dir, bbl))) {
      bblCandidates.push(p);
    }
  }
  if (bblCandidates.length > 1) {
    throw new Error(
      `Ambiguous LaTeX path (${bblCandidates.length} candidates)`
    );
  }
  return bblCandidates[0];
}
0706.2986 has two .tex files. The first .tex file, psfig.tex, is not the main .tex file, but it contains the following line:
% To use with LaTeX, use \documentstyle[psfig,...]{...}
Engrafo will flag this as a potential candidate, along with the second .tex file townes_arXiv.tex (which is the real main .tex file). Since this submission contains no .bbl file to help the code clarify which candidate is the main .tex file, the render fails.
I propose that we add a regex that matches a line if it contains \documentclass or \documentstyle, but not if the line begins with a comment character %. Such a regex might look like (?m)^(?!%)(?:.*\\\\document(?:class|style).*).
0902.1226, another submission that fails to render, has a similar problem where an incorrect candidate is chosen because it contains a \documentclass tag. This tag is not at the beginning of the line. It might be a better criterion to match a \documentclass or \documentstyle tag that begins the line. This would take care of both submissions.
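The "tag begins the line" criterion could be sketched roughly as follows. This is only an illustration: the helper name hasDocumentDeclaration is hypothetical and not part of io.js, and allowing leading spaces or tabs before the tag is an assumption that goes slightly beyond the proposal above.

```javascript
// Hypothetical helper, not existing engrafo code.
// Matches \documentclass or \documentstyle only when it starts a line
// (optionally after spaces or tabs), so a tag in a "%" comment or in the
// middle of a line is ignored. The "m" flag makes ^ match at every line start.
const DOCUMENT_DECLARATION = /^[ \t]*\\document(?:class|style)\b/m;

function hasDocumentDeclaration(text) {
  // String() also accepts the Buffer that fs.readFile returns
  // when no encoding is given.
  return DOCUMENT_DECLARATION.test(String(text));
}
```

In pickLatexFile, a check like this could stand in for data.includes("\\documentclass") when collecting docCandidates. Under this sketch the psfig.tex comment line quoted above is no longer counted as a declaration.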
Sounds good to me! LaTeXML also has some similar logic, which is probably more battle-hardened than ours. I forget where it is, but perhaps we could switch to using that, or just copy their logic.
The latexml logic for finding the main .tex is in this file, in the unpack_source subroutine. One interesting condition is to veto candidates that are arguments of \input macros. It currently only unpacks ZIP archives, though.
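The \input veto might look something like the sketch below in engrafo's JavaScript. It is not a port of LaTeXML's unpack_source, which I have not reproduced here. The function name vetoInputTargets and the shape of the sources argument (file name mapped to file contents) are assumptions, and it only handles the braced form \input{name}, not the unbraced \input name form, commented-out lines, or paths into subdirectories.

```javascript
// Hypothetical helper, not existing engrafo or LaTeXML code.
// Drops candidates that some .tex file pulls in via \input{...}, on the
// theory that an included file is unlikely to be the main document.
// `candidates` is an array of file names; `sources` maps file name -> contents.
function vetoInputTargets(candidates, sources) {
  const included = new Set();
  const inputPattern = /\\input\s*\{([^}]+)\}/g;
  for (const text of Object.values(sources)) {
    for (const match of String(text).matchAll(inputPattern)) {
      const name = match[1].trim();
      // \input{foo} and \input{foo.tex} both refer to foo.tex
      included.add(name.endsWith(".tex") ? name : `${name}.tex`);
    }
  }
  return candidates.filter(c => !included.has(c));
}
```

For example, with made-up file names, if paper.tex contains \input{macros} and both paper.tex and macros.tex are candidates, only paper.tex would remain after the veto.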
Can you assign this work to me? I would like to work on it. @brienna
Is anyone working on this issue? Can you assign it to me?
I would guess it's still open, perhaps @bfirsh has the ability to assign people to the issue.
Hi. Is this issue still unresolved? I am new to Open Source but would love to give this a try.