tools build-loi tests

I had a few minutes today, so I started looking at this. A few questions:

1. Running build-loi on the corpus yields a different loi in over half the cases. I read the PR comments, but there was never any resolution to all of these differences, most of which are not block-level issues, e.g. What are we doing about these differences? Again, I'm not talking about differences where there is block-level text; I'm talking about differences in the text itself.
2. For @apasel422: The second test has chapter files in the golden files that aren't in the input files. That's obviously not correct, but I don't know how it got in that state or what the correct input files are, etc. What should that test look like?
3. In short, what is supposed to be the difference between test-1 and test-2?

Jul 05 '24 04:07 vr8hub

It occurred to me right after I sent this that I hadn't updated my copy of the corpus, so I did so, and the differences dropped from 45 to 39. But the question above still stands for the remaining differences.

Jul 05 '24 05:07 vr8hub

2. For @apasel422: The second test has chapter files in the golden files that aren't in the input files. That's obviously not correct, but I don't know how it got in that state or what the correct input files are, etc. What should that test look like?

Sorry about that; without the test infra accounting for extraneous files, it was easy for these to get added by mistake. The input chapter files should remain unchanged in the output, and the expected output should simply contain a new loi.xhtml file alongside them.

3. In short, what is supposed to be the difference between test-1 and test-2?

One test is supposed to cover generation of a completely new loi.xhtml file; the other is supposed to update an existing one in place.
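
For what it's worth, the expectation above (input files unchanged, plus a loi.xhtml in the golden output) could be checked mechanically. This is only a rough sketch, not what the test infra actually does: the function name `check_golden` and the two-directory layout are assumptions made up for illustration. It skips comparing loi.xhtml itself, since in the update-in-place test that file is expected to differ between input and golden.

```python
from pathlib import Path


def check_golden(input_dir: Path, golden_dir: Path, generated: str = "loi.xhtml") -> list[str]:
    """Compare a test's input and golden directories (hypothetical layout).

    Returns a list of human-readable problems; an empty list means the
    golden directory is the input directory plus the generated file.
    """
    problems = []
    input_files = {p.relative_to(input_dir) for p in input_dir.rglob("*") if p.is_file()}
    golden_files = {p.relative_to(golden_dir) for p in golden_dir.rglob("*") if p.is_file()}

    # Golden files with no counterpart in the input, other than the generated file.
    for rel in sorted(golden_files - input_files):
        if rel.name != generated:
            problems.append(f"extraneous golden file: {rel.as_posix()}")

    # Input files that never made it into the golden output.
    for rel in sorted(input_files - golden_files):
        problems.append(f"input file missing from golden output: {rel.as_posix()}")

    # Files present on both sides must be byte-identical, except the generated file.
    for rel in sorted(input_files & golden_files):
        if rel.name == generated:
            continue
        if (input_dir / rel).read_bytes() != (golden_dir / rel).read_bytes():
            problems.append(f"input file changed in golden output: {rel.as_posix()}")

    # The generated file must exist in the golden output for both tests.
    if not any(rel.name == generated for rel in golden_files):
        problems.append(f"golden output is missing {generated}")

    return problems
```

Run against the second test as it was originally committed, a check along these lines would have reported the extra chapter files as extraneous.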

Jul 07 '24 20:07 apasel422

No worries, that's what I needed. Thanks!

Jul 07 '24 21:07 vr8hub

@acabal, in addition to question 1 above, I found a dozen or so books that have one or more figures with no id. Based on 7.8.1, I assume that is incorrect (they're all figures, not inline images). Do you want PRs to fix them?

Jul 08 '24 00:07 vr8hub

Yes please, all figures should have IDs. This can be an easy lint check as well.

Jul 09 '24 17:07 acabal

Now that I think about it further, I'm going to revise that rule a little to only require that <figure> has an id. <img> may appear in inline text and typically those are not interesting enough to be addressable via URL. But all <figure>s should be. I'm working on a lint test now.
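
As a rough illustration of the rule (every <figure> needs an id, a bare <img> does not), the check could look something like the sketch below. It is not the actual lint implementation: the function name `figures_without_id` is made up for this example, and it assumes the file is well-formed XHTML in the standard XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def figures_without_id(xhtml: str) -> list[str]:
    """Return one message per <figure> element that has no (or an empty) id."""
    root = ET.fromstring(xhtml)
    missing = []
    # Only <figure> is inspected; <img> elements are deliberately ignored.
    for index, figure in enumerate(root.iter(f"{{{XHTML_NS}}}figure"), start=1):
        if not figure.get("id"):
            missing.append(f"figure #{index} has no id")
    return missing


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<figure id="illustration-1"><img alt="A ship" src="a.jpg"/></figure>
<figure><img alt="A map" src="b.jpg"/></figure>
<p>Inline <img alt="A glyph" src="c.png"/> image.</p>
</body>
</html>"""

print(figures_without_id(SAMPLE))
```

In the sample, only the second figure is reported; the inline <img> inside the paragraph is not, which matches the revised rule.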

Jul 09 '24 18:07 acabal

Is this still in progress or should we close it?

Feb 12 '25 19:02 acabal

Sorry, you can close it. I updated everything at the time, I believe; I just checked again, and didn't find any figures without ids.

Feb 12 '25 20:02 vr8hub