Lint instrumentation tests on Gibbon
As part of the restructure, I've instrumented lint, rather extensively in the end, and have identified the following hotspots when linting Gibbon, i.e. individual things that took over a minute. Line numbers are from master. I italicized the two that I'm guessing maybe could be improved; I assume the others can't, but if you have ideas on them, all the better.
- [ ] Applying the local CSS to the DOM (line 1493) takes 1.5 minutes. I assume there's nothing to be done about that. (Applying core took 25 seconds.)
- [ ] Checking unused selectors (line 1654) takes 1.5 minutes; almost all of the time is in checking
p span iX. Gibbon has six of them, and each one took 14 seconds. I assume there's nothing to be done about that, either. - [ ] The s-039 test (line 3101) took 2.5 minutes. This is the ibid check, and the xpath finds the bids and then tests their ancestor. I wonder if just finding the ibids first, then looping on them for the ancestor check would speed this up? Sorry, out of my expertise to come up with an alternative.
- [ ] The s-057 test (line 3116) took 2.5 minutes. This checks that the backlink and the endnote ID are the same number. I assume there's nothing to be done about this; that's just how long it takes to check 9,000 endnotes (that's about 60 a second?). :)
- [ ] The t-059 test (line 2390) took 2.5 minutes. This checks for a period in a cite; again I wonder if finding the cites first and then looping through them for the rest of the test would be faster. (Although, there are 1653 cites in Gibbon's endnotes, so maybe there's nothing to be done about that, either.)
Those five things are over 10.5 minutes of the 12.5 minutes it takes to lint the endnotes; everything else is trivial.
If you think it's worth trying something else on any of the above and want to put the replacement code here, I'll plug it into the version with the instrumentation and try it.
Good finds. This really highlights the need for a better test structure for lint. I'm already nervous merging in the refactor because we're finding issues with it. We can't really test it because we never bothered to set up tests for all of these checks. So, before we start trying to optimize xpaths, we should write up tests first that we can compare against. Many of them are straightforward, but just as many are crafted to work around various edge cases that we don't want to match.
I definitely agree about the tests—I mentioned early on that ideally we would have at least one for each message, and, for ones like a couple of these, multiple that include the ones being excluded. But, that's a significant project on its own.
It's definitely not necessary to make any further changes for the above before release; the restructure is already faster with the one change we did make, and no one's complaining, even on Gibbon.
You are of course right to be nervous :), but the upside is that 99% of the differences in the restructure are just moving things around into functions, not changing the code itself. The only real changes I made are to move some of the individual flags into a structure.
If you want to wait on a more robust test suite, that's not a problem, either, just noting that rebasing when changes are made to lint in master is a real pain. Nothing that can't be accommodated, obviously. :)
In both of the highlighted tests above, i.e. s-039 and t-059, the initial test (finding the ibids for s-039, finding the cites for t-059) takes pretty much the entire time, so I don't think they can be improved.
For s-039, I ran
time se xpath "/html/body//li[re:test(@epub:type, '\\bendnote\\b')]//abbr[re:test(., '\\b[Ii]bid\\b')]" endnotes.xhtml
For t-059, I ran
time se xpath "/html/body//li[contains(@epub:type, 'endnote')]//cite[not((./node()[last()])[name() = 'abbr'])]" endnotes.xhtml
I'm therefore going to close this, as I don't think there's anything to be done. It's just a consequence of the size and complexity of Gibbon's endnotes file.