"fatal error: runtime: out of memory" with site with large number of links

Open MarkvanMents opened this issue 3 years ago • 6 comments

Hi Will, I want to use htmltest for the internal links on my large Hugo-generated website. Running on Windows 10, it tests 3095 documents in around 3 minutes and has helped me to resolve some missing internal links. Exactly what I want.

When I put it into my Travis job to build the site automatically, however, htmltest fails with fatal error: runtime: out of memory. It seems that my Travis job is only given 8GB of RAM, whereas under Windows htmltest uses the 16GB available to it (out of 24GB of physical memory). I tried setting up a 20GB swap file for Travis. This stops the fatal error, but htmltest is still running after 30 minutes and eventually gets killed by Travis.

Is this a known issue? Are there any options I have missed to tell htmltest to use less memory?

Let me know if you would like any more information. I would really like to use your tool if I can as it seems very fast and flexible.

MarkvanMents avatar Jan 27 '22 15:01 MarkvanMents

A bit more information on this issue: I am using Hugo with Docsy to produce a site. Hugo/Docsy creates a very large number of links and anchors in the sidebar. Since these are generated automatically by Hugo, I don't need to test them. I only want to test the links and anchors which are in the ... section of the file. I can't see any way to tell htmltest to only look at this part of the file. If I force htmltest to ignore the Docsy-generated menu items using data-proofer-ignore, it still seems to use the same amount of memory and build all the cross-referencing; it just doesn't report any errors on them. And there is no way to ignore the anchors/ids in any case.

It would be great if there was a way to only test a part of a document (e.g. within the ... tags), but I realise that this is a feature request, so I don't expect it to be added any time soon (if ever).

Thanks for all the work you have put into htmltest. It is a pity that the combination of Hugo and Docsy is producing an unmanageable number of links otherwise I would have no hesitation in using it.

MarkvanMents avatar Feb 10 '22 15:02 MarkvanMents

Hey @MarkvanMents thanks for the follow up on this. Yes, in the short term it's unlikely I'll be able to have a look at this. If I do, or someone else can, can you provide either your built site or an example that produces the same effect? Thanks

wjdp avatar Feb 15 '22 01:02 wjdp

Hi @wjdp Thanks for getting back to me, Will.

I have managed to solve this for my case by ignoring all the <aside> tags. I'm busy getting my site finished at present, but will look at making a solution configurable through .htmltest.yml when I have more time. I think I'm an exception and most users will want to continue to test their whole site. If I make a PR for my change I will link it to this issue.

The GitHub source for the site I am building is here: https://github.com/mendix/docs-site-test (work in progress, as you can tell from the title). This uses the Docsy theme for Hugo to build a site on AWS here: http://mendix-new-docs-site.s3-website.eu-central-1.amazonaws.com/

Docsy generates huge HTML files in our case because every file has a sidebar with around 3000 links. Luckily, the sidebar is within <aside> tags and I have hardcoded this kludge to remove these from the htmlNode structure: https://github.com/MarkvanMents/htmltest/commit/e84902edd8e575aa0ca20793c8b6e1e934614362.

My hard-coded change at least enables me to run htmltest and reduces the memory used in Windows from 16GB to around 700MB, taking about the same length of time in both cases.
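For readers who want a feel for why dropping the sidebar shrinks the tree so much, here is a minimal sketch of that kind of pruning. It is not the code from the linked commit: the Node type, pruneTag, countLinks and samplePage are hypothetical stand-ins invented for illustration, and the page shape (one sidebar of 3000 links plus one content link) is an assumption based on the figures above.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a parsed HTML element.
// It is NOT htmltest's real node type.
type Node struct {
	Tag      string
	Attrs    map[string]string
	Children []*Node
}

// pruneTag detaches every descendant element named tag, together with
// its whole subtree, so that later passes never see it.
func pruneTag(n *Node, tag string) {
	var kept []*Node
	for _, c := range n.Children {
		if c.Tag == tag {
			continue // drop this element and everything below it
		}
		pruneTag(c, tag)
		kept = append(kept, c)
	}
	n.Children = kept
}

// countLinks counts the <a> elements still present in the tree.
func countLinks(n *Node) int {
	total := 0
	if n.Tag == "a" {
		total++
	}
	for _, c := range n.Children {
		total += countLinks(c)
	}
	return total
}

// samplePage builds an assumed page layout: an <aside> sidebar holding
// sidebarLinks links, and a content area holding a single link.
func samplePage(sidebarLinks int) *Node {
	aside := &Node{Tag: "aside"}
	for i := 0; i < sidebarLinks; i++ {
		href := fmt.Sprintf("/page-%d/", i)
		aside.Children = append(aside.Children,
			&Node{Tag: "a", Attrs: map[string]string{"href": href}})
	}
	content := &Node{Tag: "main", Children: []*Node{
		{Tag: "a", Attrs: map[string]string{"href": "/intro/"}},
	}}
	return &Node{Tag: "body", Children: []*Node{aside, content}}
}

func main() {
	page := samplePage(3000)
	fmt.Println("links before pruning:", countLinks(page)) // 3001
	pruneTag(page, "aside")
	fmt.Println("links after pruning:", countLinks(page)) // 1
}
```

With roughly 3000 sidebar links repeated on each of about 3000 documents, almost all link references come from the sidebar, which is consistent with the large memory drop reported here.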

Not sure whether the current design of htmltest would allow a more memory-efficient solution if I had wanted to test links in all the <aside> sections as well. I don't know enough about Go's memory management to know how you would do that. And the great thing about htmltest is the speed, so you wouldn't want to slow down htmltest just to solve these extreme cases.

But at least I can solve my particular case by significantly reducing the size of the Parse tree.

MarkvanMents avatar Feb 17 '22 11:02 MarkvanMents

Looking at the code change you've made, this seems like a good fit for extending the data-proofer-ignore attribute: instead of ignoring late, it could just remove the node (and therefore its children) before parsing the doc. Not quite as extensible as a user-configurable filter, but given the niche nature of this I'm hesitant to suggest adding another config option.
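To make the difference between "ignoring late" and "removing early" concrete, here is a rough sketch under stated assumptions. The Node type and the functions size, linksIgnoringLate, pruneIgnored and sampleDoc are hypothetical and are not htmltest's implementation; only the attribute name data-proofer-ignore comes from this thread.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a parsed HTML element.
type Node struct {
	Tag      string
	Attrs    map[string]string
	Children []*Node
}

const ignoreAttr = "data-proofer-ignore"

// hasAttr reports whether the element carries the named attribute.
func hasAttr(n *Node, name string) bool {
	_, ok := n.Attrs[name]
	return ok
}

// size counts every node still held in the tree.
func size(n *Node) int {
	total := 1
	for _, c := range n.Children {
		total += size(c)
	}
	return total
}

// linksIgnoringLate collects hrefs but skips marked subtrees while
// walking. The marked nodes stay in memory; they are only not reported.
func linksIgnoringLate(n *Node) []string {
	if hasAttr(n, ignoreAttr) {
		return nil
	}
	var out []string
	if n.Tag == "a" {
		out = append(out, n.Attrs["href"])
	}
	for _, c := range n.Children {
		out = append(out, linksIgnoringLate(c)...)
	}
	return out
}

// pruneIgnored detaches marked subtrees up front, so nothing below a
// marked element is retained for later passes.
func pruneIgnored(n *Node) {
	var kept []*Node
	for _, c := range n.Children {
		if hasAttr(c, ignoreAttr) {
			continue
		}
		pruneIgnored(c)
		kept = append(kept, c)
	}
	n.Children = kept
}

// sampleDoc builds an assumed document: a marked <nav> with three links
// and a content area with one link.
func sampleDoc() *Node {
	nav := &Node{Tag: "nav", Attrs: map[string]string{ignoreAttr: ""}}
	for i := 0; i < 3; i++ {
		href := fmt.Sprintf("/page-%d/", i)
		nav.Children = append(nav.Children,
			&Node{Tag: "a", Attrs: map[string]string{"href": href}})
	}
	content := &Node{Tag: "main", Children: []*Node{
		{Tag: "a", Attrs: map[string]string{"href": "/intro/"}},
	}}
	return &Node{Tag: "body", Children: []*Node{nav, content}}
}

func main() {
	doc := sampleDoc()
	fmt.Println(size(doc), linksIgnoringLate(doc)) // 7 [/intro/]
	pruneIgnored(doc)
	fmt.Println(size(doc), linksIgnoringLate(doc)) // 3 [/intro/]
}
```

Both approaches report the same links, but only the early removal reduces the number of nodes kept around, which is the property that matters for the memory problem in this issue.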

wjdp avatar Feb 17 '22 11:02 wjdp

Thanks for that idea - sounds like a simpler solution, can be implemented in the same way, and means that any block can be ignored. I'll look at making the change this way when I get around to removing the hard-coding.

MarkvanMents avatar Feb 17 '22 11:02 MarkvanMents

Hi Will, Our site is now live with htmltest at https://github.com/mendix/docs. I have applied the updated version from #188 in this request (https://github.com/mendix/docs/pull/4511) on our production documentation site. It performs the test without running out of memory. I hope my colleagues will enable me to put it live next week.

Thanks for developing htmltest - it is much faster and more flexible than the homegrown code it is replacing. Hope my PR resolves issues for others as well.

MarkvanMents avatar Apr 29 '22 11:04 MarkvanMents