Incorporate an option for following links within the same domain to a certain depth
Suggested by HN user ajxs, source: https://news.ycombinator.com/item?id=20774594
+1, this is what would make this the tool that I need
How would this work with pages that are linked to multiple times? Would only one link work, or would the page and every resource it links to be duplicated?
If JS were something we could always rely on, we'd be able to have just one data URL link to a given sub-page, with the other links pointing to it via something like href="javascript:<click the first link to this resource on the page>". But we can't assume JS is always on, not to mention that one of monolith's features is stripping JS from the document (mostly for security and privacy reasons). Hence the only way to do it is likely to cache nested data URLs but still include each of them in the final output. Limiting depth and having code to avoid infinite loops would be key here, but it's hard to predict what may go wrong; it's a very big and complex feature.
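To make that concrete, here is a minimal sketch of how a depth-limited recursion plus a cache could behave. The `retrieve` and `same_domain_links` functions are hypothetical stand-ins, not monolith's actual API, and the data URL construction skips proper encoding; the point is only how the depth bound, the cache of finished pages, and an in-progress set keep the traversal finite while letting a page that is linked to multiple times be embedded from the cache instead of being fetched again.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical fetcher standing in for monolith's existing HTTP retrieval;
// stubbed so the sketch runs without network access.
fn retrieve(url: &str) -> String {
    format!(
        "<html><body>page at {} <a href=\"https://example.com/about\">about</a></body></html>",
        url
    )
}

// Hypothetical link extractor; real code would walk the parsed DOM and keep
// only same-domain targets.
fn same_domain_links(html: &str) -> Vec<String> {
    if html.contains("https://example.com/about") {
        vec!["https://example.com/about".to_string()]
    } else {
        vec![]
    }
}

/// Embed `url` as a data URL, recursively embedding same-domain links up to `depth`.
/// `cache` keeps every page already converted, so a page linked to many times is
/// processed once; `in_progress` breaks link cycles (A -> B -> A) in the current chain.
fn embed_page(
    url: &str,
    depth: usize,
    cache: &mut HashMap<String, String>,
    in_progress: &mut HashSet<String>,
) -> Option<String> {
    if let Some(done) = cache.get(url) {
        return Some(done.clone());
    }
    if depth == 0 || !in_progress.insert(url.to_string()) {
        return None; // depth exhausted or cycle detected: leave the original href alone
    }
    let mut html = retrieve(url);
    for link in same_domain_links(&html) {
        if let Some(embedded) = embed_page(&link, depth - 1, cache, in_progress) {
            html = html.replace(&link, &embedded);
        }
    }
    in_progress.remove(url);
    // Percent-encoding/base64 omitted; monolith would build a proper data URL here.
    let data_url = format!("data:text/html;charset=utf-8,{}", html);
    cache.insert(url.to_string(), data_url.clone());
    Some(data_url)
}

fn main() {
    let mut cache = HashMap::new();
    let mut in_progress = HashSet::new();
    if let Some(out) = embed_page("https://example.com/", 2, &mut cache, &mut in_progress) {
        println!("{}", out);
    }
}
```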
Since the main goal of the program is to save the resource as one file, the output should remain a single file even when sub-pages within the same domain are embedded as data URLs. That will undoubtedly make the file very large and hard to edit, since the hrefs' data URLs will contain whole pages along with their assets; but I'm sure people who archive web resources this way understand that, and will mostly use this feature for the convenience of having one file on their filesystem representing that resource, even if it's very big and ugly. So we can't really save each sub-page as a separate file and just link to it from everywhere, unless we implement two modes for this feature: one where everything goes into one file, and another where monolithic files are saved next to one another. We'd need to implement an -o flag to make the second mode possible, since the usual stdout output can't tell us where the monolithic HTML file is going to be saved.
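As a rough illustration of that last point, here is a minimal sketch of what -o handling could look like, assuming a plain argv scan rather than monolith's actual argument parsing: when -o is given we know the output path (and could write sibling files next to it in the second mode); otherwise the result keeps going to stdout as it does today.

```rust
use std::env;
use std::fs::File;
use std::io::{self, Write};

fn main() -> io::Result<()> {
    let args: Vec<String> = env::args().collect();

    // Look for "-o <path>" anywhere in the arguments (simplified sketch).
    let output_path = args
        .windows(2)
        .find(|pair| pair[0] == "-o")
        .map(|pair| pair[1].clone());

    // Placeholder for the assembled monolithic document.
    let html = "<html><body>monolithic document</body></html>";

    match output_path {
        Some(path) => {
            // Known output location: the "one file per page" mode could also
            // write its sibling files relative to this path.
            let mut file = File::create(path)?;
            file.write_all(html.as_bytes())?;
        }
        None => {
            // Current behaviour: print the single monolithic file to stdout.
            io::stdout().write_all(html.as_bytes())?;
        }
    }
    Ok(())
}
```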