
CLI option --crawl-replace-urls does not do anything

Open andrewdbate opened this issue 4 years ago • 3 comments

When I run this command:

single-file --output-directory=outdir --dump-content=false --filename-template="{url-pathname-flat}.html" --crawl-links --crawl-save-session=session.json --crawl-replace-urls=true https://en.wikipedia.org/wiki/Thomas_Lipton

none of the files in the outdir directory have URLs of saved pages replaced with relative paths of other saved pages in outdir.

When I run this command, _wiki_Thomas_Lipton.html is downloaded to outdir. This is the file for the URL from which the crawl started.

The Wikipedia page https://en.wikipedia.org/wiki/Thomas_Lipton has a link to https://en.wikipedia.org/wiki/Self-made_man in the first sentence. This page was also downloaded by SingleFile as _wiki_Self-made_man.html.

I was expecting the href to https://en.wikipedia.org/wiki/Self-made_man in _wiki_Thomas_Lipton.html to be rewritten to _wiki_Self-made_man.html but it was not. Am I using the CLI options incorrectly?
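To make the expectation concrete, here is a minimal Python sketch of the rewrite described above. It is not SingleFile's implementation: `flat_filename` and `replace_saved_urls` are hypothetical helpers, the mapping from URL path to file name is inferred only from the file names reported in this issue, and the sketch assumes the saved HTML contains absolute, double-quoted href values.

```python
import re
from urllib.parse import urlsplit


def flat_filename(url: str) -> str:
    """Hypothetical stand-in for the "{url-pathname-flat}.html" template:
    every "/" in the URL path becomes "_" (inferred from the observed file names)."""
    path = urlsplit(url).path or "/"
    return path.replace("/", "_") + ".html"


def replace_saved_urls(html: str, saved_urls: list[str]) -> str:
    """Rewrite href values that point at saved pages to their local file names.

    Links to pages that were not saved are left untouched.
    """
    local = {url: flat_filename(url) for url in saved_urls}

    def swap(match: re.Match) -> str:
        url = match.group(2)
        return match.group(1) + local.get(url, url) + match.group(3)

    # Assumption: hrefs are absolute and double-quoted.
    return re.sub(r'(href=")([^"]*)(")', swap, html)
```

Given the two Wikipedia URLs from this report as `saved_urls`, the link to Self-made_man in _wiki_Thomas_Lipton.html would end up pointing at _wiki_Self-made_man.html, which is the result the option was expected to produce.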

andrewdbate • Oct 19 '21 00:10

Did you interrupt the command? URLs are replaced when all the pages have been crawled.

gildas-lormeau • Oct 19 '21 20:10

No, I didn't interrupt the command.

andrewdbate • Oct 19 '21 20:10

Hi @gildas-lormeau! First, I'd like to thank you for this amazing extension.

I ran into the very same issue that @andrewdbate described.

I tested with https://xmrig.com because of its simple hierarchy.

The following internal links on https://xmrig.com should be considered:

image

  • Some links are duplicated inside the page, so I used the --filename-conflict-action=skip flag.

This is the command I ran:

./single-file --output-directory=saved --filename-template="{url-pathname-flat}.html" --crawl-links=true --crawl-replace-urls=true --filename-conflict-action=skip https://xmrig.com

As a result, the following files were created inside the saved directory (as expected):

  • _.html
  • _benchmark.html
  • _download.html
  • _wizard.html

Everything works well up to this point, but the links inside these files are not changed to relative links on the file system.
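One way to confirm this on any output directory is a small Python check like the sketch below. `unreplaced_links` is a hypothetical helper, not part of SingleFile, and it assumes the saved pages use double-quoted href attributes.

```python
import re
from pathlib import Path


def unreplaced_links(directory: str, origin: str) -> dict[str, list[str]]:
    """Map each saved .html file to the href values that still start with `origin`."""
    pattern = re.compile(r'href="([^"]*)"')  # assumption: double-quoted hrefs
    report: dict[str, list[str]] = {}
    for page in sorted(Path(directory).glob("*.html")):
        text = page.read_text(encoding="utf-8", errors="replace")
        hits = [href for href in pattern.findall(text) if href.startswith(origin)]
        if hits:
            report[page.name] = hits
    return report
```

Calling `unreplaced_links("saved", "https://xmrig.com")` after the crawl lists every file that still links to the live site. If URL replacement had worked, links to pages that were themselves saved would no longer show up in the report.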

You may find these files useful:

saved.zip

Thanks

amirrh6 • Nov 08 '21 18:11