CLI option --crawl-replace-urls does not do anything
When I run this command:
single-file --output-directory=outdir --dump-content=false --filename-template="{url-pathname-flat}.html" --crawl-links --crawl-save-session=session.json --crawl-replace-urls=true https://en.wikipedia.org/wiki/Thomas_Lipton
none of the files in the outdir directory have the URLs of saved pages replaced with relative paths to the other saved pages in outdir.
When I run this command, _wiki_Thomas_Lipton.html is saved to outdir. This is the file for the URL from which the crawl started.
The Wikipedia page https://en.wikipedia.org/wiki/Thomas_Lipton has a link to https://en.wikipedia.org/wiki/Self-made_man in the first sentence. This page was also downloaded by SingleFile as _wiki_Self-made_man.html.
I was expecting the href to https://en.wikipedia.org/wiki/Self-made_man in _wiki_Thomas_Lipton.html to be rewritten to _wiki_Self-made_man.html but it was not. Am I using the CLI options incorrectly?
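For what it's worth, here is a quick way to check the saved start page (this assumes the saved pages store links as absolute URLs, and that rewritten links would point at the flattened local filenames):

# Absolute Wikipedia links still present in the saved start page:
grep -c 'href="https://en.wikipedia.org/wiki/' outdir/_wiki_Thomas_Lipton.html
# Rewritten local links; I would expect this to be non-zero if --crawl-replace-urls worked:
grep -c 'href="_wiki_' outdir/_wiki_Thomas_Lipton.html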
Did you interrupt the command? URLs are replaced when all the pages have been crawled.
No I didn't interrupt the command.
Hi @gildas-lormeau! First, I'd like to express my appreciation for this amazing extension.
I ran into the very same issue @andrewdbate described.
I tested https://xmrig.com because of its simple hierarchy.
When following the internal links on https://xmrig.com, one thing should be considered:

- Some links are duplicated inside the page, so I used the
--filename-conflict-action=skip flag.
This is the command I ran:
./single-file --output-directory=saved --filename-template="{url-pathname-flat}.html" --crawl-links=true --crawl-replace-urls=true --filename-conflict-action=skip https://xmrig.com
As a result, the following files were created inside the saved directory (as expected):
- _.html
- _benchmark.html
- _download.html
- _wizard.html
Everything worked as expected so far, but the links inside these files were not rewritten to relative links on the file system.
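To illustrate, a quick grep over the saved directory shows the same thing (assuming, as above, that rewritten links would use the flattened filenames such as _download.html):

# Files that still contain absolute links to the site:
grep -l 'href="https://xmrig.com' saved/*.html
# Files containing a rewritten local link; nothing is listed unless --crawl-replace-urls took effect:
grep -l 'href="_' saved/*.html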
You may find these files useful:
Thanks