wget-2-zim
wget: Cannot write to page (Is a directory).
I've noticed that some `index.html` files were missing after scraping a site with your script.
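It seems the problem is this: if wget first downloads some ~binary~ files into a directory, the HTML page at that directory's path (e.g. `/main`) can't be saved. wget tries to write the page to a file with the same name as the directory instead of to `main/index.html`, and that fails. See the example below.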
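The clash can be reproduced without wget at all. Below is a minimal Python sketch, assuming a POSIX system such as Linux or macOS. The temporary directory, the function name `reproduce_clash` and the file contents are placeholders, not part of wget-2-zim. It creates `main/logo.png` first, as in the example below, and then tries to write the page for `/main` to a regular file named `main`. The OS refuses with `EISDIR`, the same "Is a directory" error that wget reports.

```python
import errno
import os
import tempfile


def reproduce_clash(root: str) -> str:
    """Return the errno name raised when a page is written over a directory."""
    # Step 1: saving main/logo.png first creates the directory "main".
    os.makedirs(os.path.join(root, "main"), exist_ok=True)
    with open(os.path.join(root, "main", "logo.png"), "wb") as f:
        f.write(b"placeholder")  # placeholder bytes, not a real image
    # Step 2: the page for the URL /main is then written to a *file* named "main".
    try:
        with open(os.path.join(root, "main"), "w") as f:
            f.write("<!DOCTYPE html>\n")
    except IsADirectoryError as exc:
        return errno.errorcode[exc.errno]
    return "no error"


with tempfile.TemporaryDirectory() as tmp:
    result = reproduce_clash(tmp)
print(result)  # EISDIR on Linux and macOS
```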
I suggest adding the `--trust-server-names` option to the wget call, but I haven't had time to test it yet.
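This is why the option might help. According to the wget manual, `--trust-server-names` bases the local file name on the redirection URL rather than on the original URL. Python's `http.server` answers a request for a directory without a trailing slash (`/main`) with a 301 redirect to `/main/`. The sketch below is a deliberately simplified model of wget's naming, not wget's actual code. The helper names `local_name` and `saved_name` are hypothetical, and the model ignores query strings, escaping and all other wget options. It shows that the page would land at `main/index.html` instead of on top of the `main` directory.

```python
from urllib.parse import urlsplit


def local_name(url: str) -> str:
    """Simplified model: host[:port] is the top directory and a path
    ending in "/" is saved as index.html (assumption, not wget's code)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return parts.netloc + path


def saved_name(requested: str, final: str, trust_server_names: bool) -> str:
    """Name the file after the redirect target only if the option is set."""
    return local_name(final if trust_server_names else requested)


# http.server redirects /main to /main/ because "main" is a directory.
requested = "http://localhost:8000/main"
final = "http://localhost:8000/main/"
print(saved_name(requested, final, trust_server_names=False))  # localhost:8000/main
print(saved_name(requested, final, trust_server_names=True))   # localhost:8000/main/index.html
```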
```
$ tree
example.com
├── index.html
└── main
    ├── index.html
    └── logo.png
$ cat example.com/index.html
<!DOCTYPE html>
<a href="./main/logo.png">MAIN LOGO</a>
<a href="./main">MAIN PAGE</a>
$ cd example.com && python3 -m http.server
$ wget -r http://localhost:8000
‘localhost:8000/index.html’ saved
‘localhost:8000/main/logo.png’ saved
Cannot write to ‘localhost:8000/main’ (Is a directory).
```
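To reproduce this locally, the test site can be generated with a small Python sketch. The function name `build_example` is hypothetical. The contents of `main/index.html` and `main/logo.png` are placeholders, because the issue only shows the top-level `index.html`. Running `python3 -m http.server` inside the generated `example.com` directory and then the `wget -r` command above should hit the same error.

```python
import tempfile
from pathlib import Path


def build_example(root: Path) -> None:
    """Create the example.com tree from the issue under `root`."""
    site = root / "example.com"
    (site / "main").mkdir(parents=True, exist_ok=True)
    # Top-level page: the link to the logo comes before the link to the page,
    # so the recursive crawl creates the "main" directory first.
    (site / "index.html").write_text(
        "<!DOCTYPE html>\n"
        '<a href="./main/logo.png">MAIN LOGO</a>\n'
        '<a href="./main">MAIN PAGE</a>\n'
    )
    # Placeholder contents: the issue does not show these two files.
    (site / "main" / "index.html").write_text("<!DOCTYPE html>\n<p>main page</p>\n")
    (site / "main" / "logo.png").write_bytes(b"\x89PNG\r\n\x1a\n")  # PNG signature only


with tempfile.TemporaryDirectory() as tmp:
    build_example(Path(tmp))
    created = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()
    )
print(created)
```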