suckit
Panic when folder path with dot serves a webpage
When a webpage is served under /folder/file1.html as well as under /folder, this creates a conflict: for the first URL, suckit creates a local folder, and for the second it tries to save the webpage at the same path as that folder, which crashes:
[ERROR] Couldn't create fusor.net/old-boards/songs.com: Is a directory (os error 21)
thread '<unnamed>' panicked at 'Couldn't create fusor.net/old-boards/songs.com: Is a directory (os error 21)', src/logger.rs:42:9
stack backtrace:
0: rust_begin_unwind
at /rustc/1.58.1/library/std/src/panicking.rs:498:5
1: core::panicking::panic_fmt
at /rustc/1.58.1/library/core/src/panicking.rs:107:14
2: core::panicking::panic_display
at /rustc/1.58.1/library/core/src/panicking.rs:63:5
3: suckit::logger::Logger::error
at ./src/logger.rs:42:9
4: suckit::disk::save_file
at ./src/disk.rs:26:21
5: suckit::scraper::Scraper::handle_url
at ./src/scraper.rs:263:33
6: suckit::scraper::Scraper::run::{{closure}}::{{closure}}
at ./src/scraper.rs:313:33
The problem seems to be a combination of a dot in the folder name and a link that leads to this folder without a trailing slash.
In url_helper.rs:28 the missing slash does not trigger the if path, and the dot in the folder name is interpreted as an extension, so the else path is not triggered either.
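To make the collision concrete, here is a minimal sketch of the kind of mapping described above. It is not the code from url_helper.rs: the function name local_path and the exact branch conditions are assumptions reconstructed from the description, only meant to show why both URLs end up claiming the same path.

```rust
use std::path::Path;

/// Hypothetical stand-in for the URL-to-path mapping described in this issue.
/// This is NOT the actual code in url_helper.rs.
fn local_path(url_path: &str) -> String {
    if url_path.ends_with('/') {
        // Trailing slash: treat as a directory and store its page inside it.
        format!("{}index.html", url_path)
    } else if Path::new(url_path).extension().is_none() {
        // No dot in the last segment: also treat as a directory.
        format!("{}/index.html", url_path)
    } else {
        // A dot in the last segment is taken for a file extension,
        // so the URL path is used as the file path unchanged.
        url_path.to_string()
    }
}

fn main() {
    let page = local_path("fusor.net/old-boards/songs.com");
    let child = local_path("fusor.net/old-boards/songs.com/file1.html");
    println!("{}", page);
    println!("{}", child);
    // The directory that `child` needs is the very path `page` wants as a file.
    assert_eq!(Path::new(&child).parent(), Some(Path::new(&page)));
}
```

With a mapping like this, whichever of the two URLs is saved second fails, because the path already exists as the other kind of filesystem entry.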
I believe this is the reason why wget detects the document type by its content instead of its filename. As a consequence, it cannot convert links on the fly, but only after the download of all webpages has finished, which is exactly the behavior observed.
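As a rough illustration of that idea, the sketch below picks the on-disk path from the response's Content-Type value rather than from the dot in the last path segment. The helper local_path_for and its parameters are hypothetical. It is neither suckit's nor wget's implementation, just one possible shape for a fix.

```rust
/// Hypothetical helper: decide the local path from the Content-Type value
/// of the response instead of guessing from the file name.
fn local_path_for(url_path: &str, content_type: &str) -> String {
    // Compare only the MIME type, ignoring parameters such as "; charset=utf-8".
    let is_html = content_type
        .split(';')
        .next()
        .map(|mime| mime.trim().eq_ignore_ascii_case("text/html"))
        .unwrap_or(false);

    let trimmed = url_path.trim_end_matches('/');
    let lower = trimmed.to_ascii_lowercase();

    if is_html && !(lower.ends_with(".html") || lower.ends_with(".htm")) {
        // An HTML page whose URL does not look like an HTML file is stored
        // as the index page of a directory, so child pages can live next to it.
        format!("{}/index.html", trimmed)
    } else {
        trimmed.to_string()
    }
}

fn main() {
    let page = local_path_for("fusor.net/old-boards/songs.com", "text/html; charset=utf-8");
    let child = local_path_for("fusor.net/old-boards/songs.com/file1.html", "text/html");
    println!("{}", page);
    println!("{}", child);
}
```

The trade-off is the one mentioned above: the final path is only known once the response has arrived, so links pointing at a page cannot be rewritten before that page has been fetched.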
I also have this issue!
I have limited bandwidth at the moment. I will have a look when I can, but in the meantime I encourage everyone who is facing this issue and wants it fixed to take a look and submit a PR if they can. I will make time for reviews.