rust-wildbow-scraper
Daybreak 1.3 errors
thread 'reductor' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Hyper(Error(Connect, Os { code: 65, kind: Other, message: "No route to host" })), "https://0.0.7.225/11/18/daybreak-1-3/")', src/main.rs:313:41
The cause of this problem is that daybreak-1-2 contains a relative URL in its Next Chapter link.
The proper fix would be to resolve the relative URL against the current URL. But to do that you'd need to track the current URL origin.
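To illustrate that proper fix, here is a minimal sketch of resolving a scraped href against the page URL it came from, with no external crates. The helper name `resolve_href` and the example URLs are mine, not the scraper's; in real code the `url` crate's `Url::join` handles this (and more edge cases) correctly.

```rust
/// Hypothetical helper: resolve a possibly-relative href against the URL
/// of the page it was scraped from. Covers absolute, scheme-relative,
/// root-relative, and path-relative links; not a full RFC 3986 resolver.
fn resolve_href(current: &str, href: &str) -> String {
    // Already absolute: use as-is
    if href.starts_with("http://") || href.starts_with("https://") {
        return href.to_string();
    }
    // Scheme-relative: "//host/path"
    if let Some(rest) = href.strip_prefix("//") {
        return format!("https://{}", rest);
    }
    let scheme = current.split("://").next().unwrap_or("https");
    let after_scheme = current.splitn(2, "://").nth(1).unwrap_or(current);
    let host = after_scheme.split('/').next().unwrap_or(after_scheme);
    if href.starts_with('/') {
        // Root-relative: keep only the origin
        format!("{}://{}{}", scheme, host, href)
    } else {
        // Path-relative: resolve against the current "directory"
        let base = if current.ends_with('/') {
            current.trim_end_matches('/')
        } else {
            current.rsplitn(2, '/').nth(1).unwrap_or(current)
        };
        format!("{}/{}", base, href)
    }
}

fn main() {
    // Example only: the chapter URL is illustrative
    let cur = "https://www.parahumans.net/2017/11/14/daybreak-1-2/";
    println!("{}", resolve_href(cur, "/2017/11/18/daybreak-1-3/"));
}
```

Tracking the current URL origin then reduces to keeping `current` alongside the chapter tuple as the scraper recurses.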
As a stopgap, here is a hack that treats all relative URLs as belonging to https://www.parahumans.net/
https://gist.github.com/thedewi/31e9d0dc0f375e9afcb3b52ae5b94438#file-src-main-rs-L374-L378
I've also added a bypass for the redirects on the addresses of chapters 16.10 and 20.9, which are also currently stalling the scraper.
Insert my 5 highlighted new lines before line 373, the one that starts if !tup.0.
...
The final chapter is broken too, but harder to fix. The scraper would need to follow the "Next" link right at the bottom of the page, instead of the "Next Chapter" link at top.
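One way to sketch that fallback, assuming simple `<a href="...">text</a>` markup (this is a naive string scan, not a real HTML parser, and `find_next_href` is a hypothetical helper, not code from the scraper):

```rust
/// Hypothetical helper: prefer the "Next Chapter" link, but fall back to
/// the last plain "Next" link (the one at the bottom of the page) when
/// no "Next Chapter" link exists. Naive scan; assumes simple anchors.
fn find_next_href(html: &str) -> Option<String> {
    let mut fallback = None;
    let mut rest = html;
    while let Some(i) = rest.find("<a ") {
        let tag = &rest[i..];
        // Grab href="..." and the anchor text between '>' and "</a>"
        if let (Some(h), Some(gt), Some(end)) =
            (tag.find("href=\""), tag.find('>'), tag.find("</a>"))
        {
            let href = &tag[h + 6..];
            if let Some(q) = href.find('"') {
                if gt < end {
                    let text = tag[gt + 1..end].trim();
                    if text == "Next Chapter" {
                        return Some(href[..q].to_string());
                    }
                    if text == "Next" {
                        // Keep overwriting so the last "Next" on the page wins
                        fallback = Some(href[..q].to_string());
                    }
                }
            }
        }
        rest = &rest[i + 3..];
    }
    fallback
}

fn main() {
    let html = r#"<p>chapter text</p><a href="/2019/11/02/final/">Next</a>"#;
    println!("{:?}", find_next_href(html));
}
```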
Here is another main.rs workaround: https://gist.github.com/adelrune/0d6931e0e56c1436d7ddcbdcc3b4e1b7
It's 90% the same as what thedewi posted, but their solution didn't work for me: the relative URL started with "https:" instead of a "/".
With the code I posted, the program was able to complete the process and create an epub file for Ward.
I'm getting what seems to me to be a related error, even when using the fixes thedewi or adelrune provided.
thread 'reductor' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Hyper(Error(Connect, Os { code: 10051, kind: Other, message: "A socket operation was attempted to an unreachable network." })), "https://0.0.7.225/11/18/daybreak-1-3/")', src\main.rs:313:41
Unfortunately I'm not savvy enough to have any idea where to start on a fix.
I have the same issue as well.
thread 'reductor' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Hyper(Error(Connect, Os { code: 10051, kind: Other, message: "A socket operation was attempted to an unreachable network." })), "https://0.0.7.225/11/18/daybreak-1-3/")', src\main.rs:313:41
It looks like you think this was fixed, but it's not. I just downloaded it tonight and have the same issue.
thread 'reductor' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Hyper(Error(Connect, Os { code: 10051, kind: Other, message: "A socket operation was attempted to an unreachable network." })), "https://0.0.7.225/11/18/daybreak-1-3/")', src\main.rs:313:41
Is there a known workaround for this problem? I'm still getting it on the latest pull from git.
Still happening. I worked around it with:
diff --git a/src/main.rs b/src/main.rs
index a547d4b..fb14b57 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -347,7 +347,7 @@ fn download_iter(
     } else {
         tup.0 = check.unwrap().attr("href").unwrap().to_string();
         if !tup.0.contains("http") {
-            tup.0 = "https:".to_string() + &tup.0;
+            tup.0 = "https://www.parahumans.net".to_string() + &tup.0;
         }
         return download_iter(tup);
     }
I took Amomum's src and it doesn't complete all of Ward:
Downloaded From Within - 16.9
Downloaded Redirecting...
The invocation of --ward seemed to work fine for me with this git diff.
diff --git a/src/main.rs b/src/main.rs
index a547d4b..0722fe3 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -344,13 +344,21 @@ fn download_iter(
     }
     if check.is_none() || title == "P.9" {
         return tup.clone();
-    } else {
-        tup.0 = check.unwrap().attr("href").unwrap().to_string();
-        if !tup.0.contains("http") {
-            tup.0 = "https:".to_string() + &tup.0;
-        }
-        return download_iter(tup);
     }
+
+    // Extract URL from html href tag
+    tup.0 = check.unwrap().attr("href").unwrap().to_string();
+
+    // Check to see if it's a relative link, if so assume it's a parahumans.net story
+    if tup.0.starts_with("/20") {
+        tup.0 = "https://parahumans.net/".to_string() + &tup.0;
+    }
+
+    // Ensure it's an https request
+    if !tup.0.contains("http") {
+        tup.0 = "https:".to_string() + &tup.0;
+    }
+    return download_iter(tup);
 }
 fn process_book(book: DownloadedBook) {
     println!("Done downloading {}", book.title);
EDIT: Also tested invocations of Glow-worm and Pale with this git diff. I'll update here if there's something wrong with the epubs.
@x10an14 this diff seems to insert an extra / into the URL in tup.0 = "https://parahumans.net/".to_string() + &tup.0; it should be tup.0 = "https://parahumans.net".to_string() + &tup.0; instead.
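A joining helper sidesteps that class of mistake entirely, since it gives the same URL whichever way the origin is spelled. This `join_origin` is a hypothetical illustration, not code from the repo:

```rust
/// Hypothetical helper: join an origin and a root-relative path without
/// producing a double slash, regardless of how either side is written.
fn join_origin(origin: &str, path: &str) -> String {
    format!("{}/{}", origin.trim_end_matches('/'), path.trim_start_matches('/'))
}

fn main() {
    // Both spellings of the origin yield the same URL
    println!("{}", join_origin("https://parahumans.net/", "/2017/11/18/daybreak-1-3/"));
    println!("{}", join_origin("https://parahumans.net", "/2017/11/18/daybreak-1-3/"));
}
```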
Full diff (working to download Ward):
diff --git a/src/main.rs b/src/main.rs
index a547d4b..0722fe3 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -344,13 +344,21 @@ fn download_iter(
     }
     if check.is_none() || title == "P.9" {
         return tup.clone();
-    } else {
-        tup.0 = check.unwrap().attr("href").unwrap().to_string();
-        if !tup.0.contains("http") {
-            tup.0 = "https:".to_string() + &tup.0;
-        }
-        return download_iter(tup);
     }
+
+    // Extract URL from html href tag
+    tup.0 = check.unwrap().attr("href").unwrap().to_string();
+
+    // Check to see if it's a relative link, if so assume it's a parahumans.net story
+    if tup.0.starts_with("/20") {
+        tup.0 = "https://parahumans.net".to_string() + &tup.0;
+    }
+
+    // Ensure it's an https request
+    if !tup.0.contains("http") {
+        tup.0 = "https:".to_string() + &tup.0;
+    }
+    return download_iter(tup);
 }
 fn process_book(book: DownloadedBook) {
     println!("Done downloading {}", book.title);