rust-wildbow-scraper
Daybreak 1.3 errors
thread 'reductor' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Hyper(Error(Connect, Os { code: 65, kind: Other, message: "No route to host" })), "https://0.0.7.225/11/18/daybreak-1-3/")', src/main.rs:313:41
The cause of this problem is that daybreak-1-2 contains a relative URL in its Next Chapter link.
The proper fix would be to resolve the relative URL against the current URL. But to do that you'd need to track the current URL origin.
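To illustrate that proper fix, here is a minimal sketch of resolving a scraped href against the page URL it came from, with no external crates. The helper name `resolve_href` and the example URLs are mine, not the scraper's; in real code the `url` crate's `Url::join` handles this (and more edge cases) correctly.

```rust
/// Hypothetical helper: resolve a possibly-relative href against the URL
/// of the page it was scraped from. Covers absolute, scheme-relative,
/// root-relative, and path-relative links; not a full RFC 3986 resolver.
fn resolve_href(current: &str, href: &str) -> String {
    // Already absolute: use as-is
    if href.starts_with("http://") || href.starts_with("https://") {
        return href.to_string();
    }
    // Scheme-relative: "//host/path"
    if let Some(rest) = href.strip_prefix("//") {
        return format!("https://{}", rest);
    }
    let scheme = current.split("://").next().unwrap_or("https");
    let after_scheme = current.splitn(2, "://").nth(1).unwrap_or(current);
    let host = after_scheme.split('/').next().unwrap_or(after_scheme);
    if href.starts_with('/') {
        // Root-relative: keep only the origin
        format!("{}://{}{}", scheme, host, href)
    } else {
        // Path-relative: resolve against the current "directory"
        let base = if current.ends_with('/') {
            current.trim_end_matches('/')
        } else {
            current.rsplitn(2, '/').nth(1).unwrap_or(current)
        };
        format!("{}/{}", base, href)
    }
}

fn main() {
    // Example only: the chapter URL is illustrative
    let cur = "https://www.parahumans.net/2017/11/14/daybreak-1-2/";
    println!("{}", resolve_href(cur, "/2017/11/18/daybreak-1-3/"));
}
```

Tracking the current URL origin then reduces to keeping `current` alongside the chapter tuple as the scraper recurses.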
As a stopgap, here is a hack that treats all relative URLs as belonging to https://www.parahumans.net/
https://gist.github.com/thedewi/31e9d0dc0f375e9afcb3b52ae5b94438#file-src-main-rs-L374-L378
I've also added a bypass for the redirects on the addresses of chapters 16.10 and 20.9, which are also currently stalling the scraper.
Insert my 5 highlighted new lines before line 373, the one that starts if !tup.0.
...
The final chapter is broken too, but harder to fix. The scraper would need to follow the "Next" link right at the bottom of the page, instead of the "Next Chapter" link at top.
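One way to sketch that fallback, assuming simple `<a href="...">text</a>` markup (this is a naive string scan, not a real HTML parser, and `find_next_href` is a hypothetical helper, not code from the scraper):

```rust
/// Hypothetical helper: prefer the "Next Chapter" link, but fall back to
/// the last plain "Next" link (the one at the bottom of the page) when
/// no "Next Chapter" link exists. Naive scan; assumes simple anchors.
fn find_next_href(html: &str) -> Option<String> {
    let mut fallback = None;
    let mut rest = html;
    while let Some(i) = rest.find("<a ") {
        let tag = &rest[i..];
        // Grab href="..." and the anchor text between '>' and "</a>"
        if let (Some(h), Some(gt), Some(end)) =
            (tag.find("href=\""), tag.find('>'), tag.find("</a>"))
        {
            let href = &tag[h + 6..];
            if let Some(q) = href.find('"') {
                if gt < end {
                    let text = tag[gt + 1..end].trim();
                    if text == "Next Chapter" {
                        return Some(href[..q].to_string());
                    }
                    if text == "Next" {
                        // Keep overwriting so the last "Next" on the page wins
                        fallback = Some(href[..q].to_string());
                    }
                }
            }
        }
        rest = &rest[i + 3..];
    }
    fallback
}

fn main() {
    let html = r#"<p>chapter text</p><a href="/2019/11/02/final/">Next</a>"#;
    println!("{:?}", find_next_href(html));
}
```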
Here is another main.rs workaround: https://gist.github.com/adelrune/0d6931e0e56c1436d7ddcbdcc3b4e1b7
It's 90% the same as what thedewi posted, but their solution didn't work for me: the relative URL started with "https:" instead of a "/".
With the code I posted, the program was able to complete the process and create an epub file for Ward.
I'm getting what seems to me to be a related error, even when using the fixes thedewi or adelrune provided.
thread 'reductor' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Hyper(Error(Connect, Os { code: 10051, kind: Other, message: "A socket operation was attempted to an unreachable network." })), "https://0.0.7.225/11/18/daybreak-1-3/")', src\main.rs:313:41
Unfortunately I'm not savvy enough to have any idea where to start on a fix.
I have the same issue as well.
thread 'reductor' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Hyper(Error(Connect, Os { code: 10051, kind: Other, message: "A socket operation was attempted to an unreachable network." })), "https://0.0.7.225/11/18/daybreak-1-3/")', src\main.rs:313:41
It looks like you think this was fixed, but it's not. I just downloaded it tonight and have the same issue.
thread 'reductor' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Hyper(Error(Connect, Os { code: 10051, kind: Other, message: "A socket operation was attempted to an unreachable network." })), "https://0.0.7.225/11/18/daybreak-1-3/")', src\main.rs:313:41
Is there a known workaround for this problem? I'm still getting it on the latest pull from git.
Still happening. I worked around it with:
diff --git a/src/main.rs b/src/main.rs
index a547d4b..fb14b57 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -347,7 +347,7 @@ fn download_iter(
     } else {
         tup.0 = check.unwrap().attr("href").unwrap().to_string();
         if !tup.0.contains("http") {
-            tup.0 = "https:".to_string() + &tup.0;
+            tup.0 = "https://www.parahumans.net".to_string() + &tup.0;
         }
         return download_iter(tup);
     }
I took Amomum's src and it doesn't complete all of Ward:
Downloaded From Within - 16.9
Downloaded Redirecting...
The invocation of --ward seemed to work fine for me with this git diff.
diff --git a/src/main.rs b/src/main.rs
index a547d4b..0722fe3 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -344,13 +344,21 @@ fn download_iter(
     }
     if check.is_none() || title == "P.9" {
         return tup.clone();
-    } else {
-        tup.0 = check.unwrap().attr("href").unwrap().to_string();
-        if !tup.0.contains("http") {
-            tup.0 = "https:".to_string() + &tup.0;
-        }
-        return download_iter(tup);
     }
+
+    // Extract URL from html href tag
+    tup.0 = check.unwrap().attr("href").unwrap().to_string();
+
+    // Check to see if it's a relative link, if so assume it's a parahumans.net story
+    if tup.0.starts_with("/20") {
+        tup.0 = "https://parahumans.net/".to_string() + &tup.0;
+    }
+
+    // Ensure it's an https request
+    if !tup.0.contains("http") {
+        tup.0 = "https:".to_string() + &tup.0;
+    }
+    return download_iter(tup);
 }
 fn process_book(book: DownloadedBook) {
     println!("Done downloading {}", book.title);
EDIT: Also tested invocations of Glow-worm and Pale with this git diff. I'll update here if there's something wrong with the epubs.
@x10an14 this diff seems to insert an extra / into the URL in tup.0 = "https://parahumans.net/".to_string() + &tup.0; it should be tup.0 = "https://parahumans.net".to_string() + &tup.0; instead.
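A joining helper sidesteps that class of mistake entirely, since it gives the same URL whichever way the origin is spelled. This `join_origin` is a hypothetical illustration, not code from the repo:

```rust
/// Hypothetical helper: join an origin and a root-relative path without
/// producing a double slash, regardless of how either side is written.
fn join_origin(origin: &str, path: &str) -> String {
    format!("{}/{}", origin.trim_end_matches('/'), path.trim_start_matches('/'))
}

fn main() {
    // Both spellings of the origin yield the same URL
    println!("{}", join_origin("https://parahumans.net/", "/2017/11/18/daybreak-1-3/"));
    println!("{}", join_origin("https://parahumans.net", "/2017/11/18/daybreak-1-3/"));
}
```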
Full diff (working to download Ward):
diff --git a/src/main.rs b/src/main.rs
index a547d4b..0722fe3 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -344,13 +344,21 @@ fn download_iter(
     }
     if check.is_none() || title == "P.9" {
         return tup.clone();
-    } else {
-        tup.0 = check.unwrap().attr("href").unwrap().to_string();
-        if !tup.0.contains("http") {
-            tup.0 = "https:".to_string() + &tup.0;
-        }
-        return download_iter(tup);
     }
+
+    // Extract URL from html href tag
+    tup.0 = check.unwrap().attr("href").unwrap().to_string();
+
+    // Check to see if it's a relative link, if so assume it's a parahumans.net story
+    if tup.0.starts_with("/20") {
+        tup.0 = "https://parahumans.net".to_string() + &tup.0;
+    }
+
+    // Ensure it's an https request
+    if !tup.0.contains("http") {
+        tup.0 = "https:".to_string() + &tup.0;
+    }
+    return download_iter(tup);
 }
 fn process_book(book: DownloadedBook) {
     println!("Done downloading {}", book.title);