GoBooDo
Received invalid response
When I try this book: https://www.google.co.in/books/edition/Xamidea_Social_Science_for_Class_9_CBSE/94s2EAAAQBAJ?hl=en&gbpv=0
it says "Received invalid response".
I use the ID: 94s2EAAAQBAJ
It detects the name of the book but then does nothing.
Please fix this.
Thank you.
Try replacing these lines:
try:
    stringResponse = ("["+scripts[6].text.split("_OC_Run")[1][1:-2]+"]")
except:
    stringResponse = ("["+scripts[-4].text.split("_OC_Run")[1][1:-2]+"]")
with:
target = "_OC_Run"
index = [i for i, content in enumerate(scripts) if '_OC_Run' in str(content)]
index = index[0]
stringResponse = f"[{str(scripts[index]).split('_OC_Run')[1][1:].strip(');</script>')}]"
It worked for me.
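For anyone curious what the replacement does, here is a minimal standalone sketch of the same idea: look through every script for the `_OC_Run` call instead of relying on a fixed position such as scripts[6]. The function name `extract_oc_run_args` and the sample input are made up for illustration and are not part of GoBooDo. It works on plain strings rather than BeautifulSoup tags, and it assumes nothing containing ")" follows the call in the same script.

```python
import json


def extract_oc_run_args(script_texts):
    """Return the arguments of the first _OC_Run(...) call as a Python list.

    script_texts is any iterable of strings (hypothetical input; in GoBooDo
    these would be the contents of the page's <script> tags).
    """
    marker = "_OC_Run("
    for text in script_texts:
        start = text.find(marker)
        if start == -1:
            continue  # this script does not contain the call
        body = text[start + len(marker):]
        end = body.rfind(")")  # closing parenthesis of the call
        if end == -1:
            continue
        # Wrap the comma-separated arguments in [] so they parse as one JSON array.
        return json.loads("[" + body[:end] + "]")
    raise ValueError("no _OC_Run call found in any script")


if __name__ == "__main__":
    sample = [
        "var unrelated = 1;",
        '<script>_OC_Run({"page": [{"pid": "PA1"}]}, {"title": "demo"});</script>',
    ]
    print(extract_oc_run_args(sample))
```

Raising an explicit error when no script matches should make a changed page layout easier to spot than an IndexError from index[0].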
Thanks! But now I'm getting:
------------------- Creating PDF -------------------
Traceback (most recent call last):
  File "GoBooDo2.py", line 217, in <module>
    book.start()
  File "GoBooDo2.py", line 197, in start
    self.processBook()
  File "GoBooDo2.py", line 150, in processBook
    service.makePdf()
  File "F:\GoBooDo-master\makePDF.py", line 15, in makePdf
    firstPath = self.imageNameList[0]
IndexError: list index out of range
Is it related?
Book ID: PtkMiNeajNMC
I don't know for certain, but it doesn't look related to me. I tried your book and got links for all but one page (PT174). Unfortunately, it looks like my install of Tesseract isn't detecting missing pages correctly, and only 40 out of 178 pages were actually fetched. In your case, it looks to me like no pages are being fetched at all. Check whether you have any images in "book"\images, and maybe try adding proxies to proxies.txt. I put over 140 on my list; you can find websites that list lots of free proxy IPs.
Also, you should open a separate Issue.
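The IndexError in the traceback is what Python raises when imageNameList is empty and makePdf asks for element [0]. A rough sketch of a guard that reports the real cause instead is shown below. The function names and the list of image extensions are assumptions for illustration, not GoBooDo's actual code.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the images folder contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_page_images(images_dir):
    """Return the sorted image paths in images_dir, or [] if there are none."""
    folder = Path(images_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def first_image_or_fail(image_paths):
    """Return the first image, or fail with a readable message if the list is empty."""
    if not image_paths:
        raise RuntimeError(
            "No page images were downloaded, so there is nothing to put in the PDF."
        )
    return image_paths[0]
```

With a check like this, an empty images folder shows up as a clear message about missing downloads rather than as a list index error in the PDF step.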
@mrelg For the tesseract problem, you might want to try:
diff --git a/storeImages.py b/storeImages.py
--- a/storeImages.py (revision 94bd40aa323abc30d88bcda81afc9cd28b0e94c4)
+++ b/storeImages.py (date 1634742102818)
@@ -65,7 +65,7 @@
except:
pytesseract.pytesseract.tesseract_cmd = self.tesserPath
text = pytesseract.image_to_string(bw)
- return text.replace('\n', " ") == 'image not available'
+ return text.strip().replace('\n', " ") == 'image not available'
def getImages(self,retries):
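To see why the added strip() matters, here is a small comparison of the two checks from the diff on plain strings. The sample text is an assumption: it stands in for OCR output that ends with a trailing newline, which is the case the patch targets. The function names are made up for illustration.

```python
def original_check(text):
    # Comparison as it was before the patch.
    return text.replace("\n", " ") == "image not available"


def patched_check(text):
    # Comparison after the patch: surrounding whitespace is removed first.
    return text.strip().replace("\n", " ") == "image not available"


if __name__ == "__main__":
    sample = "image not available\n"  # assumed OCR output with a trailing newline
    print(original_check(sample))
    print(patched_check(sample))
```

In the original check the trailing newline becomes a trailing space, so the comparison fails and the placeholder page is treated as a real page.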
To use proxies, I put "proxy_links": 1, in settings. Is this correct?
The replacement lines from @mrelg work for me too.
@mrelg Could you please make a pull request so the code gets into the codebase?
Hello everyone,
I tried the solution shown in the photo below, but after replacing the lines of code as described,
the error shown in the next photo appears.
Could somebody help me or explain what kind of error that is?
I'm a Python rookie myself, so believe me when I say that this is quite a rookie mistake. Open your code in an editor that can show whitespace characters, such as Notepad++ (there is an option for that under View/ShowSymbols), and make sure the number and type of indentation characters match the surrounding code. https://stackoverflow.com/questions/1016814/what-to-do-with-unexpected-indent-in-python
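If you don't have such an editor at hand, the same check can be roughly sketched in Python itself. The two helper functions below are hypothetical and only illustrate the idea: one lists lines whose leading whitespace contains tabs, the other asks Python to compile the text and reports an indentation error if there is one.

```python
def find_suspicious_indentation(source):
    """Return (line_number, problem) pairs for lines indented with tabs."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace of the line (spaces and tabs only).
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent and " " in indent:
            problems.append((number, "mixes tabs and spaces"))
        elif "\t" in indent:
            problems.append((number, "uses tabs"))
    return problems


def check_indentation_error(source):
    """Return a short message if the code has an indentation error, else None."""
    try:
        compile(source, "<pasted code>", "exec")
    except IndentationError as error:
        return f"line {error.lineno}: {error.msg}"
    return None


if __name__ == "__main__":
    pasted = "x = 1\n    y = 2\n"  # second line is indented for no reason
    print(check_indentation_error(pasted))
    print(find_suspicious_indentation("if True:\n\tx = 1\n"))
```

To check a real file, read it with open(...).read() and pass the text to either function.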
BTW, can someone please implement my fix in a pull request? I don't have the time to learn how to do it properly on GitHub myself. It's a little tiring to explain to everyone how to type it in themselves, especially when the bug isn't necessarily the same one.
The fix offered by @mrelg is still needed to make this project work.
Here is the same fix in patch form (original comment: https://github.com/vaibhavk97/GoBooDo/issues/60#issuecomment-918563558):
diff --git a/GoBooDo.py b/GoBooDo.py
index 0971a7d..2dab8f3 100644
--- a/GoBooDo.py
+++ b/GoBooDo.py
@@ -82,10 +82,11 @@ class GoBooDo:
print(f'Downloading {self.name[:-15]}')
if self.found == False:
scripts = (soup.findAll('script'))
- try:
- stringResponse = ("["+scripts[6].text.split("_OC_Run")[1][1:-2]+"]")
- except:
- stringResponse = ("["+scripts[-4].text.split("_OC_Run")[1][1:-2]+"]")
+ target = "_OC_Run"
+ index = [i for i, content in enumerate(scripts) if '_OC_Run' in str(content)]
+ index = index[0]
+ stringResponse = f"[{str(scripts[index]).split('_OC_Run')[1][1:].strip(');</script>')}]"
+
jsonResponse = json.loads(stringResponse)
self.createPageDict(jsonResponse)
print(f'Pages to be fetched in the current iteration are : {len(self.pageList)}')
I also attached it as a file (patch.txt) in case copy-and-paste fails to work.
Save the file and then run:
git clone https://github.com/vaibhavk97/GoBooDo
cd GoBooDo
# now copy patch.txt to here and finally
git apply patch.txt
python GoBooDo.py --id=YOURID