
{'event': 'error', 'data': {'message': 'An unexpected error occurred.', 'code': 18}}

Open Jhaultch opened this issue 1 year ago • 75 comments

When I run the script with the command below:

python3 ./downloader.py

it shows me: {'event': 'error', 'data': {'message': 'An unexpected error occurred.', 'code': 18}}

Line 18 of the code is AUTH_TOKEN, and I copied the AUTH_TOKEN correctly. I'm confused about this error. Please help me out, many thanks :0

Jhaultch avatar Apr 29 '23 04:04 Jhaultch

Same issue here. How can this be resolved?

MajidAmin112 avatar Apr 29 '23 04:04 MajidAmin112

Yesterday I could not download, although the authentication token and reCAPTCHA were correct. Maybe the security of the website has been changed.

romantci2710 avatar Apr 29 '23 06:04 romantci2710

Yes, the code doesn't work anymore.

DKZPT avatar Apr 29 '23 15:04 DKZPT

What a coincidence - I had just finished fixing the PIL module problem. Poor me :(

Jhaultch avatar Apr 29 '23 18:04 Jhaultch

I'm also having the same error code :( Any solutions or fixes available?

TIA! @evmer @jajosheni @owohai

lyyangyy avatar May 01 '23 06:05 lyyangyy

Same issue here :-/ any solution?

leookny avatar May 01 '23 15:05 leookny

Same here! Thank you for looking at it!

mathrud avatar May 03 '23 14:05 mathrud

I'm having this issue as well, btw.

usure avatar May 04 '23 00:05 usure

@evmer please fix it.

jaan143 avatar May 04 '23 06:05 jaan143

@jajosheni @owohai @evmer, can you please fix this? I think it needs an update.

jaan143 avatar May 04 '23 11:05 jaan143

Just an update. It seems like the issue is a ReCaptcha problem, since now I get this error: {'event': 'error', 'data': {'message': 'Failed to validate recaptcha token', 'code': 6}}

Dunno how helpful this is. I've double-checked and I'm sure the ReCaptcha token I used was accurate, so I'm not really sure what the issue is.

usure avatar May 05 '23 00:05 usure

The error is still {'event': 'error', 'data': {'message': 'An unexpected error occurred.', 'code': 18}}

DKZPT avatar May 05 '23 07:05 DKZPT

Same problem @evmer... do you know if there is a fix? Thanks!

Latingking avatar May 05 '23 17:05 Latingking

same problem

lorenzolangone avatar May 05 '23 19:05 lorenzolangone

Same problem.....waiting for a fix......

kalicutit avatar May 07 '23 07:05 kalicutit

Same problem here.

cathanaso avatar May 07 '23 21:05 cathanaso

Same here

NukosX avatar May 08 '23 17:05 NukosX

Same issue here... any ideas? I was unable to find any information on that error code '18'...

Paperboat747 avatar May 09 '23 14:05 Paperboat747

If you run python3 in verbose mode:

python3 -v downloader.py

...you'll see that when it reaches that error, it appears to be due to a problem with the PyPDF2 module:

import 'PyPDF2._version' # <_frozen_importlib_external.SourceFileLoader object at 0x10cafe0d0>

# /usr/local/lib/python3.11/site-packages/PyPDF2/__pycache__/papersizes.cpython-311.pyc matches /usr/local/lib/python3.11/site-packages/PyPDF2/papersizes.py

# code object from '/usr/local/lib/python3.11/site-packages/PyPDF2/__pycache__/papersizes.cpython-311.pyc'

import 'PyPDF2.papersizes' # <_frozen_importlib_external.SourceFileLoader object at 0x10cafe1d0>

import 'PyPDF2' # <_frozen_importlib_external.SourceFileLoader object at 0x10c5b5d10>
{'event': 'error', 'data': {'message': 'An unexpected error occurred.', 'code': 18}}

I'm not handy with Python, but it looks to me like we'd need to swap in an updated PDF module [or a different one] for Python. Looking through the docs on PyPDF2, I think it's breaking because Perlego adjusted some of their responsive code for different devices; but that's speculation.

jasoncbraatz avatar May 11 '23 18:05 jasoncbraatz

I have found out that the problem occurs when the script tries to load book pages, specifically when this line (or similar ones) is called:

ws.send(json.dumps({"action":"loadPage","data":{"authToken": AUTH_TOKEN, "pageId": str(next_page), "bookType": book_format, "windowWidth":1792, "mergedChapterPartIndex":merged_chapter_part_idx}}))

I think the loadPage action has been changed on the server: everything works up to authentication, but they seem to have changed, or encrypted, the data we need to send. I analyzed the book-delivery traffic with Chrome's developer tools and saw that the request now sends {"action":"loadPage","data":{some long string}}; I think the string is base64, but I can't decode it. It actually sends three different requests similar to the last one, all starting with the same 9 characters; judging by the responses, one is for fonts, one is the content, and one I couldn't identify. So we need to figure out what the server now expects before we can get this great script back, but I'm a noob, so I don't know how to decrypt the string or work out what they need.
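For anyone who wants to replay that single request in isolation, here is a minimal sketch using the websocket-client package. The endpoint URL and token are placeholders, not the real values - they would have to be copied from the book-delivery WebSocket connection visible in Chrome DevTools (Network tab, WS filter):

# Minimal sketch: replay the old-style loadPage message by hand and print the reply.
# WS_URL and AUTH_TOKEN are placeholders; copy the real values from Chrome DevTools.
import json
import websocket  # pip install websocket-client

WS_URL = "wss://<book-delivery-endpoint-from-devtools>"  # placeholder
AUTH_TOKEN = "<your auth token>"                         # placeholder

ws = websocket.create_connection(WS_URL)
ws.send(json.dumps({
    "action": "loadPage",
    "data": {
        "authToken": AUTH_TOKEN,
        "pageId": "1",
        "bookType": "pdf",
        "windowWidth": 1792,
        "mergedChapterPartIndex": 1,
    },
}))
# At the moment this reportedly comes back as:
# {'event': 'error', 'data': {'message': 'An unexpected error occurred.', 'code': 18}}
print(ws.recv())
ws.close()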

mattiapergola avatar May 16 '23 22:05 mattiapergola

Ditto... {'event': 'error', 'data': {'message': 'An unexpected error occurred.', 'code': 18}}

muttmutt avatar May 18 '23 15:05 muttmutt

OK, I think it's impossible to decrypt the data, because it's encrypted with CryptoJS 3 using EvpKDF, so we would need the passphrase and the salt to decrypt it, and I think the passphrase is kept server-side, which makes it impossible. But I'm a super noob, so if anyone finds something else or knows how to retrieve the passphrase, that would be great, because this script was very useful and saved me a lot of time compared to my own download script.
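For reference, this is roughly how an OpenSSL-compatible CryptoJS AES payload (the "Salted__" + salt + ciphertext format, with EvpKDF key derivation) is normally decrypted in Python. It's purely a sketch, and only usable if the passphrase were somehow known, which, as said above, it apparently isn't:

# Sketch only: decrypting a CryptoJS-style (OpenSSL "Salted__") AES blob in Python,
# assuming the passphrase were known. Requires: pip install pycryptodome
import base64
import hashlib
from Crypto.Cipher import AES

def evp_kdf(passphrase: bytes, salt: bytes, key_len: int = 32, iv_len: int = 16):
    # OpenSSL EVP_BytesToKey with MD5 and one iteration, the KDF CryptoJS uses by default
    derived, block = b"", b""
    while len(derived) < key_len + iv_len:
        block = hashlib.md5(block + passphrase + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]

def decrypt_cryptojs(b64_payload: str, passphrase: str) -> bytes:
    raw = base64.b64decode(b64_payload)
    if raw[:8] != b"Salted__":
        raise ValueError("not an OpenSSL/CryptoJS salted payload")
    salt, ciphertext = raw[8:16], raw[16:]
    key, iv = evp_kdf(passphrase.encode(), salt)
    plaintext = AES.new(key, AES.MODE_CBC, iv).decrypt(ciphertext)
    return plaintext[:-plaintext[-1]]  # strip PKCS#7 padding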

mattiapergola avatar May 20 '23 11:05 mattiapergola

I looked at it some more and I really don't think that's what it is, and it wouldn't be impossible, because the browser has to render it [eventually]; it would only be impossible if we couldn't see it via Chrome etc. and were trying to somehow intercept a transmission. Instead, the script "piggybacks" on an established connection and simply automates something that could be done by hand with copy & paste - it really doesn't do anything more than that. Because that technique is tedious, the hope was that this script could eliminate some of the time burden. Another approach would be to use a browser-automation tool, but that takes a lot of work to set up, and sharing it isn't as convenient as it is with Python or another script.

Rerun this with python in -v (verbose) mode and trace it back, and you'll see that it gets stuck when it calls upon PyPDF to render the window (the page width of 1792, for example - I don't know where the original dev of this script came up with that). It looks to me like it's breaking in the presentation layer, not the data layer, because if you modify the script to simply fetch the output without doing anything else, it can still connect to that websocket (it can for me, at least). So something has changed in Perlego's presentation layer, likely some sort of responsiveness upgrade, and that appears to have broken this.

I have a "theory" on what could be a workaround/another way to do this but I can't test it since I've let my Perlego account lapse. Perlego claims that it's "app" allows for many of it's books to be downloaded offline in PDF form. Has anyone tried an iPhone or Android emulator (or even a real phone or device), downloaded a book, and then tried to just use a standard DMCA removal tool (like Epubor Ultimate) to get them that way? It'd involve downloading the file, copying it (easier to do from an Android) to a computer, and then removing the DMCA from that file. This seems like this would work; I don't believe they'd be using a non-standard PDF encryption since the whole point of PDF is it's ability to do that.. otherwise, the would have come up with their own bizarre proprietary format (like Amazon and it's AZW3 or somesuch)

jasoncbraatz avatar May 20 '23 20:05 jasoncbraatz

No. As I said (and you can verify it yourself with a Perlego account, using Chrome's developer tools), to render a page you now have to send a request whose data is encrypted. Before, as written in the source code of this script, we had to send the authToken, the pageId and so on; if you want to confirm that, look at the video explanation mentioned in the README of this project. Under the Network tab of Chrome's developer tools you can see that the loadPage request has changed from "action":"loadPage","data":{"authToken": AUTH_TOKEN, "pageId": 1, "bookType": pdf, "windowWidth":1792, "mergedChapterPartIndex": 1}, as shown in the video, to "action":"loadPage","data":{"long string"}. Also, if you run the commands of the script one by one, when you send ws.send(json.dumps({"action":"loadPage","data":{"authToken": AUTH_TOKEN, "pageId": str(next_page), "bookType": book_format, "windowWidth":1792, "mergedChapterPartIndex":merged_chapter_part_idx}})) the response is the same error as in this issue, so this is definitely the problem. The PyPDF error is maybe caused by the fact that PyPDF2 was deprecated a month ago, but that's very easy to fix - just switch to the new library. I've also tried the mobile app, but for PDFs you download images and HTML files that are encrypted, and I wasn't able to decrypt them; I also analyzed the traffic, and the requests are encrypted there too. As I said, I created my own script that downloads pages using Selenium, but it takes about 10-15 minutes for a 400-page book, and the PDF creation is sometimes broken, though I'm trying to fix it. I don't think we will be able to get the script back the way it was.

mattiapergola avatar May 21 '23 02:05 mattiapergola

@mattiapergola mind sharing the selenium script? ty

gdassori avatar May 25 '23 11:05 gdassori

@mattiapergola actually you're onto something, as I think Selenium is really the only way to solve this long-term either way. Alternatively, we could take an existing curl library from another language and try to do the piggy-back technique in a different way: perhaps let Chrome log in first and use the same info, but this time dump the entire page out, parse out what we know to be book text (probably by looking for a CSS tag), and then crawl to the next page by emulating a next-page action. This is up at the scale of writing a web crawler, which is a complex task, and while it would potentially be more robust, it would certainly take a lot of resources to put together.
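A very rough Selenium sketch of that dump-and-crawl idea - every URL and selector below is a guess/placeholder that would have to be confirmed against Perlego's real markup in Chrome DevTools before it could work:

# Rough sketch of the dump-and-crawl idea. READER_URL, PAGE_TEXT_SELECTOR and
# NEXT_PAGE_SELECTOR are guesses/placeholders, not Perlego's real markup.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

READER_URL = "https://www.perlego.com/book/<book-id>"   # placeholder
PAGE_TEXT_SELECTOR = "div.book-content"                 # guess
NEXT_PAGE_SELECTOR = "button[aria-label='Next page']"   # guess

driver = webdriver.Chrome()
driver.get(READER_URL)
# Let the real browser session handle authentication by hand first.
input("Log in manually in the opened window, open the book, then press Enter...")

pages = []
for _ in range(1000):                 # generous upper bound on page count
    time.sleep(2)                     # crude wait for the page to render
    blocks = driver.find_elements(By.CSS_SELECTOR, PAGE_TEXT_SELECTOR)
    pages.append("\n".join(b.get_attribute("innerHTML") for b in blocks))
    nxt = driver.find_elements(By.CSS_SELECTOR, NEXT_PAGE_SELECTOR)
    if not nxt:
        break                         # no next-page control found, assume end of book
    nxt[0].click()

with open("book_dump.html", "w", encoding="utf-8") as f:
    f.write("\n<hr>\n".join(pages))
driver.quit()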

I agree, the script the way it was won't work. I've tried upgrading PyPDF2 myself and manipulating its code to try to get around the problem (though it's curious that it broke right around the time the old PyPDF library was deprecated). Consider me "in" to help in any way... my main language of expertise is C (especially in embedded environments), but the other languages aren't that difficult to pick up.

I'll keep digging too.

jasoncbraatz avatar May 25 '23 16:05 jasoncbraatz

At some point I would love to get a take from @evmer, the originator of the code; given the above, are there any other ideas that haven't been explored thus far?

jasoncbraatz avatar May 25 '23 16:05 jasoncbraatz

@jasoncbraatz thanks for your reply - your idea is what I was working towards. As I said, the login step still works as always; the only thing that changed is the request data we send to get the page info. I'm a super noob at scripting, so I don't know how to use the websocket to log in and then scrape the HTML of the book pages; my script does the login and the scraping through Selenium. Right now I'm studying for university exams, so I don't have much time (or the headspace) to work on the code and get it right. I think my code can be improved a lot and made much clearer. My problem, as I mentioned in the last post, is the PDF-generating part: first I scrape and generate an HTML file for every single page, and if you open it, it's perfect, but the PDF-generation step sometimes produces an extra blank page or splits the content across two pages. The last book I downloaded had two or three pages that were images rather than actual text, and I don't know why they weren't showing in my HTML files - maybe because the image URL is encoded as a base64 string, and if you decode it, it shows some XML text with the image source link inside. So I just downloaded the images manually and inserted them into the PDF, but I was thinking of implementing the image-source scraping in the code. As I said, at the moment I don't have much time to play with the code. Maybe I will share it in a few days; I just want to clean it up a little and delete my personal info, since the email and password of your account are needed for the login part. I hope the original developer of this script will reply and maybe find a solution. For the moment, I wish all the readers a great day.
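A small sketch of what that image-source step could look like, assuming (as described above) that the base64 string decodes to XML-like text containing the real image link - the pattern used to find the link is a guess:

# Sketch: decode a base64-encoded image reference and pull the real source URL out
# of the XML it decodes to. The regex is a guess and should be checked against an
# actual decoded payload.
import base64
import re

def extract_image_url(b64_string: str):
    xml_text = base64.b64decode(b64_string).decode("utf-8", errors="replace")
    # grab the first http(s) link that appears inside the decoded XML
    match = re.search(r"""https?://[^\s"'<>]+""", xml_text)
    return match.group(0) if match else None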

mattiapergola avatar May 25 '23 16:05 mattiapergola

@Oredna I'm happy to help, did you end up using a different library?

jasoncbraatz avatar May 29 '23 14:05 jasoncbraatz

no fix yet? :(

I'm sorry to say this, but you must stop spamming about the project's progress immediately. This is a community-driven project. We are all always happy to help, but not under this kind of forceful pressure.

NukosX avatar Jun 06 '23 10:06 NukosX