
write scrapers for some other websites

Open moghya opened this issue 7 years ago • 12 comments

Following websites can be scraped

  1. http://bookboon.com

moghya avatar Oct 23 '17 18:10 moghya

Such as?

cLupus avatar Oct 23 '17 18:10 cLupus

@cLupus Thanks for showing interest in this project. I hope you visited http://moghya.me/allitebooks and got what we're trying to do here.

You can go through http://bookboon.com and try to write a scraper for it.

I'll add many such websites soon. Let me know if you're going to do it, and I'll assign this to you :)

moghya avatar Oct 23 '17 18:10 moghya

I got to take a look at the site, as well as at your repo. Am I correct in understanding that this issue is concerned with creating a scraper that produces a file similar to data.py?

cLupus avatar Oct 23 '17 18:10 cLupus

Yes, you're correct. It's just that we dump the dictionary to JSON and then process that JSON.
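
Roughly something like this, as a minimal sketch (the field names, selectors, and output filename are just illustrative assumptions, not the exact schema data.py uses):

```python
import json

import requests
from bs4 import BeautifulSoup


def scrape_book_page(url):
    """Scrape a single book page into a plain dictionary (illustrative fields only)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    return {
        "title": soup.select_one("h1").get_text(strip=True),
        "description": soup.select_one("meta[name=description]")["content"],
        "url": url,
    }


def dump_books(book_urls, path="bookboon.json"):
    """Dump the scraped dictionaries to a JSON file for later processing."""
    books = [scrape_book_page(u) for u in book_urls]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(books, fh, ensure_ascii=False, indent=2)
```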

moghya avatar Oct 23 '17 18:10 moghya

That does sound interesting. I assume the description should be in English. However, the site does offer some additional languages, although not all the descriptions have been translated into the different languages. Is there any plan for localization (or at the very least to grab what's there in different languages)?

cLupus avatar Oct 23 '17 18:10 cLupus

Honestly, I didn't think of it. But as you have rightly pointed out, we have to think about it. What do you propose?

moghya avatar Oct 23 '17 18:10 moghya

On closer inspection, it seems that only the site itself has been translated, not the titles or the descriptions, so it would not add much value (in the first run, anyway).

cLupus avatar Oct 23 '17 19:10 cLupus

Let's make it work for English, and we'll come up with a solution in the near future.

moghya avatar Oct 23 '17 19:10 moghya

Another issue is that http://bookboon.com 'locks' its books behind a dropdown and does not offer direct links to them. There are some ways to alleviate this:

  1. Download the zip files and host them (somewhere) behind a direct link.
  2. Do some trickery with the cookies that are sent along with the request.
  3. Something else?

cLupus avatar Oct 23 '17 19:10 cLupus

Downloading the zip is one option, but intercepting the request that downloads the book may solve our problem. Think of it this way: the scraper won't follow bookboon's flow, it'll work a step ahead. We can work out what exactly happens after the details are filled in, and instead of filling in the details we can directly send the request that downloads the PDF.
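
A minimal sketch of what that could look like; the endpoint and form fields below are placeholders (assumptions), and the real request would have to be read out of the browser's network tab:

```python
import requests

# Placeholder URL; replace with the actual book/download endpoint observed in the browser.
DOWNLOAD_URL = "https://bookboon.com/en/some-book-ebook"


def download_pdf(download_url, out_path="book.pdf"):
    """Replay the request the site sends after the form is filled in,
    skipping the dropdown/form step entirely."""
    session = requests.Session()

    # Visit the book page first so the session picks up any cookies it needs.
    session.get(download_url, timeout=30)

    # Placeholder payload: replace with whatever fields the real download request carries.
    payload = {"format": "pdf"}
    response = session.post(download_url, data=payload, timeout=60)
    response.raise_for_status()

    with open(out_path, "wb") as fh:
        fh.write(response.content)
```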

moghya avatar Oct 24 '17 02:10 moghya

Hi there, ladies and gentlemen. What's the status on this issue? @moghya Mind if I hop in? Also, shouldn't the first page be a bit more descriptive? I.e., the vast majority of websites state somewhere on the homepage what the site is and what it does, rather than down in the code.

Let me know what you think!

EmilLuta avatar Oct 24 '17 19:10 EmilLuta

@EmilLuta maybe you can contribute by working on #3.

moghya avatar Oct 24 '17 21:10 moghya