GoodreadsScraper issues

CSS extractors can get out of sync with the Goodreads site

The CSS extractors in [`book_spider`](https://github.com/havanagrawal/GoodreadsScraper/blob/master/GoodreadsScraper/spiders/book_spider.py#L21-L42) etc. can get out of sync, and the only way to detect this is with a trial run. Solution: Add a unit test that retrieves...

havanagrawal

The data parsing step adds spurious values

If the `dateutil.parser.parse` function cannot find a component of the timestamp (day, month, or year), it replaces it with the corresponding component of the *current* date. This can cause problems in...
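One way to see and work around this behavior is to parse the same string twice with two different `default` values and treat any component that changes between the two results as missing. The sketch below is only an illustration under that assumption; the helper name `parse_partial_date` and the sentinel dates are hypothetical and not part of GoodreadsScraper.

```python
from datetime import datetime

from dateutil import parser

# Two sentinel defaults that differ in year, month, and day.
# Both years are leap years so that "February 29" parses with either one.
_DEFAULT_A = datetime(2000, 1, 1)
_DEFAULT_B = datetime(2004, 2, 2)


def parse_partial_date(text):
    """Return (year, month, day), with None for components absent from text.

    Hypothetical helper: a component that was really present in the input
    comes out the same under both defaults, while a component that dateutil
    filled in from the default differs between the two parses.
    """
    a = parser.parse(text, default=_DEFAULT_A)
    b = parser.parse(text, default=_DEFAULT_B)
    return tuple(
        x if x == y else None
        for x, y in ((a.year, b.year), (a.month, b.month), (a.day, b.day))
    )


if __name__ == "__main__":
    print(parse_partial_date("March 5, 2019"))
    print(parse_partial_date("March 2019"))
```

Passing an explicit `default` also makes the output reproducible, since the result no longer depends on the day the scraper happens to run.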

havanagrawal

Add cover image to book .jl files

It would be nice to embed the cover image in the .jl files so people can use them for different purposes.

ralsuwaidi

Add single author spider and crawl command

This PR adds a method to `crawl.py` to scrape the books of a single author. Example: `python -m crawl single-author --author_id 19520462.Arlan_Hamilton`. It is based on the scraper for a...

jd7h

Scrape genre pages

Hey, have you tried adding a crawler for scraping the genre pages of Goodreads, like https://www.goodreads.com/shelf/show/war?page=1? I tried it, but it only ever fetches page 1. Even if I...

kardeepakkumar

Data incomplete

I just crawled my to-read list, and in only ~20% of cases did I get all the info for the books. The rest are just links, such as [this one](https://www.goodreads.com/book/show/40376072-children-of-ruin),...

jarekkopec

book.jl data is inconsistent with URLs

Hey, I'm having issues similar to Issue #17: the book .jl file sporadically contains some results with only the URL, while others are complete with all...

freudiandrip
