
How were the users chosen?

Open Santosh-Gupta opened this issue 4 years ago • 2 comments

I see that there are 876,145 total users in the dataset, but Goodreads has 90 million users (as of July 2019). I was wondering how those 876,145 users were selected. Was there a minimum number of ratings?

Santosh-Gupta avatar Dec 02 '20 22:12 Santosh-Gupta

Hi Santosh, the users in this dataset are those who were in the top 1000 book clubs (https://www.goodreads.com/group) as of early 2017 and chose to make their bookshelves public, so they are just a subset of the Goodreads community.

MengtingWan avatar Dec 17 '20 21:12 MengtingWan

Are there any plans for a dataset covering all Goodreads user reviews?

I started a script here, but it needs some work:

https://colab.research.google.com/drive/1uOyVlKaT4QFtce9yQpKj9hRtj5z8Uyta

It downloads reviews directly from RSS feeds, so it runs pretty fast. It still needs work on confirming that it has retrieved all of a user's books (I think there might be timeouts) and on handling books that have several versions/editions.
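
As a rough illustration of those two open problems (this is not the Colab script itself), here is a minimal Python sketch that parses feed items, collapses multiple editions of the same book, and flags a download that looks incomplete. The tag names (`title`, `author_name`, `book_id`, `user_rating`), the inline sample feed, and the helper names are all assumptions made for the example; a real feed may use different fields, and matching editions on title plus author is only a heuristic.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the tag names are assumptions for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Dune</title><author_name>Frank Herbert</author_name><book_id>1</book_id><user_rating>5</user_rating></item>
<item><title>Dune </title><author_name>Frank Herbert</author_name><book_id>2</book_id><user_rating>5</user_rating></item>
<item><title>Emma</title><author_name>Jane Austen</author_name><book_id>3</book_id><user_rating>4</user_rating></item>
</channel></rss>"""


def parse_items(xml_text):
    """Return one dict per <item>, mapping each child tag to its stripped text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("item")
    ]


def collapse_editions(items):
    """Keep the first item seen for each (title, author) pair, ignoring case.

    This is a heuristic: editions whose titles differ (subtitles,
    translations) will not be merged.
    """
    seen = {}
    for it in items:
        key = (it.get("title", "").casefold(), it.get("author_name", "").casefold())
        seen.setdefault(key, it)
    return list(seen.values())


def looks_complete(items, expected_count):
    """Flag a possibly truncated download, e.g. after a timeout.

    expected_count must come from somewhere else (for instance a count shown
    on the user's shelf page); where to get it is left open here.
    """
    return len(items) >= expected_count


items = parse_items(SAMPLE_FEED)
unique_books = collapse_editions(items)
```

If `looks_complete` returns `False` for a user, the simplest recovery is to re-request that user's feed rather than keep the partial result.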

Santosh-Gupta avatar Aug 07 '21 07:08 Santosh-Gupta