google-play-scraper
Unable to read reviews beyond Page 111 even on Latest Version
- Operating System: Mac OS Mojave v. 10.14
- Node version: 6.8
- google-play-scraper version: 6.2.3
Description:
The scraper is unable to read reviews beyond page 111. There is no error, just an empty result. I can, however, scroll down on the app page on the Play Store and see comments older than those on page 111.
This doesn't seem to be a throttling issue: even if I request only a single page, nothing is returned by the API when page > 111.
Based on the suggestion in #248, I made sure the package is on the latest version, but that doesn't fix the problem either.
This is also not a language issue since the reviews on the page are in English.
Is there a way I can fix this?
Example code:
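const gplay = require('google-play-scraper');

// args is defined elsewhere in the script; args[0] is the page number
console.log(args[0])
gplay.reviews({
  appId: 'indwin.c3.shareapp',
  page: args[0],
  sort: gplay.sort.NEWEST
}).then((obj) => {
  console.log(obj)
})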
Output on Console:
[]
Same issue. Unable to get data after page 111 for any app. Page 112 returns an empty result.
I am facing the same issue too. If anyone has a solution, please guide us.
Hi Kani, I eventually used Selenium to auto-scroll down to the end of the page and then scraped the saved HTML file. It isn't ideal, but it gets me most of what I wanted from the project.
Hi tarunima,
I tried scraping with Selenium too, but I did not know what limit applies there. If possible, can you share the number of reviews per app you could get using Selenium? Also, did you use any IP rotation?
It's part of a larger project for me, so I'll share the code on GitHub when it's complete. In the meantime, this is what I am doing:
- Use selenium to scroll at least four times till the show more button shows up. Click the show more button. Go back to scrolling.
- When the end of reviews is reached, the code execution stops either because of an error or because of end of loop.
- Save html file locally.
- Use python to scrape comments and other data using tags as identifiers. The limitation is that unlike the API, text for long reviews hidden under the 'show full review' won't be captured. But in my use case such reviews are few. When I want to read such a review, I find the review in the saved html file and move from there.
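The last step could look roughly like the sketch below, which uses only Python's standard-library `html.parser`. The class name `review-text` and the names `ReviewTextParser` and `extract_reviews` are hypothetical; the real Play Store markup uses generated class names that would have to be looked up in the saved file.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collect the text inside every element carrying a given class name."""

    # Void elements never get a closing tag, so they must not affect depth.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.chunks = []    # text pieces of the review being read
        self.reviews = []   # finished review texts

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace before storing the review text.
            self.reviews.append(" ".join("".join(self.chunks).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_reviews(html, target_class="review-text"):
    """Return the text of every element whose class list contains target_class."""
    parser = ReviewTextParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.reviews
```

To use it, read the saved file with `open(path, encoding="utf-8").read()` and pass the string to `extract_reviews` together with whatever class name the review text carries in that file.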
This is super hacky, but it is getting my work done for now. Here's the code snippet for part 1 in case it is useful. Note: I have changed it a little for my own work, but this does work in at least a few cases:
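import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get("url")  # replace "url" with the app's Play Store page
wait = WebDriverWait(driver, 10)
SCROLL_PAUSE_TIME = 0.5
# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
for i in range(0, 1000):
    # The following code snippet works:
    # scroll to the bottom a few times so the 'Show More' button can appear
    for j in range(0, 5):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
        print('in inner loop')
    print('out of inner loop')
    try:
        # Absolute XPath of the 'Show More' button; it breaks if the page layout changes
        driver.find_element(By.XPATH, "/html/body/div[1]/div[4]/c-wiz/div/div[2]/div/div[1]/div/div/div[1]/div[2]/div[2]/div").click()
    except NoSuchElementException as exception:
        print("'Show More' not found. Continuing to scroll.")
        continue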
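The snippet reads `last_height` but never uses it, and always runs a fixed number of rounds. One possible way to detect the end of the page, sketched below, is to stop once the scroll height stops growing. The function name `scroll_to_end` and its parameters are hypothetical, and it only relies on the `execute_script` method of the driver. Note that on the Play Store a stable height can also mean the 'Show More' button is waiting, so this would have to be combined with the click attempt above.

```python
import time


def scroll_to_end(driver, pause=2.0, max_rounds=1000):
    """Scroll down until the page height stops growing; return rounds used."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more reviews
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number  # nothing new loaded: assume the end was reached
        last_height = new_height
    return max_rounds
```

Calling `scroll_to_end(driver)` in place of the inner loop would end each scrolling phase as soon as no new content loads, instead of always sleeping five times.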
Thank you so much Tarunima
Hi tarunima, if an application has a very large number of reviews, say more than 100,000, the HTML file becomes very large.
How did you deal with this situation?
Hi, I did hit an issue with that, so I eventually changed the approach to save to JSON as it scrolls. I still had to leave the code running for a day to get reviews from apps with over 40k reviews, just because scrolling takes that long. It isn't ideal at all, but I managed to get the data of interest (from multiple apps) within 2 weeks.
See if this helps you: https://github.com/tarunima/ScrapePlayStoreReviews
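As a rough illustration of saving while scrolling (not necessarily what the linked repository does), each batch of newly visible reviews can be appended to a JSON Lines file, so that a crash after hours of scrolling loses nothing already written. The function name `append_new_reviews` and the `author` and `text` fields are hypothetical.

```python
import json


def append_new_reviews(path, reviews, seen):
    """Append unseen reviews to a JSON Lines file; return how many were written."""
    written = 0
    with open(path, "a", encoding="utf-8") as handle:
        for review in reviews:
            # Assumed de-duplication key: the same author posting the same text.
            key = (review["author"], review["text"])
            if key in seen:
                continue
            seen.add(key)
            handle.write(json.dumps(review, ensure_ascii=False) + "\n")
            written += 1
    return written
```

The caller keeps one `seen` set for the whole run and calls the function after every scrolling round with the reviews currently on the page; only the ones not yet stored are written.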
@tarunima
console.log(args[0])
gplay.reviews({
  appId: 'indwin.c3.shareapp',
  page: args[0],
  sort: gplay.sort.NEWEST
}).then((obj) => {
  console.log(obj)
})
Is there a page parameter?