google-play-scraper

Unable to read reviews beyond Page 111 even on Latest Version

Open tarunima opened this issue 6 years ago • 9 comments

  • Operating System: Mac OS Mojave v. 10.14
  • Node version: 6.8
  • google-play-scraper version: 6.2.3

Description:

The scraper cannot read reviews beyond page 111. There is no error, just an empty result. However, I can scroll down the app's page on the Play Store and see comments older than those on page 111.

This doesn't seem to be a throttling issue: even if I request only a single page, nothing is returned when page > 111. Following the suggestion in #248, I made sure the package is on the latest version, but that doesn't fix the problem either.
It is also not a language issue, since the reviews on the page are in English. Is there a way I can fix this?

Example code:

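const gplay = require('google-play-scraper');
// Assumption: the page number is passed as the first command-line argument.
const args = process.argv.slice(2);

console.log(args[0]);

gplay.reviews({
    appId: 'indwin.c3.shareapp',
    page: args[0],
    sort: gplay.sort.NEWEST
}).then((obj) => {
    console.log(obj);
}).catch(console.error);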

Output on Console:

[]

tarunima, Feb 20 '19 01:02

Same issue. I can't get data after page 111 for any app; page 112 returns an empty result. If anyone has a solution, please guide us.

kani-01, Mar 05 '19 21:03

Hi Kani, I eventually used Selenium to auto-scroll down to the end of the page, and then scraped the saved HTML file. It isn't ideal, but it gets me most of what I wanted from the project.

tarunima, Mar 06 '19 06:03

Hi tarunima,

I tried scraping with Selenium too, but I don't know what limit applies there. If possible, could you share the number of reviews per app you were able to get using Selenium? Also, did you use any IP rotation?

kani-01, Mar 06 '19 18:03

It's part of a larger project for me, so I'll share the code on GitHub when it's complete. In the meantime, this is what I am doing:

  1. Use Selenium to scroll at least four times until the 'Show More' button shows up. Click the button, then go back to scrolling.
  2. When the end of the reviews is reached, execution stops, either because of an error or because the loop ends.
  3. Save the HTML file locally.
  4. Use Python to scrape comments and other data, using tags as identifiers. The limitation is that, unlike the API, the text of long reviews hidden under 'Show full review' won't be captured. In my use case such reviews are few. When I want to read one, I find the review in the saved HTML file and move from there.
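
Step 4 could be sketched roughly as follows, using only Python's standard `html.parser` module. This is an illustration under assumptions, not the code used in this project: the class name `review-text` and the helper names `ReviewTextParser` and `extract_reviews` are hypothetical. The Play Store's real markup uses generated identifiers, so the actual class name has to be read off the saved file.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collect the text of every element carrying a given CSS class.

    The class name is a hypothetical placeholder: inspect the saved HTML
    to find the identifier the review elements actually use.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.reviews = []   # finished review texts
        self._depth = 0     # > 0 while inside a matching element
        self._tag = None    # tag name of the matching element
        self._chunks = []   # text fragments of the current review

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth == 0:
            if self.target_class in classes:
                self._depth = 1
                self._tag = tag
                self._chunks = []
        elif tag == self._tag:
            self._depth += 1  # nested element with the same tag name

    def handle_endtag(self, tag):
        if self._depth and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.reviews.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_reviews(html, target_class="review-text"):
    """Return the text of all elements whose class list contains target_class."""
    parser = ReviewTextParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.reviews
```

To use it, read the saved file into a string and pass it to `extract_reviews` together with whatever class name the review text carries in that file.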

This is super hacky, but it is getting my work done for now. Here's the code snippet for part 1 in case it is useful. Note: I have changed it a little for my own work, but this version does work, at least in a few cases:


import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

SCROLL_PAUSE_TIME = 2  # seconds to wait after each scroll
SHOW_MORE_XPATH = "/html/body/div[1]/div[4]/c-wiz/div/div[2]/div/div[1]/div/div/div[1]/div[2]/div[2]/div"

driver = webdriver.Firefox()
driver.get("url")

for _ in range(1000):
    # Scroll to the bottom a few times so that more reviews load.
    for _ in range(5):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(SCROLL_PAUSE_TIME)
        print('in inner loop')

    print('out of inner loop')
    try:
        # find_element(By.XPATH, ...) replaces find_element_by_xpath,
        # which newer Selenium 4 releases removed.
        driver.find_element(By.XPATH, SHOW_MORE_XPATH).click()
    except NoSuchElementException:
        print("'Show More' not found. Continuing to scroll.")

tarunima, Mar 08 '19 15:03
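
Step 2 above relies on an error or on the end of the loop to stop scrolling. One possible explicit stopping condition, sketched here under the assumption that the page height stops growing once every review has loaded, is to compare `document.body.scrollHeight` between rounds. The helper name `scroll_until_stable` and its parameters are hypothetical; the only Selenium call it uses is `execute_script`.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=1000, patience=3, sleep=time.sleep):
    """Scroll to the bottom until the page height stops growing.

    Hypothetical helper, not part of Selenium. `driver` is any object with
    Selenium's `execute_script` method. Returns the number of scrolls done.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    unchanged = 0  # consecutive rounds without growth
    rounds = 0
    while rounds < max_rounds and unchanged < patience:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)  # give the page time to load more reviews
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            unchanged += 1   # no growth this round
        else:
            unchanged = 0    # page grew, reset the counter
            last_height = new_height
    return rounds
```

A stable height can also mean that the 'Show More' button is waiting to be clicked, so in practice the click attempt would probably have to happen before a round is counted as unchanged.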

Thank you so much Tarunima

kani-01, Mar 08 '19 20:03

Hi tarunima, if an application has a very large number of reviews, say more than 100,000, the HTML file becomes very large.
How did you deal with this situation?

xuanhui0129, Jun 22 '19 12:06

Hi, I did hit an issue with that, so I eventually changed the method to save to JSON as it scrolls. I still had to leave the code running for a day to get reviews from apps with over 40k reviews, simply because scrolling takes that long. It isn't ideal at all, but I managed to get the data of interest (from multiple apps) within 2 weeks.

tarunima, Jun 23 '19 14:06
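
Saving to JSON while scrolling, rather than keeping one huge HTML file, might look something like the sketch below. It appends each newly seen review as one JSON line and skips reviews that were already written. The `id` key, the dictionary layout, and the function name `append_new_reviews` are assumptions for illustration, not the format the author actually used.

```python
import json


def append_new_reviews(path, reviews, seen_ids):
    """Append reviews not seen before to a JSON Lines file.

    `reviews` is an iterable of dicts with a unique "id" key (an assumed
    layout); `seen_ids` is a set updated in place. Returns the count written.
    """
    written = 0
    with open(path, "a", encoding="utf-8") as out:
        for review in reviews:
            if review["id"] in seen_ids:
                continue  # already saved during an earlier scroll
            out.write(json.dumps(review, ensure_ascii=False) + "\n")
            seen_ids.add(review["id"])
            written += 1
    return written
```

Calling this after every scroll round keeps memory use flat, and a run that crashes after many hours still leaves everything collected so far on disk.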

See if this helps you: https://github.com/tarunima/ScrapePlayStoreReviews

tarunima, Jun 23 '19 15:06

@tarunima

console.log(args[0])

gplay.reviews({
    appId: 'indwin.c3.shareapp',
    page: args[0],
    sort: gplay.sort.NEWEST
}).then((obj)=>{
    console.log(obj)
})

Is there a page parameter?

supersoftno1, Nov 05 '19 03:11