
Hashtag Scraper KeyError: 'graphql' when using Selenium webdriver or Sessionid Cookie

Open kalebm1 opened this issue 2 years ago • 2 comments

Describe the bug I am trying to scrape posts from a hashtag. I have tried both the Selenium driver and headers with a sessionid to get around Instagram's redirect to the login page. Before Instagram started redirecting to the login page, I could scrape the hashtag with no problem. Once the redirection started, I put my sessionid into the headers field and got the following error: post_arr = self.json_dict["entry_data"]["TagPage"][0]["graphql"]["hashtag"]["edge_hashtag_to_media"]["edges"] KeyError: 'graphql'. I am fairly new to the library, so I poked around in the code and read through similar issues. I think this error is similar to #124 in the sense that the json_dicts are not structured the same way. I printed the json_dict out to a file and found that it contains no graphql key, and most of the other keys that get_recent_posts looks for are missing as well. I hope the fix for this error is as simple as the one for the other issue.
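To narrow down where the structure differs, the lookup can be walked one key at a time instead of in a single chained expression. The following is only a minimal sketch: walk_path is a hypothetical helper, not part of instascrape, and the sample dict is a made-up stand-in for the real json_dict (its "data" key is invented purely for illustration). The key path itself is copied from the traceback above.

```python
from typing import Any, List, Sequence, Tuple

# Key path taken from the line quoted in the traceback.
TAG_PAGE_PATH = (
    "entry_data", "TagPage", 0, "graphql",
    "hashtag", "edge_hashtag_to_media", "edges",
)


def walk_path(data: Any, path: Sequence) -> Tuple[Any, List]:
    """Follow `path` through nested dicts/lists (hypothetical helper).

    Returns (last node reached, list of path items that could not be resolved).
    An empty second element means the whole path exists.
    """
    current = data
    for i, key in enumerate(path):
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return current, list(path[i:])
    return current, []


def describe(node: Any) -> str:
    """Summarise a node: its keys for a dict, otherwise its type name."""
    if isinstance(node, dict):
        return f"keys {sorted(node)}"
    return f"a {type(node).__name__}"


# Made-up stand-in for the json_dict returned after the login redirect.
sample = {"entry_data": {"TagPage": [{"data": {}}]}}

node, missing = walk_path(sample, TAG_PAGE_PATH)
if missing:
    print(f"Lookup stops at {missing[0]!r}; node reached has {describe(node)}")
else:
    print(f"Found {len(node)} edges")
```

Running the same helper against the real json_dict (for example, the one dumped to a file) would show which key is the first one missing and what is present in its place.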

To Reproduce Steps to reproduce the behavior:

from instascrape import Hashtag

def __init__(self):
    # Browser-like User-Agent sent with the request
    self.headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/79.0.3945.74 Safari/537.36 Edg/79.0.309.43"
        ),
    }
    self.hashtag = Hashtag(hashtagUrl)  # hashtagUrl: URL of the hashtag page
    self.hashtag.scrape(headers=self.headers)
    self.hashtags = self.hashtag.get_recent_posts()  # raises KeyError: 'graphql'

Expected behavior The expected outcome is a List[Posts], which is what calling the hashtag.get_recent_posts() method should normally return.

Screenshots Screenshot (313)

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Chrome
  • Version: 91.0.4472.124

kalebm1, Jul 20 '21 05:07

I'm having exactly the same problem. I'm also sending the sessionid in cookies, in case anyone thinks that might be the cause... Still trying to understand what could be causing this issue.

havelar, Sep 16 '21 11:09

I have the same issue when I search using a proxy and a sessionid. I think the problem is in how the sessionid is defined, and that is why the response comes back with missing data. The library then raises the error, but I couldn't find out how to solve it.
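For reference, this is roughly how a sessionid can be placed into a headers dict before it is handed to scrape(headers=...). It is only a sketch: build_headers is a hypothetical helper, the sessionid value is a placeholder, and it assumes the library forwards the dict unchanged as HTTP request headers.

```python
def build_headers(session_id: str, user_agent: str) -> dict:
    """Build request headers carrying the sessionid as a cookie (hypothetical helper)."""
    return {
        "user-agent": user_agent,
        # The session cookie goes into the standard Cookie request header.
        "cookie": f"sessionid={session_id};",
    }


headers = build_headers(
    session_id="PASTE-YOUR-SESSIONID-HERE",  # placeholder, not a real value
    user_agent=(
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/79.0.3945.74 Safari/537.36 Edg/79.0.309.43"
    ),
)
print(sorted(headers))
```

If the headers are built this way and the KeyError still appears, the sessionid format is probably not the cause, and the response structure itself is the more likely culprit.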

yemregundogmus, Sep 19 '21 21:09