instascrape
get_recent_posts() raises MissingCookiesWarning but we can't pass a valid cookie
Describe the bug
The get_recent_posts() method raises a MissingCookiesWarning, but there is no way to pass a valid cookie header to avoid it.
To Reproduce
from instascrape import *
instagram_sessionid = "xxx"
headers = {"user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36 Edg/87.0.664.57",
"cookie": f"sessionid={instagram_sessionid};"}
profile = Profile('https://www.instagram.com/google/')
profile.scrape(headers=headers)
print(profile.posts)
recents = profile.get_recent_posts()  # We should be able to pass a cookie here
The code runs, but we get a MissingCookiesWarning: Request header does not contain cookies! It's recommended you pass at least a valid sessionid otherwise Instagram will likely redirect you to their login page.
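Since the warning text refers to the request header, one quick sanity check is whether the headers dict actually carries a sessionid. The sketch below uses only the standard library's http.cookies; header_has_sessionid is a hypothetical helper, not part of instascrape. For the headers in the report it returns True, which suggests the warning comes from a request inside get_recent_posts() that never sees these headers.

```python
from http.cookies import SimpleCookie


def header_has_sessionid(headers: dict) -> bool:
    """Return True if the headers carry a non-empty sessionid cookie (hypothetical helper)."""
    # Header names are case-insensitive, so accept "cookie" in any casing.
    raw = next((v for k, v in headers.items() if k.lower() == "cookie"), "")
    jar = SimpleCookie()
    jar.load(raw)  # parses "name=value; name2=value2" into morsels
    morsel = jar.get("sessionid")
    return morsel is not None and morsel.value != ""


if __name__ == "__main__":
    report_headers = {"user-agent": "Mozilla/5.0", "cookie": "sessionid=xxx;"}
    print(header_has_sessionid(report_headers))  # True
    print(header_has_sessionid({"user-agent": "Mozilla/5.0"}))  # False
```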
If I try to pass the headers containing the cookie:
from instascrape import *
instagram_sessionid = "xxx"
headers = {"user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36 Edg/87.0.664.57",
"cookie": f"sessionid={instagram_sessionid};"}
profile = Profile('https://www.instagram.com/google/')
profile.scrape(headers=headers)
print(profile.posts)
recents = profile.get_recent_posts(headers=headers)  # This time I try to pass the headers with the cookie
I get a TypeError: get_recent_posts() got an unexpected keyword argument 'headers'
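The TypeError means the installed get_recent_posts() has no headers parameter. A standard-library way to check which keyword arguments a method accepts before calling it is inspect.signature. accepts_kwarg is a hypothetical helper, and the stub below is illustrative only, not instascrape's real signature.

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """Return True if `func` can be called with keyword argument `name` (hypothetical helper)."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword argument.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    # Illustrative stub only; this is NOT instascrape's actual signature.
    def get_recent_posts_stub(limit=12):
        return []

    print(accepts_kwarg(get_recent_posts_stub, "headers"))  # False
    print(accepts_kwarg(get_recent_posts_stub, "limit"))  # True
```

With instascrape installed, accepts_kwarg(Profile.get_recent_posts, "headers") should return False on the version that raised the TypeError above.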
Expected behavior
We should be able to pass a valid cookie to avoid the warning, or the warning should not be triggered at all.
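Until get_recent_posts() can take headers, the warning can at least be silenced locally with the standard warnings module. This sketch matches on the message text quoted above instead of importing the warning class (its import path is not shown in this thread); ignore_missing_cookies_warning is a hypothetical helper. It only hides the warning and does not make Instagram accept the request.

```python
import warnings
from contextlib import contextmanager

# Start of the warning text quoted in the report; filterwarnings matches this
# regex against the beginning of the message, case-insensitively.
MISSING_COOKIES_MSG = r"Request header does not contain cookies"


@contextmanager
def ignore_missing_cookies_warning():
    """Suppress only the missing-cookies warning inside the with-block."""
    with warnings.catch_warnings():  # restores the previous filters on exit
        warnings.filterwarnings("ignore", message=MISSING_COOKIES_MSG)
        yield
```

Usage: wrap the call, e.g. `with ignore_missing_cookies_warning(): recents = profile.get_recent_posts()`. Other warnings still pass through.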
Have the same issue!
I fixed it by passing cookies to Selenium before loading the profile. I export the cookies from Instagram with the Chrome extension Cookie-Editor and then paste them into cookies.json.
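import json
from instascrape import Profile
from selenium import webdriver
handle = "google"  # Instagram handle to scrape
headers = {"user-agent": "Mozilla/5.0", "cookie": "sessionid=xxx;"}  # placeholders: use the headers from the report
driver = webdriver.Chrome()
url = f"https://www.instagram.com/{handle}/"
driver.get(url)  # Needed to fake a login
# Fake login with cookies
with open("./cookies.json", "r", newline="") as data:  # Open cookies.json
    cookies = json.load(data)
for cookie in cookies:  # Add cookies to driver
    cookie.pop("sameSite", None)  # Selenium breaks with sameSite; None avoids a KeyError if absent
    driver.add_cookie(cookie)  # Add our authorized cookies
ig_profile = Profile(url)  # Set IG profile
ig_profile.url = url
ig_profile.scrape(headers=headers)  # Scrape IG profile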
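Rather than dropping sameSite entirely, the export could be converted into the shape Selenium's add_cookie() documents (name and value required; path, domain, secure, httpOnly, expiry and sameSite optional, with sameSite limited to "Strict", "Lax" or "None"). This sketch assumes the Cookie-Editor export uses chrome.cookies-style fields (a float expirationDate and sameSite values such as no_restriction, lax, strict, unspecified); the mapping and helper names are hypothetical, not part of instascrape or Selenium.

```python
import json

# Optional keys Selenium's add_cookie() documents, copied through unchanged.
_PASSTHROUGH_KEYS = ("path", "domain", "secure", "httpOnly")

# Assumed export sameSite values -> values Selenium accepts.
_SAME_SITE_MAP = {"strict": "Strict", "lax": "Lax", "no_restriction": "None", "none": "None"}


def to_selenium_cookie(raw: dict) -> dict:
    """Convert one exported cookie dict into a Selenium-compatible one."""
    cookie = {"name": raw["name"], "value": raw["value"]}
    for key in _PASSTHROUGH_KEYS:
        if key in raw:
            cookie[key] = raw[key]
    # Exports carry a float "expirationDate"; Selenium expects an int "expiry".
    expiry = raw.get("expiry", raw.get("expirationDate"))
    if expiry is not None:
        cookie["expiry"] = int(expiry)
    # Keep sameSite only if it maps to an accepted value; drop the rest
    # (e.g. "unspecified" or a JSON null).
    same_site = _SAME_SITE_MAP.get(str(raw.get("sameSite") or "").lower())
    if same_site is not None:
        cookie["sameSite"] = same_site
    return cookie


def load_selenium_cookies(path: str) -> list:
    """Read an exported cookies.json and convert every entry."""
    with open(path, "r", encoding="utf-8") as data:
        return [to_selenium_cookie(c) for c in json.load(data)]
```

After driver.get(url), the loop becomes `for cookie in load_selenium_cookies("./cookies.json"): driver.add_cookie(cookie)`. Extra export fields such as hostOnly or storeId are simply not copied.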
Is there any way around this without Selenium so far?
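A partial answer for the scrape() call itself: the same cookies.json export can be turned into the cookie header of a headers dict like the one in the report, with no browser involved. This is a sketch; the helper names are hypothetical and the user-agent is whatever string you already use. It does nothing for get_recent_posts(), which in the reported version accepts no headers.

```python
import json


def cookie_header(cookies: list) -> str:
    """Join exported cookies into one Cookie header value: 'a=1; b=2'."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def headers_from_cookie_file(path: str, user_agent: str) -> dict:
    """Build a headers dict like the one in the report from cookies.json."""
    with open(path, "r", encoding="utf-8") as data:
        cookies = json.load(data)
    return {"user-agent": user_agent, "cookie": cookie_header(cookies)}
```

The result can then be passed as profile.scrape(headers=...).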
I get the same error and posted about it at https://github.com/chris-greening/instascrape/issues/89#issuecomment-801495835
@Xerrion
I spent a lot of time trying this. I'm not sure what cookie.pop("sameSite") is doing, since I don't see any sameSite keys when I call print(driver.get_cookies()), so I skipped all that and just ran driver.add_cookie({'name': 'sessionid', 'value': os.environ['INSTAGRAM_SESSIONID']}), which resulted in the same MissingCookiesWarning. :-(
UPDATE:
So I'm trying this again. I now understand your comment about sameSite, but I'm still getting the MissingCookiesWarning. If you update the driver but never pass it to the scrape method, how does updating the driver affect instascrape at all?
I've been combing through the code, and it looks like you have to pass your driver to the scrape method as well. I mention it here: https://github.com/chris-greening/instascrape/issues/89#issuecomment-805394041. However, I'm still getting the same warning even with the driver passed to scrape, which is very weird if you read the code.
Just noting that this issue has made instascrape pretty much unusable for my use case. Between this issue and https://github.com/chris-greening/instascrape/issues/89, I've abandoned instascrape at this point.
get_recent_posts() always returns 24 posts no matter the amount. Can I bypass that and get all the posts?
I tried the same thing (adding the cookie manually), but I'm still getting the warning. What am I doing wrong? Here's the code I'm using:
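import time
from selenium.webdriver import Chrome
SESSION_ID = 'my session id'
url = "https://www.instagram.com/discordbot98/"
webdriver = Chrome()  # browser instance
webdriver.get(url)  # load the site first so the cookie domain matches
time.sleep(10)
webdriver.add_cookie({'name': 'sessionid', 'value': SESSION_ID})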