scholar.py
How to get around being blocked permanently? (Persistent 503 error)
I wrote an automated script using scholar.py (not realizing that Google Scholar has a query limit). Now my program consistently runs into a 503 error even though I've successfully done the captcha in my web browser. I have some questions about this:
- When will the ban usually be lifted?
- I've seen some mention cookies as a solution to this - can anyone tell me the details on how to do this?
Thank you and thanks for making a great API!
In my experience, the ban is lifted after 1 or 2 days.
For the cookies file, you have to edit line 221 (COOKIE_JAR_FILE = '') and specify a cookies file. After you get banned, you can go to Scholar with your web browser and fill out the captcha. Then you can export your cookies to this file, and everything works again (most of the time).
It seemed that only a Firefox cookie export worked for me (not Chrome)?
I think a small addition that would make this slightly easier to work with would be to fetch the path from an environment variable, if set. os.environ['SCHOLAR_COOKIE_FILE'] or such. I could supply this change as a PR.
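A minimal sketch of what that change might look like, assuming the variable name SCHOLAR_COOKIE_FILE suggested above (it is not an existing scholar.py setting) and a hypothetical helper named cookie_jar_path:

```python
import os

# Hypothetical environment variable name, taken from the suggestion above.
# scholar.py itself does not read this variable.
ENV_VAR = 'SCHOLAR_COOKIE_FILE'


def cookie_jar_path(default=''):
    """Return the cookie file path from the environment, else the default."""
    # An unset or empty variable falls back to the hard-coded default.
    return os.environ.get(ENV_VAR) or default
```

The result could then be assigned to COOKIE_JAR_FILE in place of the hard-coded empty string, so the path no longer has to be edited in the source.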
Another change I'm working on may also help here. Instead of changing the cookie setting (needed to get citations, e.g. BibTeX) on every bootstrap (two requests for apply_settings?), the BibTeX URL can be derived from the links already returned. It might then make sense to separate fetching the articles from fetching the citations, so that asking for citations doesn't trigger a request for every result and the citation requests can be deferred until later.
I'm probably getting a little discursive in the context of this ticket. But, I think the above changes could help avert the 503 problem.
@skyl How far along are you with your changes? I still get blocked automatically, even though I am using a cookie file. Those changes would be great to have :)
I've hit this problem today...
Has anyone discovered a workable solution?
Hi! I am having the same problem (OS X, 10.9.5):
This command runs fine:
python scholar.py -c 1 --author "albert einstein" --phrase "quantum theory" -ddd
But this command does not:
python scholar.py -c 1 --author "albert einstein" --phrase "quantum theory" --citation bt -ddd
Giving the error:
[ INFO] applying setttings failed: HTTP Error 503: Service Unavailable
This could not be solved even by exporting the cookies file from Firefox, and setting the path in the ScholarConf class. The output gives: [ INFO] loaded cookies file
Does anyone have any idea how to solve this?
You have to set both the cookies and the User-Agent, because cookies are associated with the User-Agent that was used to generate them.
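As a rough illustration using only the Python 3 standard library (this is not how scholar.py is wired internally, and make_opener is a hypothetical helper), the exported cookies and a matching User-Agent can be attached to the same opener. The cookie file is assumed to be in the Netscape/Mozilla cookies.txt format, and the User-Agent string should be copied from the browser that solved the captcha:

```python
import http.cookiejar
import urllib.request


def make_opener(cookie_file, user_agent):
    """Build a urllib opener that sends the exported cookies together
    with the User-Agent of the browser that produced them."""
    jar = http.cookiejar.MozillaCookieJar()
    # Keep session and expired cookies as well; a browser export
    # often contains cookies that would otherwise be skipped on load.
    jar.load(cookie_file, ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    # Replace the default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener, jar
```

Requests made with opener.open(url) then carry both pieces of information, which is the pairing described above.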
The following method worked for me:
- Complete captcha in web browser
- Open network tab in Chrome debug menu
- Run a new search in google scholar
- Right click on the network request created and choose "copy -> copy as cURL"
- Paste result and find the portion that is "GSP=..."
- Replace the _COOKIES variable in line 25 of scholarly.py with
_COOKIES = {'GSP': '<cookie value from above step>'}
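The "find the GSP portion" step can be done by hand, but a small helper makes it less error-prone. This is only a sketch: extract_gsp is a hypothetical function, and it assumes the copied cURL command contains a cookie header in which the value runs up to the next semicolon, quote, or whitespace:

```python
import re


def extract_gsp(curl_command):
    """Return the value of the GSP cookie found in a copied cURL
    command string, or None if no GSP cookie is present."""
    # The value ends at the next ';', quote character, or whitespace.
    match = re.search(r'\bGSP=([^;\'"\s]+)', curl_command)
    return match.group(1) if match else None
```

The returned string would then be used as the value in _COOKIES = {'GSP': ...}. If the function returns None, the copied request did not carry a GSP cookie, and it may help to repeat the search after completing the captcha.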
@jweob Can you please explain this in a bit more detail? I tried doing that, but my request is still failing. Maybe I'm making a mistake somewhere.