How to get around being blocked permanently? (Persistent 503 error)

Open kathleenkusworo opened this issue 10 years ago • 9 comments

I wrote an automated script using scholar.py (not realizing that Google Scholar has a query limit). Now my program consistently runs into a 503 error even though I've successfully done the captcha in my web browser. I have some questions about this:

  1. When will the ban usually be lifted?
  2. I've seen some mention cookies as a solution to this - can anyone tell me the details on how to do this?

Thank you and thanks for making a great API!

kathleenkusworo avatar Sep 06 '15 18:09 kathleenkusworo

In my experience, the ban is lifted after 1 or 2 days.

For the cookies file, you have to edit line 221, COOKIE_JAR_FILE = '', and specify a cookies file. After you get banned, you can go to Google Scholar in your web browser and fill out the captcha. Then you can export your cookies to this file, and everything works again (most of the time).
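To check that an exported file is readable before pointing COOKIE_JAR_FILE at it, something like the following may help. It is a minimal sketch using Python 3's standard http.cookiejar module. The helper name load_cookie_file is hypothetical (it is not part of scholar.py), and it assumes the export is in the Netscape/Mozilla cookies.txt format.

```python
from http.cookiejar import MozillaCookieJar


def load_cookie_file(path):
    """Load a Netscape/Mozilla-format cookies.txt file and return the jar.

    Hypothetical helper for illustration; not part of scholar.py.
    """
    jar = MozillaCookieJar(path)
    # Keep session cookies and expired cookies so nothing is silently
    # dropped while inspecting the export.
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar
```

If the file is not in the expected format, jar.load() raises http.cookiejar.LoadError. Otherwise, iterating over the returned jar lists the cookies that were exported.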

eknoes avatar Oct 21 '15 11:10 eknoes

It seemed that only a Firefox cookie export worked for me (not Chrome)?

I think a small addition that would make this slightly easier to work with would be to fetch the path from an environment variable, if set. os.environ['SCHOLAR_COOKIE_FILE'] or such. I could supply this change as a PR.
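A minimal sketch of that idea, assuming the variable is called SCHOLAR_COOKIE_FILE as suggested above. The function name cookie_jar_path and the env parameter are hypothetical and only there to make the lookup easy to try out.

```python
import os


def cookie_jar_path(default=None, env=None):
    """Return the cookie file path from SCHOLAR_COOKIE_FILE, if set.

    Falls back to `default` when the variable is unset or empty.
    `env` defaults to os.environ; passing a dict makes testing easier.
    """
    env = os.environ if env is None else env
    value = env.get('SCHOLAR_COOKIE_FILE', '').strip()
    return value or default
```

The result could then be used in place of the hard-coded value, for example COOKIE_JAR_FILE = cookie_jar_path().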

Another change I'm working on may help this situation a bit: instead of changing the cookie setting (to get a citation format, e.g. BibTeX) on every bootstrap (two requests for apply_settings?), the URL for the BibTeX entry can be derived from the links already returned. It might then make sense to separate fetching the articles from fetching the citations, so that wanting citations doesn't trigger a request for every result; you could defer the citation requests until a later time.

I'm probably getting a little discursive in the context of this ticket. But, I think the above changes could help avert the 503 problem.

skyl avatar Nov 16 '15 22:11 skyl

@skyl How far along are you with your changes? I'm actually having trouble with getting blocked automatically, even though I am using a cookie file. Those changes would be so great to have :)

dmuiX avatar Jun 10 '16 19:06 dmuiX

I've hit this problem today...

vext01 avatar Jul 13 '16 13:07 vext01

Has anyone discovered a workable solution?

egavves avatar Aug 03 '16 08:08 egavves

Hi! I am having the same problem (OS X, 10.9.5):

This command runs fine:

    python scholar.py -c 1 --author "albert einstein" --phrase "quantum theory" -ddd

But this command does not:

    python scholar.py -c 1 --author "albert einstein" --phrase "quantum theory" --citation bt -ddd

Giving the error:

    [ INFO] applying setttings failed: HTTP Error 503: Service Unavailable

This could not be solved even by exporting the cookies file from Firefox and setting the path in the ScholarConf class. The output shows: [ INFO] loaded cookies file

Does anyone have any idea how to solve this?

juanjobosch avatar Oct 14 '16 16:10 juanjobosch

You have to set both the Cookies and the User-Agent, because Cookies are associated with the User-Agent that was used to generate them.
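A rough sketch of sending both together with Python 3's standard urllib. The function name build_opener_for_browser is hypothetical, and the User-Agent string has to be copied from the same browser that solved the captcha; this is not how scholar.py itself is wired up.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def build_opener_for_browser(jar, user_agent):
    """Return an opener that sends the jar's cookies with the given User-Agent.

    `jar` is any http.cookiejar.CookieJar (for example one loaded from an
    exported cookies file); `user_agent` should match the exporting browser.
    """
    opener = build_opener(HTTPCookieProcessor(jar))
    # Replace urllib's default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener
```

Requests made through opener.open(url) then carry the cookies and the matching User-Agent header.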

davidbnk avatar Feb 13 '17 05:02 davidbnk

The following method worked for me:

  1. Complete captcha in web browser
  2. Open network tab in Chrome debug menu
  3. Run a new search in Google Scholar
  4. Right click on the network request created and choose "copy -> copy as cURL"
  5. Paste result and find the portion that is "GSP=..."
  6. Replace the _COOKIES variable in line 25 of scholarly.py with _COOKIES = {'GSP': '<cookie value from above step>'}
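
Step 5 can also be done programmatically. The following is a minimal sketch that pulls the GSP value out of the pasted "copy as cURL" text. The function name extract_gsp is hypothetical, and it assumes the cookie appears as GSP=... terminated by a semicolon, quote, or whitespace.

```python
import re


def extract_gsp(curl_command):
    """Return the value of the GSP cookie in a pasted cURL command, or None.

    Hypothetical helper for illustration; assumes the value ends at the
    next semicolon, quote, or whitespace character.
    """
    match = re.search(r"""(?:^|[;\s'"])GSP=([^;'"\s]+)""", curl_command)
    return match.group(1) if match else None
```

The result can then be used as in step 6, for example _COOKIES = {'GSP': extract_gsp(pasted_text)}, where pasted_text holds the copied command.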

jweob avatar Mar 14 '19 04:03 jweob

@jweob can you please explain this a bit? I have tried doing that but again my request is failing. Maybe I'm making a mistake

Anum29 avatar Jan 11 '20 19:01 Anum29