scholar.py
--citation=FORMAT now produces blank output
"scholar.py -c 1 --txt --author einstein quantum --citation=bt"
gives no output, while
"scholar.py -c 1 --txt --author einstein quantum"
produces the correct output as seen in the example documentation.
This is a recent problem; "--citation=bt" used to work for me.
I'm having the same issue recently.
Google has changed something.
Lines 969 to 973 of the code:
    html = self._get_http_response(url=self.SET_SETTINGS_URL % urlargs,
                                   log_msg='dump of settings result HTML',
                                   err_msg='applying setttings failed')
    if html is None:
        return False
This returns False because the URL doesn't respond.
To see the link, you can change the last two lines to
    if html is None:
        print self.SET_SETTINGS_URL % urlargs
        return False
but I'm not sure how to amend it.
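Printing the URL shows where the request went, but not why it failed. As a rough sketch (standalone Python 3, not part of scholar.py), a fetch helper could return the HTTP error text alongside the empty body. The name fetch_or_explain and its opener parameter are hypothetical and exist only for this illustration.

```python
import urllib.error
import urllib.request


def fetch_or_explain(url, opener=urllib.request.urlopen):
    """Return (body, error); exactly one of the two is None."""
    try:
        with opener(url) as response:
            return response.read(), None
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return None, "%s (status %d) for %s" % (err, err.code, url)
    except urllib.error.URLError as err:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return None, "%s for %s" % (err.reason, url)
```

Calling it on the settings URL and printing the second element of the result would show the status code directly, without needing the debug log.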
Same issue. When I try to access the URL via browser it responds with a captcha to continue. I'm unsure if this is specific to my case.
Getting the same issue. The problem is in _get_http_response: it raises an exception, which leads to empty results.
Was working fine, then I tried putting some print statements in the send_query function, and after a while it stopped working. Not sure if that is related or not. Downloaded a fresh copy of the script and tried again, but I'm still getting the same issue of empty output.
The query.get_url() seems to be giving a valid URL, though, if I check it in a browser.
Downloaded for first time, but I'm not getting very useful results.
I believe Google Scholar has changed. The citation URL used to be found in the href attribute of the 'Import' link (line 456). The 'Import' link has been replaced with a 'Cite' link, which has an onclick handler that makes an AJAX call that reveals the citation links.
Maybe the citation URL can be reconstructed based on scrape-able data already on the page?
Here is an example bibtex citation URL:
https://scholar.googleusercontent.com/scholar.bib?q=info:nBVfANMO3WIJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAWIpZMk44Iwx8dkfRdkHVZDo5DbQvuGqL&scisf=4&ct=citation&cd=-1&hl=en
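To judge which pieces would have to be reconstructed, it may help to split the example URL into its parts. The sketch below uses only the Python 3 standard library; split_citation_url is a hypothetical helper name. It makes no claim about which parameters can actually be scraped from the results page.

```python
from urllib.parse import parse_qsl, urlsplit

# The example BibTeX citation URL quoted above.
EXAMPLE_URL = (
    "https://scholar.googleusercontent.com/scholar.bib"
    "?q=info:nBVfANMO3WIJ:scholar.google.com/&output=citation"
    "&scisig=AAGBfm0AAAAAWIpZMk44Iwx8dkfRdkHVZDo5DbQvuGqL"
    "&scisf=4&ct=citation&cd=-1&hl=en"
)


def split_citation_url(url):
    """Return (host, path, params) for a citation URL."""
    parts = urlsplit(url)
    # Each parameter appears once in the example, so a plain dict is enough.
    params = dict(parse_qsl(parts.query))
    return parts.netloc, parts.path, params


host, path, params = split_citation_url(EXAMPLE_URL)
print(host + path)
for name, value in params.items():
    print("%-7s %s" % (name, value))
```

The q value carries what looks like a per-article identifier, while scisig is an opaque token; whether either one appears elsewhere in the page HTML would need checking.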
EDIT:
The 'Import' link is not showing up because the settings are not getting applied. Running the script with -d -d -d will show the INFO log, which says:
    [ INFO] applying setttings failed: HTTP Error 503: Service Unavailable
As @audiomason pointed out about the 503, what I noticed when navigating to the settings page using the URL that the script generated is that the browser was redirected to a captcha page. I think this is what causes the response to fail.
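If the captcha redirect is the cause, a script could at least report it instead of printing nothing. Below is a minimal heuristic sketch in Python 3. The function name and the marker strings ("captcha" in the page or URL, and a "/sorry/" path) are assumptions for illustration and have not been checked against what Google actually returns.

```python
# Assumed markers of a captcha challenge; adjust after inspecting a real response.
CAPTCHA_MARKERS = ("captcha", "/sorry/")


def looks_like_captcha(status, final_url, body):
    """Guess whether a response is a captcha challenge rather than real content."""
    text = (body or "").lower()
    url = (final_url or "").lower()
    marker_hit = any(m in text or m in url for m in CAPTCHA_MARKERS)
    # The thread reports a 503 for the blocked settings request, so treat
    # that status as a hint even when no marker is found.
    return marker_hit or status == 503


print(looks_like_captcha(503, "https://scholar.google.com/", ""))
```

A caller could use this to print a message such as "blocked by captcha, try again later" rather than returning silently.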