scholar.py

--citation=FORMAT now produces blank output

Status: Open • trevor-vincent opened this issue 7 years ago • 6 comments

"scholar.py -c 1 --txt --author einstein quantum --citation=bt"

gives no output, while

"scholar.py -c 1 --txt --author einstein quantum"

produces the correct output as seen in the example documentation.

This is a recent problem; "--citation=bt" used to work for me.

trevor-vincent, Oct 17 '16 01:10

I've been having the same issue recently.

Google has changed something.

Lines 969 to 973 of the code:

```python
html = self._get_http_response(url=self.SET_SETTINGS_URL % urlargs,
                               log_msg='dump of settings result HTML',
                               err_msg='applying setttings failed')
if html is None:
    return False
```

This returns False because the URL doesn't respond.

You can change it to the following to see the link, but I'm not sure how to fix it:

```python
if html is None:
    print(self.SET_SETTINGS_URL % urlargs)
    return False
```

muhsincan, Nov 02 '16 22:11

Same issue. When I open the URL in a browser, it responds with a CAPTCHA before continuing. I'm not sure whether this is specific to my case.

joelthe1, Nov 09 '16 22:11

Getting the same issue. The problem is in _get_http_response: it ends up in the exception branch, so the results are empty.

It was working fine; then I added some print statements to the send_query function, and after a while it stopped working. I'm not sure whether that is related. I downloaded a fresh copy of the script and tried again, but I still get empty output.

query.get_url() does seem to return a valid URL, though, when I check it in a browser.

smaameri, Nov 24 '16 13:11

I downloaded it for the first time, but I'm not getting very useful results.

reagle, Jan 25 '17 19:01

I believe Google Scholar has changed. The citation URL used to be found in the href attribute of the 'Import' link (line 456). The 'Import' link has been replaced with a 'Cite' link, whose onclick handler makes an AJAX call to reveal the citation links.

Maybe the citation URL can be reconstructed based on scrape-able data already on the page?

Here is an example bibtex citation URL: https://scholar.googleusercontent.com/scholar.bib?q=info:nBVfANMO3WIJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAWIpZMk44Iwx8dkfRdkHVZDo5DbQvuGqL&scisf=4&ct=citation&cd=-1&hl=en
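One way to see which pieces would need to be scraped is to take the example URL apart with Python 3's `urllib.parse`. This is only an illustrative sketch: `split_citation_url` and `build_citation_url` are hypothetical helpers, not part of scholar.py, and they assume the parameter layout of this single example holds for other results. The `info:` ID (`nBVfANMO3WIJ` here) may be recoverable from the results page, but `scisig` looks like a server-issued signature, and nothing here shows it can be obtained without the AJAX call.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The example BibTeX citation URL quoted above.
EXAMPLE = ("https://scholar.googleusercontent.com/scholar.bib?"
           "q=info:nBVfANMO3WIJ:scholar.google.com/&output=citation"
           "&scisig=AAGBfm0AAAAAWIpZMk44Iwx8dkfRdkHVZDo5DbQvuGqL"
           "&scisf=4&ct=citation&cd=-1&hl=en")


def split_citation_url(url):
    """Return (info ID, query parameters) for a citation URL shaped like EXAMPLE."""
    params = {key: values[0]
              for key, values in parse_qs(urlsplit(url).query).items()}
    # q has the form "info:<ID>:scholar.google.com/"
    _, info_id, _ = params["q"].split(":", 2)
    return info_id, params


def build_citation_url(info_id, scisig, scisf=4, hl="en"):
    """Hypothetical inverse: assumes every other parameter is constant."""
    query = urlencode({
        "q": "info:%s:scholar.google.com/" % info_id,
        "output": "citation",
        "scisig": scisig,  # signature-like token; how to obtain it is unknown
        "scisf": scisf,    # 4 in the BibTeX example above
        "ct": "citation",
        "cd": -1,
        "hl": hl,
    }, safe=":/")
    return "https://scholar.googleusercontent.com/scholar.bib?" + query
```

Rebuilding the example from its parts reproduces it exactly, which only shows the parameters round-trip; whether Scholar accepts a reconstructed URL is untested.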

EDIT: The 'Import' link is not showing up because the settings are not getting applied. Running the script with -d -d -d will show the INFO log, which says [ INFO] applying setttings failed: HTTP Error 503: Service Unavailable.
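The error text in that log line matches how Python's `urllib.error.HTTPError` renders itself (`HTTP Error 503: Service Unavailable`). Below is a minimal sketch, assuming Python 3's `urllib`, of a fetch helper that logs the failing URL next to the error, in the spirit of the `print` suggested earlier in this thread. `fetch_or_report` and `simulated_503` are hypothetical names, this is not scholar.py's `_get_http_response`, and it only handles HTTP status errors, not connection failures.

```python
import logging
from urllib.error import HTTPError
from urllib.request import urlopen

log = logging.getLogger("scholar-debug")  # hypothetical logger name


def fetch_or_report(url, opener=urlopen):
    """Return the response body, or None after logging the error and the URL.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    failure path can be exercised without touching the network.
    """
    try:
        with opener(url) as resp:
            return resp.read()
    except HTTPError as err:
        # str(err) reads like "HTTP Error 503: Service Unavailable"
        log.info("request failed: %s; url: %s", err, url)
        return None


def simulated_503(url):
    """Hypothetical stand-in opener that fails the way this thread reports."""
    raise HTTPError(url, 503, "Service Unavailable", None, None)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Logs the error message and the URL at INFO level, then returns None.
    fetch_or_report("https://example.invalid/settings", opener=simulated_503)
```

Logging the URL means it can be pasted into a browser, which is how the CAPTCHA page was spotted in this thread.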

audiomason, Jan 26 '17 20:01

Regarding the 503 that @audiomason pointed out: when I tried to open the settings page using the URL the script generated, the browser redirected to a CAPTCHA page. I think this is what causes the response to fail.
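If the CAPTCHA redirect is the cause, a script could check for it explicitly instead of silently printing nothing. Here is a heuristic sketch; the markers (a `/sorry/` path, or the words "captcha" or "unusual traffic" in the body) are assumptions about what the block page looks like, not documented Google behavior, and `looks_like_captcha` is a hypothetical helper. With `urllib.request.urlopen`, the status is `resp.status` on success or `err.code` on an `HTTPError`, and the post-redirect URL is `resp.geturl()`.

```python
from urllib.parse import urlsplit

# Assumed markers of a CAPTCHA / "unusual traffic" interstitial. These are
# guesses for illustration, not a documented Google Scholar contract.
CAPTCHA_PATH_MARKERS = ("/sorry/",)
CAPTCHA_BODY_MARKERS = (b"captcha", b"unusual traffic")


def looks_like_captcha(status, final_url, body=b""):
    """Heuristically flag a response that is a block page rather than results.

    status    -- HTTP status code (resp.status, or err.code for an HTTPError)
    final_url -- URL after redirects (resp.geturl())
    body      -- raw response bytes, if any were read
    """
    if status == 503:
        return True
    path = urlsplit(final_url).path
    if any(marker in path for marker in CAPTCHA_PATH_MARKERS):
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in CAPTCHA_BODY_MARKERS)
```

A check like this cannot get past the CAPTCHA; it only turns an empty result into an explicit, diagnosable failure.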

joelthe1, Jan 28 '17 02:01