scholar.py

Running the bibtex example with python3 from the README.md yields no result

Open jessebrennan opened this issue 7 years ago • 11 comments

Adding debug flags shows:

$ python3 scholar.py -c 1 --author "albert einstein" --phrase "quantum theory" --citation bt -ddd
[ INFO]  using log level 3
[ INFO]  requesting http://scholar.google.com/scholar_settings?sciifh=1&hl=en&as_sdt=0,5
[ INFO]  parsing settings failed: no form
[ INFO]  requesting http://scholar.google.com/scholar?as_q=&as_epq=quantum theory&as_oq=&as_eq=&as_occt=any&start=&as_sauthors=albert einstein&as_publication=&as_ylo=&as_yhi=&as_vis=0&btnG=&hl=en&num=1&as_sdt=0,5

$

The first example seems to work fine. AFAIK this error seems to be due to a website layout change from Google Scholar.

jessebrennan avatar Oct 03 '17 18:10 jessebrennan

I can confirm this fails for all the citation types and breaks the ability to export citations from the command line. It must be something in the parser, but I don't have the skills to know where to start puzzling it out. If you give me some hints, I could investigate further.

brittAnderson avatar Oct 05 '17 17:10 brittAnderson

The settings page on Google Scholar has changed. Line 985 needs to be changed to:

tag = soup.find(name='form', attrs={'id': 'gs_bdy_frm'})

After inserting this, scholar settings are saved successfully. However, it then returns:

Traceback (most recent call last):
  File "/home/dambam/bin/papers/scholar.py", line 1311, in <module>
    sys.exit(main())
  File "/home/dambam/bin/papers/scholar.py", line 1301, in main
    citation_export(querier)
  File "/home/dambam/bin/papers/scholar.py", line 1146, in citation_export
    print(art.as_citation() + '\n')

This can be fixed by changing line 1145 to:
print(art.as_citation() + "\n".encode('ascii'))

scholar.py --citation=bt -a "einstein" will then return: b'@article{einstein1935can,\n title={Can quantum-mechanical description of physical reality be considered complete?},\n author={Einstein, Albert and Podolsky, Boris and Rosen, Nathan},\n journal={Physical review},\n volume={47},\n number={10},\n pages={777},\n year={1935},\n publisher={APS}\n}\n\n'

portalgun avatar Oct 07 '17 19:10 portalgun

Also, if you have done too many searches too quickly, Google Scholar will ask for a CAPTCHA. scholar.py won't explicitly indicate that anything is wrong in this situation; it simply prints nothing. If you are getting at least blank lines, then you probably have the problem described in my last comment.
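Since the script stays silent in that case, a small check on the response body can tell the two failure modes apart. The following is only a sketch: `looks_like_captcha` and `report_empty_result` are hypothetical helpers that do not exist in scholar.py, and the marker strings are assumptions about what a block page contains.

```python
def looks_like_captcha(html: str) -> bool:
    """Heuristically guess whether a response body is a CAPTCHA or block page."""
    # Assumed markers; Google may change the wording of its block page at any time.
    markers = ("captcha", "unusual traffic")
    lowered = html.lower()
    return any(marker in lowered for marker in markers)


def report_empty_result(html: str, n_articles: int) -> str:
    """Return a short hint explaining why a query produced no output."""
    if n_articles > 0:
        return "ok"
    if looks_like_captcha(html):
        return "blocked: Google Scholar is probably asking for a CAPTCHA"
    return "empty: no articles parsed (possibly a page layout change)"
```

Calling `report_empty_result(html, len(articles))` after a query would at least log something instead of exiting silently.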

portalgun avatar Oct 08 '17 02:10 portalgun

@portalgun Do you want to make a PR for these changes? Even though no one merges them it could be helpful for other people who run into the problem to just download the patch. If not, I can gladly make the PR.

jessebrennan avatar Oct 09 '17 17:10 jessebrennan

Thanks to those of you who have suggested solutions. I've tried the two edits you suggested but am facing all of these traceback errors.

`---------------------------------------------------------------------------
BadOptionError                            Traceback (most recent call last)
//anaconda/lib/python3.6/optparse.py in parse_args(self, args, values)
   1386         try:
-> 1387             stop = self._process_args(largs, rargs, values)
   1388         except (BadOptionError, OptionValueError) as err:

//anaconda/lib/python3.6/optparse.py in _process_args(self, largs, rargs, values)
   1430                 # value(s) for the last one only)
-> 1431                 self._process_short_opts(rargs, values)
   1432             elif self.allow_interspersed_args:

//anaconda/lib/python3.6/optparse.py in _process_short_opts(self, rargs, values)
   1512             if not option:
-> 1513                 raise BadOptionError(opt)
   1514             if option.takes_value():

BadOptionError: no such option: -f

During handling of the above exception, another exception occurred:

SystemExit                                Traceback (most recent call last)
in ()
   1269
   1270 if __name__ == "__main__":
-> 1271     sys.exit(main())

in main()
   1180     parser.add_option_group(group)
   1181
-> 1182     options, _ = parser.parse_args()
   1183
   1184     # Show help if we have neither keyword search nor author name

//anaconda/lib/python3.6/optparse.py in parse_args(self, args, values)
   1387             stop = self._process_args(largs, rargs, values)
   1388         except (BadOptionError, OptionValueError) as err:
-> 1389             self.error(str(err))
   1390
   1391         args = largs + rargs

//anaconda/lib/python3.6/optparse.py in error(self, msg)
   1567         """
   1568         self.print_usage(sys.stderr)
-> 1569         self.exit(2, "%s: error: %s\n" % (self.get_prog_name(), msg))
   1570
   1571     def get_usage(self):

//anaconda/lib/python3.6/optparse.py in exit(self, status, msg)
   1557         if msg:
   1558             sys.stderr.write(msg)
-> 1559         sys.exit(status)
   1560
   1561     def error(self, msg):

SystemExit: 2`

orangewords avatar Oct 11 '17 01:10 orangewords

optparse is having a problem; reinstall it.

yskale avatar Oct 11 '17 03:10 yskale

Thanks. That solved the problem, but now I am back to getting a syntax error when I try to run a test with the sample query. scholar.py --citation=bt -a "einstein" returns:

  File "<ipython-input-7-46a87b6d443b>", line 1
    scholar.py --citation=bt -a "einstein"
                                         ^
SyntaxError: invalid syntax
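The `<ipython-input-...>` file name suggests the command line was typed into an IPython cell, where it is parsed as Python source instead of being handed to a shell. In IPython, prefixing the line with `!` passes it to the shell. Another option is to launch the script as a child process, sketched below; `build_command` and `run_query` are hypothetical helpers, and the sketch assumes scholar.py sits in the current working directory.

```python
import subprocess
import sys


def build_command(script="scholar.py"):
    """Build the argument list for the sample query from this thread."""
    # sys.executable is the interpreter running this code, so the child
    # process uses the same Python installation.
    return [sys.executable, script, "--citation=bt", "-a", "einstein"]


def run_query(script="scholar.py"):
    """Run the script as a separate process and return what it printed."""
    result = subprocess.run(build_command(script), capture_output=True, text=True)
    return result.stdout
```

Running the script in its own process also gives it a clean argument list. A notebook kernel's own `sys.argv` typically carries a `-f` option, which is a plausible source of the earlier "no such option: -f" error when `main()` is called from inside a notebook.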

orangewords avatar Oct 11 '17 03:10 orangewords

Think I've got it working! Thanks again

orangewords avatar Oct 11 '17 04:10 orangewords

This is again not working. I've applied the suggested changes, and the URL is malformed, e.g.:

[ INFO] requesting citation data failed: HTTP Error 404: Not Found
[ INFO] retrieving citation export data
[ INFO] requesting http://scholar.google.com/https://scholar.googleusercontent.com/scholar.bib?q=info:kpSD9apcVf8J:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAWhmheR7o8h0e2pro0VQ3wwSoU5_DiIIu&scisf=4&ct=citation&cd=8&hl=en

Clearly the leading "http://scholar.google.com" should not be there. Copying and pasting the https://scholar.googleusercontent.com... part into a browser works though, so it is not far off.

hugues-talbot avatar Nov 25 '17 16:11 hugues-talbot

@hugues-talbot to fix this issue, change https://github.com/ckreibich/scholar.py/blob/master/scholar.py#L515 to

if path.startswith('http://') or path.startswith('https://'):
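The idea behind that condition is to prepend the Scholar host only to relative paths and to leave already-absolute URLs alone, whether they use http or https. A minimal sketch of that logic follows; `build_url` and `BASE_URL` are hypothetical names for illustration, not the actual code around line 515.

```python
BASE_URL = "http://scholar.google.com"  # hypothetical constant for this sketch


def build_url(path: str, base: str = BASE_URL) -> str:
    """Return an absolute URL, leaving already-absolute paths untouched."""
    # Checking both schemes is the point of the fix: an https:// link must
    # not get the Scholar host glued in front of it.
    if path.startswith("http://") or path.startswith("https://"):
        return path
    # Relative path: join with exactly one slash between host and path.
    return base.rstrip("/") + "/" + path.lstrip("/")
```

With this, a citation link on scholar.googleusercontent.com comes back unchanged, while `/scholar?q=x` becomes `http://scholar.google.com/scholar?q=x`. The standard library's `urllib.parse.urljoin` makes the same decision for absolute URLs and could be used instead.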

jessebrennan avatar Feb 15 '18 20:02 jessebrennan

To keep the newline character (\n) working in the output, I had to change line 1145 to

print(art.as_citation().decode("utf-8") + "\n") and NOT to print(art.as_citation() + "\n".encode('ascii'))

My system runs under Windows 10.
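The difference between the two fixes for line 1145 can be sketched as follows. This assumes, based on the b'...' output quoted earlier in the thread, that `as_citation()` returns `bytes` on Python 3; the `citation` value is a shortened stand-in, not real output from the script.

```python
# Stand-in for art.as_citation(); assumed to be bytes on Python 3.
citation = b"@article{einstein1935can,\n  year={1935}\n}\n"

# Original line: mixing bytes and str raises TypeError on Python 3.
try:
    citation + "\n"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Fix 1: keep everything as bytes. print() then shows the b'...' repr,
# with newlines displayed as the two characters backslash and n.
as_bytes = citation + "\n".encode("ascii")

# Fix 2: decode to str first. print() then emits real line breaks.
as_text = citation.decode("utf-8") + "\n"

print(as_bytes)
print(as_text)
```

The first fix avoids the error but prints the raw bytes representation; the second yields BibTeX that can be pasted directly into a .bib file.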

SvennoNito avatar Jan 19 '20 11:01 SvennoNito