scholar.py

Added extraction of url_pdf from the right-hand-side [PDF] link.

Open pmdscully opened this issue 8 years ago • 0 comments

This change extracts the [PDF] href value from the right-hand side of a Google Scholar article entry. It records the URL as url_pdf only if the article's url_pdf has not already been filled and Google Scholar labels the link as a PDF (i.e. the element's text is [PDF]).
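For illustration, here is a minimal, standalone sketch of the rule described above. It is not the code from this change: `PdfLinkParser`, `extract_url_pdf`, and the sample markup are hypothetical names and HTML invented for the example, and it uses the standard library's `html.parser` rather than whatever parser scholar.py relies on. It also assumes the link label may be followed by extra text such as a host name, so it checks that the label starts with `[PDF]` rather than requiring an exact match.

```python
from html.parser import HTMLParser


class PdfLinkParser(HTMLParser):
    """Records the href of the first anchor whose visible text starts with [PDF]."""

    def __init__(self):
        super().__init__()
        self.url_pdf = None   # stands in for the article's url_pdf field
        self._href = None     # href of the anchor currently open, if any
        self._text = []       # text fragments collected inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Collect text only while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        label = "".join(self._text).strip()
        # Fill url_pdf only if it is still empty and the link is labelled [PDF].
        if self.url_pdf is None and label.startswith("[PDF]"):
            self.url_pdf = self._href
        self._href = None


def extract_url_pdf(entry_html, existing_url_pdf=None):
    """Return existing_url_pdf if already set, else the [PDF] link found in entry_html."""
    if existing_url_pdf:
        return existing_url_pdf
    parser = PdfLinkParser()
    parser.feed(entry_html)
    parser.close()
    return parser.url_pdf


# Hypothetical markup, NOT Google Scholar's real HTML.
SAMPLE_ENTRY = (
    '<div class="entry">'
    '<div class="side"><a href="https://example.org/paper.pdf">'
    '<span>[PDF]</span> example.org</a></div>'
    '<h3><a href="https://example.org/abstract">A title</a></h3>'
    '</div>'
)

if __name__ == "__main__":
    print(extract_url_pdf(SAMPLE_ENTRY))
```

The title link in the sample is ignored because its text does not start with `[PDF]`, and a url_pdf value that is already set is returned unchanged.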

Test: scholar.py -c 10 --txt --author "einstein" --phrase "quantum"

Pre-change: 0/4 PDF links extracted

Post-change: 4/4 PDF links extracted

As far as I am aware, Google Scholar's [PDF] label is the best easily available indicator of whether the (optional) right-hand-side anchor refers to a PDF file.

pmdscully, Apr 26 '17 18:04