scholar.py
Added extraction of url_pdf from the right-hand side [PDF] link.
This change extracts the href of the [PDF] link on the right-hand side of a Google Scholar article entry. It records that URL as url_pdf if the article's url_pdf has not already been filled and Google Scholar labels the link as a PDF (i.e. the element's text is [PDF]).
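The rule described above (fill url_pdf only if it is still empty, and only from an anchor whose text is [PDF]) can be sketched as follows. This is a minimal illustration, not the code in this change: it uses the standard library's `html.parser` instead of the BeautifulSoup-based parsing in scholar.py, it models an article as a plain dict, and `fill_url_pdf`, `_AnchorCollector` and the HTML fragments are hypothetical, simplified stand-ins for Google Scholar's real markup.

```python
from html.parser import HTMLParser


class _AnchorCollector(HTMLParser):
    """Hypothetical helper: collect (href, text) pairs for every <a> in a fragment."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # list of (href, text) tuples, in document order
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside the current <a>, including nested tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def fill_url_pdf(article, sidebar_html):
    """Hypothetical sketch: set article['url_pdf'] from a right-hand [PDF] link."""
    if article.get("url_pdf"):
        return article  # an already-extracted PDF URL takes precedence
    parser = _AnchorCollector()
    parser.feed(sidebar_html)
    parser.close()
    for href, text in parser.anchors:
        # Only trust links that Google Scholar itself labels as [PDF]
        if text == "[PDF]" and href:
            article["url_pdf"] = href
            break
    return article


# Usage with a made-up, simplified right-hand side fragment
sidebar = '<div><a href="http://example.org/paper.pdf">[PDF]</a></div>'
print(fill_url_pdf({"url_pdf": None}, sidebar)["url_pdf"])
```

Checking the label rather than the URL's file extension follows the reasoning in this change: the [PDF] label is treated as the available indicator that the anchor refers to a PDF file.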
Test: scholar.py -c 10 --txt --author "einstein" --phrase "quantum"
Pre-change: 0/4 PDF links extracted
Post-change: 4/4 PDF links extracted
As far as I am aware, Google Scholar's [PDF] label is the best easily available indicator of whether the (optional) right-hand side anchor points to a PDF file.