folders2web
Feature request: generalize gscholar.rb
Hello,
your setup for finding & attaching PDFs to BibDesk entries seems very useful. However, it also seems a bit laborious to do all the keyboard shortcuts for each entry. I wonder if you could automate it and remove the AppleScript dependencies, as per the following pseudocode:
for each publication in mystuff.bib
  find relevant PDFs using Google Scholar
  if one PDF is clearly much more relevant than the others (e.g. exact title match)
    download the PDF
    put it in the appropriate place
    add it to the .bib file
  else if several PDFs seem relevant
    don't download or edit the .bib file
    instead just output the possibilities in stdout
  end
end
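The decision step in the pseudocode above could be sketched in Ruby roughly as follows. This is only a minimal sketch, not code from gscholar.rb: the names `Candidate`, `normalize_title`, and `choose_pdf` are hypothetical, "clearly much more relevant" is assumed to mean exactly one candidate whose normalized title matches, and the search, download, and .bib editing steps are left out entirely.

```ruby
# Hypothetical record for one search hit: a title and a PDF URL.
Candidate = Struct.new(:title, :url)

# Normalize a title so that case, punctuation, and spacing do not matter.
def normalize_title(title)
  title.downcase.gsub(/[^a-z0-9]+/, " ").strip
end

# Decide what to do with the search hits for one publication.
# Returns [:download, candidate] when exactly one hit matches the title,
# [:none, []] when there are no hits, and [:ambiguous, candidates]
# otherwise, so the caller can print the possibilities instead.
def choose_pdf(pub_title, candidates)
  wanted = normalize_title(pub_title)
  exact = candidates.select { |c| normalize_title(c.title) == wanted }
  if exact.size == 1
    [:download, exact.first]
  elsif candidates.empty?
    [:none, []]
  else
    [:ambiguous, candidates]
  end
end
```

A caller would loop over the entries of the .bib file, call `choose_pdf` for each one, and only download and attach on `:download`; for `:ambiguous` it would just print the candidate titles and URLs to stdout.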
I had thought about something similar, for example making it possible to automatically search Google Scholar for the title of the currently selected publication, find the PDF, download and attach it, etc. The problem is that Google Scholar does not provide any official API, and in fact quite aggressively resists being scraped (even when it's just downloading the first page)... I've argued several times that we need to build an alternative to Google Scholar (as OpenStreetMap is to Google Maps), with open APIs, etc., exactly for this kind of purpose.
Stian
On Tue, Jul 23, 2013 at 8:29 AM, Yannick Wurm [email protected] wrote:
> [...]
Reply to this email directly or view it on GitHub: https://github.com/houshuang/folders2web/issues/2
http://reganmian.net/blog -- Random Stuff that Matters
Hi Stian,
what do you mean by aggressively? What if you put a "sleep 10 seconds" between two requests?
Cheers, Yannick
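The "sleep between requests" idea could look something like the following in Ruby. This is a sketch only: `each_with_delay` and its `sleeper` parameter are hypothetical names, the 10-second default just mirrors the suggestion above, and nothing here establishes whether such a pause would be enough for Google Scholar.

```ruby
# Call the block once per item, pausing `delay` seconds between calls.
# There is no pause before the first item or after the last one.
# `sleeper` is injectable so the pause can be replaced in tests.
def each_with_delay(items, delay: 10, sleeper: Kernel.method(:sleep))
  items.each_with_index do |item, index|
    sleeper.call(delay) if index > 0
    yield item
  end
end
```

For example, `each_with_delay(titles) { |title| search(title) }` would run a (hypothetical) `search` for each title with a 10-second gap between consecutive calls.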
Literally stopping me from grabbing even one page without the proper cookies, referer, browserid, etc. Feel free to experiment though, all my code is totally open ;)
On Fri, Jul 26, 2013 at 12:55 PM, Yannick Wurm [email protected] wrote:
> [...]
OK, thanks for the info. Unfortunately, my time is too limited these days!
Cheers, Yannick