scholar.py
List papers citing a paper
Xavi Anguera has suggested making the list of papers citing a paper queryable via the API. This needs a bit more thinking about the notion of paper identity (cluster ID) vs presentation to the user, but shouldn't be a big problem otherwise.
I would love to see this, too. I need to analyze the set of citations to a particular article, and it is very painful to do this manually, screenful by screenful.
Me too, this would make scholar.py an incredibly powerful tool!
:+1: Thing is, how do we get around the API query limits? I sometimes wish Google provided free access to its repository, just like arXiv. Is setting up Tor a good idea?
I'm willing to live with reasonable throttling. E.g., the 250 articles I just pulled by hand are a horrible nuisance for me, but probably in the noise for Google, especially if I do it once in a blue moon.
Duly noted, folks! Support for this is on the way.
@arcolife, Tor will help you little regarding query limits; in fact, given that it's easy to identify Tor exits it might actually make things worse for you. The only real help will be distributed clients, but you'll have to build that botnet yourself. :)
@ckreibich I see! :shit:
Btw, I've been building a recommendation engine for research papers. It would be nice to have this feature, as it would add to the currently available sources. I'm willing to contribute! :)
I can take on this issue. I need a little guidance and help with the existing code, though. If I am understanding the problem:
You can get the link to the citing papers by accessing the url_citations attribute.
E.g.:
querier.articles[0].attrs.get('url_citations')[0]
should return something like u'http://scholar.google.com/scholar?cites=5556531000720111691&as_sdt=2005&sciodt=0,5&hl=en'.
And since that URL is just a new search results page, the goal is to parse it into individual articles?
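For anyone following along, here's a minimal sketch of that idea, assuming scholar.py's ScholarQuerier and SearchScholarQuery interfaces. The query phrase is made up, and the final step (fetching and parsing the citations page) is the proposed feature, not something the script does yet:

```python
# Minimal sketch, assuming scholar.py's module-level ScholarQuerier and
# SearchScholarQuery classes.
from scholar import ScholarQuerier, SearchScholarQuery

querier = ScholarQuerier()
query = SearchScholarQuery()
query.set_phrase("example search phrase")  # made-up query, not from this thread
querier.send_query(query)

# Each article stores its attributes as {key: [value, label, order]},
# so indexing with [0] pulls out the raw value.
for article in querier.articles:
    citations = article.attrs.get('url_citations')
    if citations is None:
        continue  # not every result has a "Cited by" link
    print(citations[0])
    # e.g. http://scholar.google.com/scholar?cites=5556531000720111691&...
    # The proposed feature would fetch this URL and parse the results page
    # into article objects, just like a normal query.
```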
If it helps: this is how I stumbled on this issue: I was writing an article, and wanted to claim that the literature on topic X did not contain any article that addressed issue I.
Google Scholar had the right information to do this, but it was very painful to extract that information. I had to scroll through pages and pages of articles, moving from page to page interactively. And there was no way to check this claim for correctness over time. I.e., if I reran the query, I had no obvious way to check to see if the results were the same, or if new papers had appeared.
I was hoping to be able to automate this process at least somewhat.
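Not part of scholar.py, but here is a rough sketch of the kind of automation I have in mind once citing papers are queryable: snapshot the result set and diff it against a later run. The helper names (save_snapshot, compare_snapshots) are hypothetical, and I'm assuming each article exposes 'title' and 'url' values the way scholar.py articles do:

```python
import json

def save_snapshot(articles, path):
    """Persist the titles and URLs of a result set for later comparison."""
    records = [{'title': a['title'], 'url': a['url']} for a in articles]
    with open(path, 'w') as fh:
        json.dump(records, fh, indent=2)

def compare_snapshots(old_path, new_path):
    """Return (newly appeared titles, disappeared titles) between two runs."""
    with open(old_path) as fh:
        old = {r['title'] for r in json.load(fh)}
    with open(new_path) as fh:
        new = {r['title'] for r in json.load(fh)}
    return sorted(new - old), sorted(old - new)
```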
I can put up an IPython notebook with a working example of my extension module that implements this. @rpgoldman I'm pretty much in the same boat as you. Once I finish up implementing my extension I can see what would be the best way to fold the code in.
#10 seems to address the problem, but it's not merged, and I'm not 100% sure if it's doing what I want at the moment.
Hi guys, is there any news on this issue? I'm about to implement the same thing. @chendaniely, did you implement something for this?
Hey @marianormuro, sorry for the really late reply. My implementation is really hacky, janky, and untested. You probably shouldn't use it for 'serious' work.
#83 seems to have solved the discussed issue.