
Citesummary

Open paurkedal opened this issue 4 years ago • 3 comments

Is there an efficient way to extract the equivalent of the old site's Citesummary? In particular, we have been using queries like http://old.inspirehep.net/search?ln=en&ln=en&p=find+cn+atlas+and+d+2019&of=hcs&action_search=Search&sf=&so=d&rm=&rg=25&sc=0 to extract annual metrics for the ATLAS and ALICE collaborations:

  • "Total number of papers analyzed"
  • "Total number of citations"
  • "Average citations per paper"
  • "hHEP index [?]"

The only solution I can see with the documented API is to request the full set of entries and fetch the citations entry of each paper. That may be feasible if we cache time-sliced partial results as we update, though I'm hoping there is a better way.
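For illustration, once the per-paper citation counts have been collected, the four numbers listed above could be computed along these lines. This is only a sketch: `CiteSummary` and `cite_summary` are hypothetical names, the counts are assumed to have been fetched already, and the index is the standard h-index over whatever papers are passed in, which may not match the old site's "hHEP" definition exactly.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CiteSummary:
    papers: int       # "Total number of papers analyzed"
    citations: int    # "Total number of citations"
    average: float    # "Average citations per paper"
    h_index: int      # standard h-index (assumed stand-in for "hHEP index")


def cite_summary(citation_counts: Iterable[int]) -> CiteSummary:
    """Summarise a collection of per-paper citation counts."""
    counts = sorted(citation_counts, reverse=True)
    papers = len(counts)
    total = sum(counts)
    average = total / papers if papers else 0.0

    # h is the largest rank such that the paper at that rank
    # (in descending order of citations) has at least `rank` citations.
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank
        else:
            break

    return CiteSummary(papers, total, average, h)
```

For example, `cite_summary([10, 8, 5, 4, 3])` gives 5 papers, 30 citations, an average of 6.0 and an h-index of 4.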

paurkedal, Jun 15 '20 15:06

Currently we don't have a better way, and it will require thousands of requests to compute those stats for such large experiments. We will probably expose the citation summary we're using on the website (as it appears here) through the API at some point, but I can't tell you when that will happen, as it's a bit trickier than anticipated.

michamos, Jun 19 '20 15:06

Thanks for the info. The website renders the numbers with JavaScript, so it does not look like we can resurrect our solution of parsing HTML. I might still look into computing it ourselves: if we store intermediate results per day, the rate of requests should stay limited. It's not so urgent that it can't wait a few months, though.
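One possible reading of "intermediate results per day" is a small on-disk cache keyed by date, where each day's entry holds the citation counts for the records refreshed that day. The sketch below assumes exactly that; the function names, the JSON layout and the use of record IDs as string keys are all hypothetical, and the actual fetching of counts is left out.

```python
import json
from pathlib import Path
from typing import Dict

# Assumed layout: {"2020-06-24": {"<record id>": <citation count>, ...}, ...}
Cache = Dict[str, Dict[str, int]]


def load_cache(path: Path) -> Cache:
    """Read the cache file, or start empty if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def store_day(path: Path, day: str, counts: Dict[str, int]) -> Cache:
    """Record the counts fetched on `day` (ISO date string) and save the cache."""
    cache = load_cache(path)
    cache[day] = counts
    path.write_text(json.dumps(cache, sort_keys=True))
    return cache


def latest_counts(cache: Cache) -> Dict[str, int]:
    """Merge all days, letting later days overwrite earlier counts."""
    merged: Dict[str, int] = {}
    for day in sorted(cache):  # ISO dates sort chronologically as strings
        merged.update(cache[day])
    return merged
```

With this layout, only the slice of records refreshed on a given day costs requests, and the merged counts can be summarised whenever the metrics are needed.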

paurkedal, Jun 24 '20 07:06

As long as the old site is operational, we can still use our current solution though.

paurkedal, Jun 24 '20 07:06