bigcows

"year" column is not accurate

Open: lintool opened this issue 6 years ago • 1 comment

Noted by @dragomirradev

The "year" column is based on the earliest year in the citation count histogram, which is not necessarily the year of the author's earliest publication.

For example: (screenshot: Screen Shot 2019-08-24 at 10 26 34 AM)

But see:

(screenshot: Screen Shot 2019-08-24 at 10 26 57 AM)

One reasonable hypothesis is that the histogram is capped at 20 years... but here's a counterexample:

(screenshot: Screen Shot 2019-08-24 at 10 28 54 AM)

No idea what's going on.

From a crawling perspective, the histogram is easy to get. Getting the actual earliest publication year requires sorting the publications by date and then "scrolling" to the end of the list.

lintool · Aug 24 '19 02:08

One explanation for this discrepancy is that the histogram records the year in which each citation occurs, not the publication year of the paper being cited. For example, if a paper published in 2010 receives a citation in 2016, that citation is counted under 2016 in the histogram. The earliest histogram year is therefore the year of the first recorded citation, which can be later than the year of the first publication.
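To make the distinction concrete, here is a minimal sketch of that hypothesis. The `citations` list and the `citation_histogram` helper are made up for illustration; they are not taken from the crawler or from any real profile.

```python
from collections import Counter

# Hypothetical data: each pair is
# (year the cited paper was published, year the citation occurred).
citations = [
    (2010, 2016),
    (2010, 2017),
    (2012, 2016),
    (2015, 2018),
]


def citation_histogram(pairs):
    """Count citations by the year in which they occurred."""
    return Counter(citing_year for _, citing_year in pairs)


histogram = citation_histogram(citations)

# What the "year" column would report under this hypothesis.
earliest_histogram_year = min(histogram)

# What the column is meant to report.
earliest_publication_year = min(pub_year for pub_year, _ in citations)

print(dict(sorted(histogram.items())))
print(earliest_histogram_year, earliest_publication_year)
```

In this toy data the earliest histogram year is 2016 while the earliest publication year is 2010, so the two values differ whenever the first paper goes uncited for a while.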

As for the crawling issue, I have resolved it in a Python scraper. I will link to it in a subsequent comment.

mahtab-nejati · Sep 21 '22 11:09