scholar
Years with zero citations cause get_article_cite_history() to fail
When using get_article_cite_history(), an article that has years with zero citations fails in one of two ways. First, the call may stop with an error indicating that years and vals have different lengths:
get_article_cite_history("wSXViPYAAAAJ", "KlAtU1dfN6UC")
Error in data.frame(year = years, cites = vals) :
arguments imply differing number of rows: 17, 16
Second, the call may succeed but fill years that should be zero with incorrect values:
get_article_cite_history("wSXViPYAAAAJ", "9ZlFYXVOiuMC")
year cites pubid
1 2005 1 9ZlFYXVOiuMC
2 2006 1 9ZlFYXVOiuMC
3 2007 1 9ZlFYXVOiuMC
4 2008 1 9ZlFYXVOiuMC
5 2009 1 9ZlFYXVOiuMC
6 2010 1 9ZlFYXVOiuMC
7 2011 1 9ZlFYXVOiuMC
8 2012 1 9ZlFYXVOiuMC
9 2013 1 9ZlFYXVOiuMC
10 2014 1 9ZlFYXVOiuMC
11 2015 1 9ZlFYXVOiuMC
12 2016 1 9ZlFYXVOiuMC
The correct citation history for this article contains many zeros:
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=wSXViPYAAAAJ&cstart=20&pagesize=80&citation_for_view=wSXViPYAAAAJ:9ZlFYXVOiuMC
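One possible explanation for this second symptom, offered as an assumption rather than a confirmed diagnosis: data.frame() recycles a shorter column when the longer length is an exact multiple of it, so a length mismatch does not always raise an error. The sketch below uses made-up values (12 year labels, 3 citation counts) to show both outcomes.

```r
# Hypothetical values for illustration only; not scraped from Google Scholar.
years <- 2005:2016        # 12 year labels
vals  <- c(1, 1, 1)       # 3 citation counts (12 is a multiple of 3)

# No error: the 3 values are silently recycled across all 12 rows.
recycled <- data.frame(year = years, cites = vals)
nrow(recycled)

# When the longer length is not a multiple of the shorter one,
# data.frame() stops with "arguments imply differing number of rows".
mismatch <- tryCatch(
  data.frame(year = 2005:2021, cites = rep(1, 16)),
  error = function(e) conditionMessage(e)
)
mismatch
```

If this is what happens, whether a given article errors out or returns wrong numbers would depend only on how the two lengths happen to divide.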
Thanks for looking into this!
Cheers, Joel
I am having the same issue!
get_article_cite_history("QtuhiVMAAAAJ", "IjCSPb-OGe4C")
Error in data.frame(year = years, cites = vals) :
arguments imply differing number of rows: 16, 15
I have confirmed on the Google Scholar page that the problem occurs for articles that have years with no citations.
Thank you so much!
I'm also having the same issue. It took a while to figure out: any year with zero citations causes get_article_cite_history() to fail.
I'm pretty sure the issue has to do with a dependency and/or conflict upstream. If I modify get_article_cite_history() so that the rvest namespace is explicit for the related functions, everything works as intended.
For example, here is the original get_article_cite_history() function:
get_article_cite_history <- function(id, article) {
    site <- getOption("scholar_site")
    id <- tidy_id(id)
    url_base <- paste0(site, "/citations?", "view_op=view_citation&hl=en&citation_for_view=")
    url_tail <- paste(id, article, sep = ":")
    url <- paste0(url_base, url_tail)
    res <- get_scholar_resp(url)
    if (is.null(res))
        return(NA)
    httr::stop_for_status(res, "get user id / article information")
    doc <- read_html(res)
    years <- doc %>% html_nodes(".gsc_oci_g_t") %>% html_text() %>%
        as.numeric()
    vals <- doc %>% html_nodes(".gsc_oci_g_al") %>% html_text() %>%
        as.numeric()
    df <- data.frame(year = years, cites = vals)
    if (nrow(df) > 0) {
        df <- merge(data.frame(year = min(years):max(years)),
            df, all.x = TRUE)
        df[is.na(df)] <- 0
        df$pubid <- article
    }
    else {
        df$pubid <- vector(mode = mode(article))
    }
    return(df)
}
Here is my modified function (called get_article_cite_history_2()):
get_article_cite_history_2 <- function (id, article) {
  site <- getOption("scholar_site")
  id <- tidy_id(id)
  url_base <- paste0(site, "/citations?",
                     "view_op=view_citation&hl=en&citation_for_view=")
  url_tail <- paste(id, article, sep=":")
  url <- paste0(url_base, url_tail)
  res <- get_scholar_resp(url)
  httr::stop_for_status(res, "get user id / article information")
  doc <- rvest::read_html(res)
  ## Inspect the bar chart to retrieve the citation values and years
  years <- doc %>%
    rvest::html_nodes(".gsc_oci_g_a") %>%
    rvest::html_attr('href') %>%
    stringr::str_match("as_ylo=(\\d{4})&") %>%
    "["(,2) %>%
    as.numeric()
  vals <- doc %>%
    rvest::html_nodes(".gsc_oci_g_al") %>%
    rvest::html_text() %>%
    as.numeric()
  df <- data.frame(year = years, cites = vals)
  if(nrow(df)>0) {
    ## There may be undefined years in the sequence so fill in these gaps
    df <- merge(data.frame(year=min(years):max(years)),
                df, all.x=TRUE)
    df[is.na(df)] <- 0
    df$pubid <- article
  } else {
    # complete the 0 row data.frame to be consistent with normal results
    df$pubid <- vector(mode = mode(article))
  }
  return(df)
}
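Comparing the two listings, the modified version differs in more than namespacing: it reads each year from the href of the .gsc_oci_g_a bar links instead of from the text of the .gsc_oci_g_t axis labels. If bar links exist only for years that have citations (an assumption, not verified against the live page), years and vals would have one entry per bar and always match in length. A minimal base-R sketch of the extraction step, using made-up href strings:

```r
# Hypothetical href values; real Google Scholar links may look different.
hrefs <- c(
  "/scholar?cites=123&as_ylo=2016&as_yhi=2016",
  "/scholar?cites=123&as_ylo=2017&as_yhi=2017",
  "/scholar?cites=123&as_ylo=2019&as_yhi=2019"
)

# Same pattern as in the function above: capture the four digits after "as_ylo=".
pattern <- "as_ylo=(\\d{4})&"
matches <- regmatches(hrefs, regexec(pattern, hrefs))

# Each match is c(full_match, captured_year); keep the capture group.
bar_years <- as.numeric(vapply(matches, `[`, character(1), 2))
bar_years
```

In the function itself, stringr::str_match() returns a matrix whose second column holds the captured year, which is what the "["(,2) step in the pipeline selects.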
The output from running each of these:
> scholar::get_article_cite_history("eD9_J3wAAAAJ", "_FxGoFyzp5QC")
Error in data.frame(year = years, cites = vals) :
arguments imply differing number of rows: 6, 5
> get_article_cite_history_2("eD9_J3wAAAAJ", "_FxGoFyzp5QC")
year cites pubid
1 2016 3 _FxGoFyzp5QC
2 2017 1 _FxGoFyzp5QC
3 2018 0 _FxGoFyzp5QC
4 2019 1 _FxGoFyzp5QC
5 2020 1 _FxGoFyzp5QC
6 2021 5 _FxGoFyzp5QC
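The zero for 2018 in this output comes from the gap-filling step, not from the scraped data. A small sketch of that step in isolation, using the values shown above as input and assuming the scrape returned only the five years that have citations:

```r
# Years with at least one citation, taken from the output above (2018 is absent).
years <- c(2016, 2017, 2019, 2020, 2021)
vals  <- c(3, 1, 1, 1, 5)
df <- data.frame(year = years, cites = vals)

# Left-join onto the full year range so missing years appear as NA rows...
full <- merge(data.frame(year = min(years):max(years)), df, all.x = TRUE)

# ...then turn those NAs into explicit zeros.
full[is.na(full)] <- 0
full
```

The merge only works when years and vals already line up one to one, which is why the length mismatch has to be resolved before this step.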
A suboptimal workaround for now is to replace get_article_cite_history() with the function above after calling library(scholar), but this seems like something a dev could patch quickly.
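A hedged sketch of that workaround: when the replacement is defined at the top level, unqualified helpers such as tidy_id(), get_scholar_resp(), and %>% must be findable from the global environment. One way around this, assuming those names are all visible inside the scholar namespace, is to point the function's environment at that namespace. bind_to_namespace() is a hypothetical helper name, not part of scholar.

```r
# Hypothetical helper: make `fun` look up unqualified names
# inside the namespace of `pkg` rather than the global environment.
bind_to_namespace <- function(fun, pkg) {
  environment(fun) <- asNamespace(pkg)
  fun
}

# Apply it only if scholar is installed and the replacement has been defined.
if (requireNamespace("scholar", quietly = TRUE) &&
    exists("get_article_cite_history_2")) {
  get_article_cite_history <- bind_to_namespace(get_article_cite_history_2, "scholar")
}
```

This only masks the function for unqualified calls in the current session; scholar::get_article_cite_history() and any calls made from inside the package still reach the original.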