cranlogs

limit on number of packages as argument to cran_downloads

Open adfi opened this issue 5 years ago • 7 comments

Hi,

I tried to get download counts for 8000 packages and ran into an HTTP 414 (Request-URI Too Long) error. After some trial and error, the limit seems to be 905 packages; it is reproducible with the following code:

library(cranlogs)
cran_downloads(packages = rep("cranlogs", 906))

I can split up the requests myself, but it would be nicer if the package did that. Also, the limit is not documented. Let me know if I'm using the package for something it wasn't intended for.

adfi • Jul 16 '20 11:07

Well, I guess that's the URL length limit, because the package names are sent in the URL. We could add a POST API, and then there would be no URL length limit.
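To see why the cutoff lands near 905 names, here is a rough sketch of how the package list grows the URL. It assumes the client joins the names with commas into a single URL path segment (the cranlogs web API accepts comma-separated package names); the helper `pkg_segment_length()` is hypothetical and only measures that segment, not the full URL.

```r
# Hypothetical helper: number of characters the package list adds to the
# URL, assuming names are sent as one comma-separated segment "pkg1,pkg2,...".
pkg_segment_length <- function(packages) {
  nchar(paste(packages, collapse = ","))
}

# "cranlogs" has 8 characters, so n copies cost 8 * n characters + (n - 1) commas.
pkg_segment_length(rep("cranlogs", 905))
#> [1] 8144
pkg_segment_length(rep("cranlogs", 906))
#> [1] 8153
```

Both values sit just under 8 KB once the rest of the request line is added, which is consistent with (but does not prove) a common default request-line limit of about 8 KB, such as Apache's `LimitRequestLine` default of 8190 bytes. Which server and configuration cranlogs actually uses is not stated in this thread.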

gaborcsardi • Jul 16 '20 14:07

So where does the change need to happen? In cranlogs.app?

adfi • Jul 17 '20 21:07

Everywhere. Frankly, it is simpler to return all packages: if you want 8000, you might as well get all of them. :)

gaborcsardi • Jul 17 '20 21:07

@gaborcsardi This could be done within cranlogs by submitting the list of packages in batches, right?

bschilder • Dec 19 '22 18:12

We would need to know the maximum batch size (i.e., at what point the URI gets too long, on average):

# Batch size is a guess: the limit reported above was ~905 names of
# 8 characters each, so 1000 names would likely fail; 500 leaves headroom.
batch_size <- 500
v <- rownames(utils::available.packages())
batches <- split(v, ceiling(seq_along(v) / batch_size))

# Query each batch separately, then stack the results (|> needs R >= 4.1).
cran <- lapply(seq_along(batches), function(i) {
    message("Batch: ", i, "/", length(batches))
    cranlogs::cran_downloads(
        packages = batches[[i]],
        from = "1990-01-01",
        to = Sys.Date() - 1
    )
}) |>
    data.table::rbindlist(fill = TRUE)
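Instead of guessing an average name length for a fixed `batch_size`, batches could be sized by a character budget. This is a minimal sketch: `split_by_chars()` and its 7000-character default are hypothetical choices (an assumed safety margin below the roughly 8 KB cutoff seen in this issue), not part of cranlogs.

```r
# Hypothetical helper: greedily pack package names into batches whose
# comma-joined length stays within `max_chars`. The 7000 default is an
# assumed margin below the ~8 KB limit observed in this issue.
split_by_chars <- function(packages, max_chars = 7000) {
  batches <- list()
  current <- character(0)
  current_len <- 0
  for (p in packages) {
    # A name costs its own length, plus one comma if the batch is non-empty.
    extra <- nchar(p) + (length(current) > 0)
    if (length(current) > 0 && current_len + extra > max_chars) {
      # Budget exceeded: close the current batch and start a new one with p.
      batches[[length(batches) + 1]] <- current
      current <- p
      current_len <- nchar(p)
    } else {
      current <- c(current, p)
      current_len <- current_len + extra
    }
  }
  if (length(current) > 0) batches[[length(batches) + 1]] <- current
  batches
}

b <- split_by_chars(rep("cranlogs", 2000))
lengths(b)
#> [1] 777 777 446
```

Each batch of 777 eight-character names uses 6992 characters; a 778th name would push it to 7001. Long names simply produce smaller batches, so no average has to be assumed.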

Should be an easy fix. Happy to make a PR.

bschilder • Dec 19 '22 18:12

@adfi, I agree this should be handled internally by the package, or at least documented as a limitation.
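One way to handle this internally would be a thin wrapper that splits the input, calls the existing function once per batch, and stacks the results. This is only a sketch: `cran_downloads_batched()` is a hypothetical name, it does not describe what the linked PR actually changed, and the `fetch` argument exists only so the batching can be exercised with a stub instead of the network.

```r
# Hypothetical wrapper, not part of cranlogs and not taken from the PR.
# `fetch` defaults to the real API call (evaluated lazily, so cranlogs is
# only needed when the default is used); extra arguments such as
# `from` and `to` are passed through unchanged.
cran_downloads_batched <- function(packages, batch_size = 500,
                                   fetch = cranlogs::cran_downloads, ...) {
  stopifnot(length(packages) > 0, batch_size >= 1)
  batches <- split(packages, ceiling(seq_along(packages) / batch_size))
  # One request per batch keeps every URL under the length limit.
  results <- lapply(batches, function(b) fetch(packages = b, ...))
  out <- do.call(rbind, unname(results))
  rownames(out) <- NULL
  out
}
```

With the defaults it would be called like `cran_downloads_batched(pkgs, from = "2022-01-01", to = "2022-12-18")`. Delegating to `cran_downloads()` per batch keeps the existing interface and return shape; sizing batches by character count instead of a fixed `batch_size` would be more robust to long package names.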

bschilder • Dec 19 '22 18:12

Done here, @adfi: https://github.com/r-hub/cranlogs/pull/67

bschilder • Dec 19 '22 20:12