Feature request: `git_stat_files()`
The gert package has a nice function called git_stat_files(). For each file passed to it, it returns the most recent commit, modification time, and more. Would it be possible to implement a similar function in git2r?
- R function: git_stat_files
- C function: R_git_stat_files
Thanks, that is indeed a nice function. I've sketched a similar function that I can add to git2r:
git_stat_files <- function(files, ref = "HEAD", repo = '.') {
  do.call("rbind", lapply(as.character(files), function(file) {
    created <- NA_character_
    modified <- NA_character_
    commits <- 0L
    authors <- 0L
    head <- NA_character_
    x <- commits(repo = repo, ref = ref, path = file)
    if (length(x)) {
      created <- when(x[[length(x)]])
      modified <- when(x[[1]])
      commits <- length(x)
      authors <- length(unique(sapply(x, function(y) y$author$name)))
      head <- sha(x[[1]])
    }
    data.frame(file = file,
               created = as.POSIXct(created),
               modified = as.POSIXct(modified),
               commits = commits,
               authors = authors,
               head = head)
  }))
}
@stewid Thanks for the quick response! I tested the function. One suggestion is to limit the number of commits returned by passing n = 1 to commits(). This reduced the run time from 20 seconds to 2 seconds when I ran git_stat_files() on the 36 R files in git2r/R/:
x <- commits(repo = repo, ref = ref, path = file, n = 1)
There is still a speed difference though. The gert implementation is twice as fast (1s vs 2s). I only mention this because this is a bottleneck step in my code. Since it gets called a lot, I am investigating how to reduce the computation time.
library(git2r)
r <- clone(
  url = "https://github.com/ropensci/git2r.git",
  local_path = tempfile()
)
git_stat_files <- function(files, ref = "HEAD", repo = '.') {
  do.call("rbind", lapply(as.character(files), function(file) {
    created <- NA_character_
    modified <- NA_character_
    commits <- 0L
    authors <- 0L
    head <- NA_character_
    x <- commits(repo = repo, ref = ref, path = file, n = 1)
    if (length(x)) {
      created <- when(x[[length(x)]])
      modified <- when(x[[1]])
      commits <- length(x)
      authors <- length(unique(sapply(x, function(y) y$author$name)))
      head <- sha(x[[1]])
    }
    data.frame(file = file,
               created = as.POSIXct(created),
               modified = as.POSIXct(modified),
               commits = commits,
               authors = authors,
               head = head)
  }))
}
files <- Sys.glob(file.path(workdir(r), "R", "*R"))
files_relative <- sub(paste0(workdir(r), "/"), "", files)
system.time(
  stat_git2r <- git_stat_files(files_relative, repo = r)
)
##    user  system elapsed
##    1.60    0.47    2.06
library(gert)
system.time(
  stat_gert <- gert::git_stat_files(files_relative, repo = workdir(r))
)
##    user  system elapsed
##    0.92    0.17    1.09
unlink(workdir(r), recursive = TRUE)
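A caveat on the `n = 1` shortcut used above: because `commits()` then returns at most one commit per file, the `created`, `commits`, and `authors` columns of the sketch can only describe that most recent commit, so the speed-up trades away the full per-file history. The following is a minimal sketch of that difference on a throwaway repository; the file name, commit messages, and user details are made up for illustration and are not taken from git2r's sources or tests.

```r
library(git2r)

# Throwaway repository with two commits touching the same file.
# All names below (a.txt, Alice, the messages) are hypothetical.
path <- tempfile("stat-demo-")
dir.create(path)
repo <- init(path)
config(repo, user.name = "Alice", user.email = "alice@example.org")

writeLines("first", file.path(path, "a.txt"))
add(repo, "a.txt")
commit(repo, "Add a.txt")

writeLines("second", file.path(path, "a.txt"))
add(repo, "a.txt")
commit(repo, "Update a.txt")

# Full history for the file versus the n = 1 shortcut.
all_commits <- commits(repo = repo, ref = "HEAD", path = "a.txt")
latest_only <- commits(repo = repo, ref = "HEAD", path = "a.txt", n = 1)

# Compare how many commits each call sees for the same file.
c(all = length(all_commits), n1 = length(latest_only))

# The most recent commit is the same in both cases, so `modified`
# and `head` in the sketch are unaffected by n = 1.
identical(sha(latest_only[[1]]), sha(all_commits[[1]]))

unlink(path, recursive = TRUE)
```

If only `modified` and `head` are needed, as in the bottleneck described above, `n = 1` looks like a reasonable choice; if `created`, `commits`, or `authors` matter, the full history still has to be walked.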
@jdblischak thanks for the feedback. I'm working on a faster version.