git2r

Feature request: `git_stat_files()`

Open jdblischak opened this issue 4 years ago • 3 comments

The gert package has a nice function called git_stat_files(). For each file passed to it, it returns the most recent commit, modification time, and more. Would it be possible to implement a similar function in git2r?

Jul 16 '21 00:07 jdblischak

Thanks, that is indeed a nice function. I've sketched a similar function that I could add to git2r:

git_stat_files <- function(files, ref = "HEAD", repo = '.') {
    # Build one data.frame row per file and bind the rows together.
    do.call("rbind", lapply(as.character(files), function(file) {
        # Defaults for a file with no commits reachable from 'ref'.
        created <- NA_character_
        modified <- NA_character_
        commits <- 0L
        authors <- 0L
        head <- NA_character_

        # History of this file; x[[1]] is used as the most recent commit.
        x <- commits(repo = repo, ref = ref, path = file)
        if (length(x)) {
            created <- when(x[[length(x)]])
            modified <- when(x[[1]])
            commits <- length(x)
            authors <- length(unique(sapply(x, function(y) y$author$name)))
            head <- sha(x[[1]])
        }

        data.frame(file = file,
                   created = as.POSIXct(created),
                   modified = as.POSIXct(modified),
                   commits = commits,
                   authors = authors,
                   head = head)
    }))
}

Jul 17 '21 14:07 stewid

@stewid Thanks for the quick response! I tested the function. One suggestion is to limit the number of commits returned. This reduced the run time from 20 seconds to 2 seconds when I ran git_stat_files() on the 36 R files in git2r/R/:

x <- commits(repo = repo, ref = ref, path = file, n = 1)

There is still a speed difference, though: the gert implementation is about twice as fast (roughly 1 s vs. 2 s). I only mention this because this step is a bottleneck in my code. Since it gets called a lot, I am investigating how to reduce the computation time.

library(git2r)

r <- clone(
  url = "https://github.com/ropensci/git2r.git",
  local_path = tempfile()
)

git_stat_files <- function(files, ref = "HEAD", repo = '.') {
  do.call("rbind", lapply(as.character(files), function(file) {
    created <- NA_character_
    modified <- NA_character_
    commits <- 0L
    authors <- 0L
    head <- NA_character_
    
    x <- commits(repo = repo, ref = ref, path = file, n = 1)
    if (length(x)) {
      created <- when(x[[length(x)]])
      modified <- when(x[[1]])
      commits <- length(x)
      authors <- length(unique(sapply(x, function(y) y$author$name)))
      head <- sha(x[[1]])
    }
    
    data.frame(file = file,
               created = as.POSIXct(created),
               modified = as.POSIXct(modified),
               commits = commits,
               authors = authors,
               head = head)
  }))
}

# Collect the R source files and make their paths relative to the repository root.
files <- Sys.glob(file.path(workdir(r), "R", "*.R"))
files_relative <- sub(paste0(workdir(r), "/"), "", files, fixed = TRUE)

system.time(
  stat_git2r <- git_stat_files(files_relative, repo = r)
)
##   user  system elapsed 
##   1.60    0.47    2.06 

library(gert)

system.time(
  stat_gert <- gert::git_stat_files(files_relative, repo = workdir(r))
)
##   user  system elapsed 
##   0.92    0.17    1.09 

unlink(workdir(r), recursive = TRUE)
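
A caveat on `n = 1`, sketched here with mock data: the function derives `created`, `commits`, and `authors` from the whole commit list, so once the list is cut down to the newest commit only `modified` and `head` stay meaningful. The helper `summarise_history()` and the mock commits are hypothetical stand-ins (plain lists with `sha` and `author$name` fields), not git2r objects and not part of its API.

```r
# Hypothetical helper (not part of git2r): summarise a list of commits
# that is ordered newest first.
summarise_history <- function(x) {
  if (!length(x)) {
    return(list(commits = 0L, authors = 0L,
                head = NA_character_, oldest = NA_character_))
  }
  list(
    commits = length(x),
    authors = length(unique(vapply(x, function(y) y$author$name,
                                   character(1)))),
    head    = x[[1]]$sha,          # newest commit
    oldest  = x[[length(x)]]$sha   # commit that 'created' would be read from
  )
}

# Mock history, newest first. These are made-up values for illustration only.
history <- list(
  list(sha = "c3", author = list(name = "Alice")),
  list(sha = "b2", author = list(name = "Bob")),
  list(sha = "a1", author = list(name = "Alice"))
)

full <- summarise_history(history)     # full history
last <- summarise_history(history[1])  # what a list limited to n = 1 looks like

# 'head' agrees, but the count-based fields and the oldest commit do not.
stopifnot(identical(full$head, last$head))
print(c(full = full$commits, last = last$commits))
print(c(full = full$authors, last = last$authors))
print(c(full = full$oldest,  last = last$oldest))
```

If only `modified` and `head` are needed, `n = 1` is a reasonable shortcut; the other columns still require walking the full history.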

Jul 19 '21 15:07 jdblischak

@jdblischak Thanks for the feedback. I'm working on a faster version.

Jul 20 '08:07 stewid