go-git
Filtering Commits Based on Files They Interact With
Hey All,
I am essentially trying to replicate this git log command in golang and can get most of the way there. The command looks something like this:
git log -- some-magical-path
Git understands that when I say this I mean only the commits that touched some-magical-path and not any others. With the library I only seem to be able to iterate over all of the commits and then ask about their files. When I interrogate a commit for its files, it gives me all of the files referenced by the tree (even ones the commit did not touch). Does anyone have a good idea how this would be accomplished?
Cheers!
Hey @zachgersh,
You are correct that there is currently no explicit support in the library for this, as there is with the standard git tooling.
One way to implement something like this is shown below. No promises that this is a particularly fast algorithm. The basic idea is that we keep track of the version of the path at each commit (by remembering the hash of the path contents). We can then compare the path hashes between a commit and each of its parents to detect whether the path has changed.
I believe the standard git tooling would also include a log entry if either the file permissions for the path changed or the file contents changed. This code example does not support changes to file permissions of the path, so that would be an exercise for the reader. 😛
package main

import (
	"fmt"
	"os"
	"strings"

	"gopkg.in/src-d/go-git.v4"
	. "gopkg.in/src-d/go-git.v4/_examples"
	"gopkg.in/src-d/go-git.v4/plumbing"
	"gopkg.in/src-d/go-git.v4/plumbing/object"
)

// Print log for commits with changes to certain file path
func main() {
	CheckArgs("<repoDir> <path>")
	repoDir := os.Args[1]
	path := os.Args[2]

	// We open the repository at given directory
	r, err := git.PlainOpen(repoDir)
	CheckIfError(err)

	Info(fmt.Sprintf("git log -- %s", path))

	// ... retrieves the branch pointed by HEAD
	ref, err := r.Head()
	CheckIfError(err)

	// ... retrieves the commit history
	cIter, err := r.Log(&git.LogOptions{From: ref.Hash()})
	CheckIfError(err)

	// ... just iterates over the commits, printing it
	err = cIter.ForEach(filterByChangesToPath(r, path, func(c *object.Commit) error {
		fmt.Println(c)
		return nil
	}))
	CheckIfError(err)
}

type memo map[plumbing.Hash]plumbing.Hash

// filterByChangesToPath provides a CommitIter callback that only invokes
// a delegate callback for commits that include changes to the content of path.
func filterByChangesToPath(r *git.Repository, path string, callback func(*object.Commit) error) func(*object.Commit) error {
	m := make(memo)
	return func(c *object.Commit) error {
		if err := ensure(m, c, path); err != nil {
			return err
		}
		if c.NumParents() == 0 && !m[c.Hash].IsZero() {
			// c is a root commit containing the path
			return callback(c)
		}
		// Compare the path in c with the path in each of its parents
		for _, p := range c.ParentHashes {
			if _, ok := m[p]; !ok {
				pc, err := r.CommitObject(p)
				if err != nil {
					return err
				}
				if err := ensure(m, pc, path); err != nil {
					return err
				}
			}
			if m[p] != m[c.Hash] {
				// contents at path are different from parent
				return callback(c)
			}
		}
		return nil
	}
}

// ensure our memoization includes a mapping from commit hash
// to the hash of path contents.
func ensure(m memo, c *object.Commit, path string) error {
	if _, ok := m[c.Hash]; !ok {
		t, err := c.Tree()
		if err != nil {
			return err
		}
		te, err := t.FindEntry(path)
		if err == object.ErrDirectoryNotFound {
			m[c.Hash] = plumbing.ZeroHash
			return nil
		} else if err != nil {
			if !strings.ContainsRune(path, '/') {
				// path is in root directory of project, but not found in this commit
				m[c.Hash] = plumbing.ZeroHash
				return nil
			}
			return err
		}
		m[c.Hash] = te.Hash
	}
	return nil
}
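For the file-permission case left as an exercise above, one possible approach is to remember the mode alongside the content hash and compare both. The following is only a rough, library-independent sketch: the `entry` and `commit` types and the `changed` and `same` helpers are hypothetical stand-ins made up for illustration, not go-git types. With the real library, the tree entry's mode would presumably be stored next to its hash in the memo.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a tree entry:
// the content hash and the file mode recorded at the path.
type entry struct {
	hash string
	mode uint32
}

// commit is a hypothetical, simplified commit: its parents and the
// entry found at the path of interest (nil when the path is absent).
type commit struct {
	id      string
	parents []*commit
	at      *entry
}

// same reports whether two entries agree in both hash and mode.
// Two absent entries are considered equal.
func same(a, b *entry) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// changed reports whether the path differs, in content or mode, between
// c and at least one of its parents. A root commit counts as a change
// when it contains the path.
func changed(c *commit) bool {
	if len(c.parents) == 0 {
		return c.at != nil
	}
	for _, p := range c.parents {
		if !same(c.at, p.at) {
			return true
		}
	}
	return false
}

func main() {
	root := &commit{id: "root", at: &entry{"aaa", 0o100644}}
	noop := &commit{id: "noop", parents: []*commit{root}, at: &entry{"aaa", 0o100644}}
	chmod := &commit{id: "chmod", parents: []*commit{noop}, at: &entry{"aaa", 0o100755}}
	edit := &commit{id: "edit", parents: []*commit{chmod}, at: &entry{"bbb", 0o100755}}

	for _, c := range []*commit{edit, chmod, noop, root} {
		if changed(c) {
			fmt.Println(c.id)
		}
	}
}
```

In this toy history the chmod-only commit is reported alongside the commits that add or edit the file, while the commit that leaves the path untouched is skipped.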
@mcuadros has there been any discussion about adding this type of functionality to the library? Do we want to label this issue as "enhancement"?
@orirawlings - thanks for writing this up, I am going to give it a go. It would actually seem that References, which was previously made private, does pretty much the same thing? I wonder if we could just expose that again.
Somewhat related to #343 - which was looking to pattern match on a partial path. Git log totally supports this behavior though it wasn't clear in the example above :D
This code works perfectly btw. Really appreciate this @orirawlings - hugely helped me with a project I am working on.
I'll add the enhancement label on this one. I think it might be feasible to extend some of the example code and work it into the library API.
Since repo.Log returns an object.CommitIter interface, I think it would be nice to have a function that takes a CommitIter and files, filters the commits, and returns a new CommitIter interface.
Something like
func FilterCommitsByFilePath(iter object.CommitIter, files map[string]bool, exclude bool) (object.CommitIter, error) {
This can also be used to exclude some files, by giving exclude = false as well as false values to those files in the files map.
Is this a duplicate of #826 which has been closed in https://github.com/src-d/go-git/pull/979 ?
I don't think it is.
Since some-magical-path can be the path of a directory (the parent of many files), I wish the Option.FileName field was called Path to allow that.
Or maybe even PathFilter func(string) bool, to allow pattern / regexp / glob matching on the file or directory path.
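To make the idea of a `func(string) bool` filter more concrete, here is a small sketch of predicates covering the directory, glob, and regexp cases. It uses only the Go standard library; the helper names `underDir`, `globMatch`, and `regexpMatch` are made up for illustration and are not part of go-git.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// underDir (hypothetical helper) matches dir itself and anything below it,
// so a directory path selects every file it contains.
func underDir(dir string) func(string) bool {
	dir = strings.TrimSuffix(dir, "/")
	return func(p string) bool {
		return p == dir || strings.HasPrefix(p, dir+"/")
	}
}

// globMatch (hypothetical helper) matches whole paths against a path.Match
// pattern. Note that '*' does not cross '/' separators.
func globMatch(pattern string) func(string) bool {
	return func(p string) bool {
		ok, err := path.Match(pattern, p)
		return err == nil && ok
	}
}

// regexpMatch (hypothetical helper) matches paths against a regular
// expression. It panics if expr does not compile.
func regexpMatch(expr string) func(string) bool {
	return regexp.MustCompile(expr).MatchString
}

func main() {
	filters := map[string]func(string) bool{
		"underDir(docs)":      underDir("docs"),
		"globMatch(*.go)":     globMatch("*.go"),
		"regexpMatch(_test.)": regexpMatch(`_test\.go$`),
	}
	paths := []string{"docs/intro.md", "main.go", "pkg/a_test.go"}
	for name, f := range filters {
		for _, p := range paths {
			fmt.Printf("%s %s %v\n", name, p, f(p))
		}
	}
}
```

Any of these predicates has the `func(string) bool` shape suggested above, so it could be handed to such an option directly if one existed.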
Should be safe to close, since this PR is merged and PathFilter func(string) bool has been added.