go-git

Filtering Commits Based on Files They Interact With

Status: Open. zachgersh opened this issue 6 years ago • 9 comments

Hey All,

I am essentially trying to replicate this git log command in Go and can get most of the way there. The command looks something like this:

git log -- some-magical-path

Git understands that I mean only the commits that touched some-magical-path and no others. With the library, I only seem to be able to iterate over all of the commits and then ask each one about its files. When I interrogate a commit for its files, it gives me every file referenced by its tree, even the ones that commit did not touch. Does anyone have a good idea how this could be accomplished?

Cheers!

zachgersh, Aug 24 '17

Hey @zachgersh,

You are correct that there is currently no explicit support in the library for this, as there is with the standard git tooling.

One way to implement something like this is shown below. No promises that this is a particularly fast algorithm. The basic idea is to keep track of the version of the path at each commit by remembering the hash of the path's contents (a blob hash for a file, a tree hash for a directory). We can then compare the path hashes between a commit and each of its parents to detect whether the path has changed.
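The memoized comparison described above can be sketched on a toy commit graph. Everything here is hypothetical: the commit type, its flat files map, and the use of "" for an absent path stand in for go-git's trees and plumbing.ZeroHash. It only illustrates the rule of comparing a path's hash in a commit against each of its parents.

```go
package main

import "fmt"

// commit is a hypothetical, minimal stand-in for a git commit.
type commit struct {
	id      string
	parents []string
	files   map[string]string // path -> content hash
}

// touchesPath reports whether c changed path relative to at least one parent.
// An absent path hashes to "". The memo caches the path's hash per commit ID;
// it is only valid for a single path, so use a fresh memo for each path.
// Assumes every parent ID is present in byID.
func touchesPath(byID map[string]*commit, memo map[string]string, c *commit, path string) bool {
	hashAt := func(x *commit) string {
		if h, ok := memo[x.id]; ok {
			return h
		}
		h := x.files[path] // "" when the path is absent
		memo[x.id] = h
		return h
	}
	self := hashAt(c)
	if len(c.parents) == 0 {
		return self != "" // root commit that contains the path
	}
	for _, pid := range c.parents {
		if hashAt(byID[pid]) != self {
			return true
		}
	}
	return false
}

func main() {
	history := []*commit{ // newest first, like a log walk
		{id: "c4", parents: []string{"c3"}, files: map[string]string{"p": "v2", "q": "x2"}},
		{id: "c3", parents: []string{"c2"}, files: map[string]string{"p": "v2", "q": "x1"}},
		{id: "c2", parents: []string{"c1"}, files: map[string]string{"p": "v1", "q": "x1"}},
		{id: "c1", files: map[string]string{"q": "x1"}},
	}
	byID := map[string]*commit{}
	for _, c := range history {
		byID[c.id] = c
	}
	memo := map[string]string{}
	for _, c := range history {
		if touchesPath(byID, memo, c, "p") {
			fmt.Println(c.id) // prints c3, then c2
		}
	}
}
```

In go-git the expensive step is loading each commit's tree, which is why the full program memoizes; in this toy the lookup is cheap and the memo only mirrors that structure.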

I believe the standard git tooling would also include a log entry if either the file permissions for the path changed or the file contents changed. This code example does not support changes to file permissions of the path, so that would be an exercise for the reader. 😛
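For the permissions exercise, one option is to memoize the entry's mode together with its hash and treat a difference in either as a change; go-git's object.TreeEntry carries both a Hash and a Mode field. The sketch below uses a hypothetical entry struct with plain Go values rather than go-git types.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a tree entry: content hash plus mode.
type entry struct {
	hash string
	mode uint32 // e.g. 0100644 regular file, 0100755 executable
}

// changed reports whether the entry differs in content or in mode.
// entry is a comparable struct, so != compares both fields at once.
func changed(parent, child entry) bool {
	return parent != child
}

func main() {
	before := entry{hash: "abc", mode: 0100644}
	chmodded := entry{hash: "abc", mode: 0100755}
	fmt.Println(changed(before, before))   // false
	fmt.Println(changed(before, chmodded)) // true: same content, new mode
}
```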

package main

import (
	"fmt"
	"os"
	"strings"

	"gopkg.in/src-d/go-git.v4"
	. "gopkg.in/src-d/go-git.v4/_examples"
	"gopkg.in/src-d/go-git.v4/plumbing"
	"gopkg.in/src-d/go-git.v4/plumbing/object"
)

// Print log for commits with changes to certain file path
func main() {
	CheckArgs("<repoDir> <path>")
	repoDir := os.Args[1]
	path := os.Args[2]

	// We open the repository at given directory
	r, err := git.PlainOpen(repoDir)
	CheckIfError(err)

	Info(fmt.Sprintf("git log -- %s", path))

	// ... retrieves the branch pointed to by HEAD
	ref, err := r.Head()
	CheckIfError(err)

	// ... retrieves the commit history
	cIter, err := r.Log(&git.LogOptions{From: ref.Hash()})
	CheckIfError(err)

	// ... iterates over the commits, printing only those that change path
	err = cIter.ForEach(filterByChangesToPath(r, path, func(c *object.Commit) error {
		fmt.Println(c)
		return nil
	}))
	CheckIfError(err)
}

type memo map[plumbing.Hash]plumbing.Hash

// filterByChangesToPath provides a CommitIter callback that only invokes 
// a delegate callback for commits that include changes to the content of path.
func filterByChangesToPath(r *git.Repository, path string, callback func(*object.Commit) error) func(*object.Commit) error {
	m := make(memo)
	return func(c *object.Commit) error {
		if err := ensure(m, c, path); err != nil {
			return err
		}
		if c.NumParents() == 0 && !m[c.Hash].IsZero() {
			// c is a root commit containing the path
			return callback(c)
		}
		// Compare the path in c with the path in each of its parents
		for _, p := range c.ParentHashes {
			if _, ok := m[p]; !ok {
				pc, err := r.CommitObject(p)
				if err != nil {
					return err
				}
				if err := ensure(m, pc, path); err != nil {
					return err
				}
			}
			if m[p] != m[c.Hash] {
				// contents at path are different from parent
				return callback(c)
			}
		}
		return nil
	}
}

// ensure our memoization includes a mapping from commit hash 
// to the hash of path contents.
func ensure(m memo, c *object.Commit, path string) error {
	if _, ok := m[c.Hash]; !ok {
		t, err := c.Tree()
		if err != nil {
			return err
		}
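		// Trim a trailing slash so "dir/" resolves to the same entry as "dir".
		te, err := t.FindEntry(strings.TrimSuffix(path, "/"))
		if err == object.ErrDirectoryNotFound || err == object.ErrEntryNotFound {
			// path (or one of its parent directories) does not exist in this
			// commit; this also covers nested paths such as dir/file that were
			// added after dir already existed
			m[c.Hash] = plumbing.ZeroHash
			return nil
		} else if err != nil {
			return err
		}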
		m[c.Hash] = te.Hash
	}
	return nil
}
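One behavioral difference from git is worth noting. The callback in the program above fires for a merge commit whenever the path differs from at least one parent. By default, git log -- <path> only includes a commit that is not TREESAME to any of its parents, so a merge that simply took the path unchanged from one side is hidden. The hypothetical functions below, operating on plain hash strings, contrast the two rules. They model only the inclusion test, not git's further step of following only the TREESAME parent of such a merge, which also prunes side branches from the walk.

```go
package main

import "fmt"

// Both functions take the path's hash in a commit (self) and in each of its
// parents, with "" meaning the path is absent. They differ only for merges.

// differsFromAnyParent is the rule the example program uses.
func differsFromAnyParent(self string, parents []string) bool {
	if len(parents) == 0 {
		return self != "" // root: compared against an empty tree
	}
	for _, p := range parents {
		if p != self {
			return true
		}
	}
	return false
}

// differsFromEveryParent sketches git's default inclusion rule: the commit is
// shown only if it is not TREESAME to any parent.
func differsFromEveryParent(self string, parents []string) bool {
	if len(parents) == 0 {
		return self != ""
	}
	for _, p := range parents {
		if p == self {
			return false
		}
	}
	return true
}

func main() {
	// A merge that took the path unchanged from its second parent.
	self, parents := "v2", []string{"v1", "v2"}
	fmt.Println(differsFromAnyParent(self, parents))   // true
	fmt.Println(differsFromEveryParent(self, parents)) // false
}
```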

orirawlings, Aug 25 '17

@mcuadros, has there been any discussion about adding this type of functionality to the library? Should we label this issue as "enhancement"?

orirawlings, Aug 25 '17

@orirawlings, thanks for writing this up; I am going to give it a go. Doesn't References, which was previously made private, do pretty much the same thing? I wonder if we could just expose it again.

Somewhat related to #343, which was looking to pattern-match on a partial path. git log totally supports this behavior, though that wasn't clear from the example above :D

zachgersh, Aug 25 '17

This code works perfectly, btw. Really appreciate this, @orirawlings; it hugely helped me with a project I am working on.

zachgersh, Aug 25 '17

I'll add the enhancement label on this one. I think it might be feasible to extend some of the example code and work it into the library API.

orirawlings, Aug 25 '17

Since repo.Log returns an object.CommitIter interface, I think it would be nice to have a function that takes a CommitIter and a set of files, filters the commits, and returns a new CommitIter.

Something like

func FilterCommitsByFilePath(iter object.CommitIter, files map[string]bool, exclude bool) (object.CommitIter, error)

This could also be used to exclude some files, by passing exclude = false and giving those files false values in the files map.

ilius, Mar 19 '18

Is this a duplicate of #826, which was closed by https://github.com/src-d/go-git/pull/979 ?

marians, Nov 29 '19

I don't think it is. Since some-magical-path can be the path of a directory (the parent of many files), I wish the LogOptions.FileName field were called Path to allow that.

Or even a PathFilter func(string) bool, to allow pattern / regexp / glob matching on the file or directory path.

ilius, Nov 29 '19

Should be safe to close, since this PR is merged and PathFilter func(string) bool has been added.

ilius, Mar 09 '20