`IndexFile.diff(None)` returns empty after `init -> add -> write -> read` sequence on a new repository

Open ElJaviLuki opened this issue 7 months ago • 1 comment

Environment:

  • GitPython version: 3.1.44
  • Git version: git version 2.42.0.windows.2
  • Python version: 3.12.0
  • Operating System: Windows 11 Pro 24H2 26100.3775

Description: After initializing a new repository, adding a file to the index, writing the index to disk, and then explicitly reading the index back, a subsequent call to repo.index.diff(None) returns an empty DiffIndex (an empty list) where one entry is expected. This occurs even though an external git status --porcelain command correctly shows the file as staged for addition (status 'A').

This suggests that the in-memory state of the IndexFile object is not correctly reflecting the on-disk state for the diff(None) operation under these specific circumstances, even after an explicit repo.index.read().

Steps to Reproduce:

import os
import tempfile
import shutil
from git import Repo, IndexFile  # IndexFile is only needed for the alternative in step 5

# Setup a temporary directory for the new repository
repo_dir = tempfile.mkdtemp(prefix="test_gitpython_index_issue_")
try:
    # 1. Initialize a new repository
    repo = Repo.init(repo_dir)
    print(f"Repository initialized at: {repo_dir}")
    print(f"Is bare: {repo.bare}") # Should be False

    # 2. Create and add a new file (.gitkeep in this example)
    gitkeep_path = os.path.join(repo.working_tree_dir, ".gitkeep")
    with open(gitkeep_path, 'w') as f:
        f.write("# Initial file\n")
    print(f".gitkeep created at: {gitkeep_path}")

    index = repo.index
    index.add([".gitkeep"]) # Relative path to repo root
    print("Added '.gitkeep' to index object in memory.")

    # 3. Write the index to disk
    index.write()
    print(f"Index written to disk at: {index.path}")
    assert os.path.exists(index.path), "Index file should exist on disk"

    # 4. (Optional but good for verification) Check with external git status
    status_output = repo.git.status(porcelain=True)
    print(f"git status --porcelain output: '{status_output}'")
    assert "A  .gitkeep" in status_output or "?? .gitkeep" in status_output # Should be 'A ' after add+write

    # 5. Explicitly re-read the index (or create a new IndexFile instance)
    #    This step is crucial to the bug demonstration.
    index.read() # Force re-read of the IndexFile instance
    # Alternatively: index = IndexFile(repo) # Create new instance, should also read from disk
    print(f"Index explicitly re-read. Number of entries: {len(index.entries)}")
    assert len(index.entries) > 0, "Index should have entries after add/write/read"
    
    # 6. Perform a diff of the index against an empty tree (None)
    # This simulates what happens before an initial commit to see staged changes.
    diff_against_empty_tree = index.diff(None) 
    print(f"index.diff(None) result: {diff_against_empty_tree}")
    print(f"Type of result: {type(diff_against_empty_tree)}")
    for item_diff in diff_against_empty_tree:
        print(f"  Diff item: a_path={item_diff.a_path}, b_path={item_diff.b_path}, change_type={item_diff.change_type}, new_file={item_diff.new_file}")


    # Expected behavior:
    # index.diff(None) should return a DiffIndex containing one Diff object
    # representing the newly added '.gitkeep' file (change_type 'A').
    assert len(diff_against_empty_tree) == 1, \
        f"Expected 1 diff item, got {len(diff_against_empty_tree)}. Entries: {index.entries}"
    diff_item = diff_against_empty_tree[0]
    assert diff_item.change_type == 'A', \
        f"Expected change_type 'A', got '{diff_item.change_type}'"
    assert diff_item.b_path == ".gitkeep", \
        f"Expected b_path '.gitkeep', got '{diff_item.b_path}'"

except Exception as e:
    print(f"An error occurred: {e}")
    raise
finally:
    # Clean up the temporary directory
    # shutil.rmtree(repo_dir)
    # print(f"Cleaned up temp directory: {repo_dir}")
    pass

# To run this reproducer:
# 1. Save as a .py file.
# 2. Ensure GitPython is installed.
# 3. Run `python your_file_name.py`

Actual Behavior: repo.index.diff(None) returns an empty DiffIndex (i.e., []).

Expected Behavior: repo.index.diff(None) should return a DiffIndex containing one Diff object for .gitkeep with change_type='A', new_file=True, a_path=None, and b_path='.gitkeep'.

Additional Context:

  • This issue prevents correctly determining staged changes for an initial commit using index.diff(None).
  • The index.entries dictionary does seem to reflect the added file correctly after index.read().
  • The repo.git.status(porcelain=True) command correctly shows the file as staged for addition (A .gitkeep).
  • The problem seems specific to how IndexFile.diff(None) interprets the IndexFile's state after this sequence of operations in a new repository before the first commit. Diffing against HEAD (once a commit exists) or other trees might behave differently.

ElJaviLuki, May 11 '25 21:05

Thanks a lot for reporting, as well as the exhaustive description with a reproducer.

It appears that index.diff(None) doesn't make a call to Git, or else I'd expect it to pick up the change. However, I also don't recall ever having implemented diffing itself in GitPython, and even if that were the case, the test shows that the index is up-to-date in memory.

A possible fix could add a (possibly modified) version of the reproduction above in the first commit, with the actual fix following in the next one.

Byron, May 12 '25 04:05