pygit2 icon indicating copy to clipboard operation
pygit2 copied to clipboard

Broken GIT_SORT_TIME ?

Open pypingou opened this issue 7 years ago • 17 comments

This is a script to reproduce the behavior I'm seeing:

#!/usr/bin/env python

import tempfile
import time
import os

import pygit2

path = tempfile.mkdtemp(prefix='pygit2_test_')
repo = pygit2.init_repository(path)

author = pygit2.Signature('Alice Author', '[email protected]')
committer = pygit2.Signature('Cecil Committer', '[email protected]')

# Commit #1

 # Create a file in that git repo
with open(os.path.join(path, 'sources'), 'w') as stream:
    stream.write('foo\n bar')
repo.index.add('sources')
repo.index.write()

parents = []
# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
    'refs/heads/master',  # the name of the reference to update
    author,
    committer,
    'Commit #1',
    # binary string representing the tree object ID
    tree,
    # list of binary strings representing parents of the new commit
    parents,
))
commit1 = repo.revparse_single('HEAD')
time.sleep(0.1)

# Commit #2

parents = [commit1.oid.hex]

subfolder = os.path.join('folder1', 'folder2')
os.makedirs(os.path.join(path, subfolder))
# Create a file in that git repo
with open(os.path.join(path, subfolder, 'file'), 'w') as stream:
    stream.write('foo\n bar\nbaz')
repo.index.add(os.path.join(subfolder, 'file'))
repo.index.write()

# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
    'refs/heads/master',  # the name of the reference to update
    author,
    committer,
    'Commit #2',
    # binary string representing the tree object ID
    tree,
    # list of binary strings representing parents of the new commit
    parents
))
commit2 = repo.revparse_single('HEAD')
time.sleep(0.1)


# Commit #3

parents = [commit2.oid.hex]

# Update the sources file
with open(os.path.join(path, 'sources'), 'w') as stream:
    stream.write('foo\n bar\nbaz!')
repo.index.add('sources')
repo.index.write()

# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
    'refs/heads/master',  # the name of the reference to update
    author,
    committer,
    'Commit #3',
    # binary string representing the tree object ID
    tree,
    # list of binary strings representing parents of the new commit
    parents
))
commit3 = repo.revparse_single('HEAD')


print('')
main_walker = repo.walk(commit3.oid.hex, pygit2.GIT_SORT_TIME)
commits_msgs = []
while 1:
    try:
        com = main_walker.next()
        print(com.message)
        print(com.oid.hex)
    except StopIteration:
        break

It gives me the following output:

7335ffcfdb2039a9631ef5c891ec6a12feaeb342
50ea7d4691ee78b60323ac1d4174a6609190fbaf
47e61d2eaa8c5d3d1562475a43aa99c0e7fb9b8a

Commit #3
47e61d2eaa8c5d3d1562475a43aa99c0e7fb9b8a
Commit #1
7335ffcfdb2039a9631ef5c891ec6a12feaeb342
Commit #2
50ea7d4691ee78b60323ac1d4174a6609190fbaf

The first three lines are the commit as they are committed. The following lines are the commit message and hash as repo.walk(commit3, pygit2.GIT_SORT_TIME) is returning them.

Something looks broken to me :)

pypingou avatar Nov 30 '17 08:11 pypingou

Oh, forgot to add, I'm running: python2-pygit2-0.26.0-3.fc27.x86_64 libgit2-0.26.0-3.fc27.x86_64

And the issue doesn't occur on: python-pygit2-0.24.2-1.el7.x86_64 libgit2-0.24.6-2.puiterwijkpatch2.el7.x86_64

Thanks

pypingou avatar Nov 30 '17 08:11 pypingou

Gentle ping? I'm really curious to know if others are thing this or if I'm doing something wrong :)

pypingou avatar Dec 15 '17 09:12 pypingou

I think that time granularity is 1 second, so if increasing time.sleep to 1s it should work, but it doesn't. Looking at the repo all commits have the same time.

So I feel like GIT_SORT_TIME is right, but the commits are created with the wrong time.

jdavid avatar Dec 25 '17 11:12 jdavid

Indeed, I even increased the time.sleep to 2 seconds but if I print print(com.oid.hex, com.commit_time) line 99, I get:

6ecf8698c6c4cc294b5a9be4b550d10e53afdbc7
2b708aaaf85ef0f7913db61f9c14489f79844b7b
42da3d707d2d69799705174b6fed4bd6bed66aa1

Commit #3
42da3d707d2d69799705174b6fed4bd6bed66aa1 1514211183
Commit #1
6ecf8698c6c4cc294b5a9be4b550d10e53afdbc7 1514211183
Commit #2
2b708aaaf85ef0f7913db61f9c14489f79844b7b 1514211183

pypingou avatar Dec 25 '17 14:12 pypingou

The time is part of the signature, it's set when the author and committer are defined. See https://github.com/libgit2/pygit2/blob/master/src/signature.c#L58

jdavid avatar Dec 25 '17 18:12 jdavid

Git blame shows this line changed 5 years ago, but that behavior did not exist in: python-pygit2-0.24.2-1.el7.x86_64 libgit2-0.24.6-2.el7.x86_64 which is from November 2016, odd :s

pypingou avatar Dec 26 '17 07:12 pypingou

Ok, so thanks to your input and hint about the commit date/time I was able to find the issue in my code (well tests in this case), turns out my original code was at fault, printing the commit date/time showed it and adding some time.sleep(1) fixed it.

Thanks for your help!

pypingou avatar Jan 03 '18 09:01 pypingou

Do you mean version 0.24 has the same behaviour? Before you stated it did not.

jdavid avatar Jan 03 '18 10:01 jdavid

It did not have the same behavior, adding the time.sleep(1) should make that behavior consistent, still need to confirm it though.

Would you like to keep the ticket open to compare 0.24 to 0.26?

pypingou avatar Jan 03 '18 10:01 pypingou

Yes let's do that.

jdavid avatar Jan 03 '18 10:01 jdavid

You are only sorting by time, so there cannot be a guaranteed order between commits with the same timestamp. Any order among them is correct and equivalent.

If you want to do what git does for --date-order, you need to add topo-sorting to the sorting method.

carlosmn avatar Mar 11 '18 13:03 carlosmn

You are only sorting by time, so there cannot be a guaranteed order between commits with the same timestamp. Any order among them is correct and equivalent.

But shouldn't it rely on the parent/child relationship and only use the timestamp when having to deal with multiple parent/child?

pypingou avatar Mar 14 '18 09:03 pypingou

But shouldn't it rely on the parent/child relationship and only use the timestamp when having to deal with multiple parent/child?

No, because you're only asking for it to be time-sorted.

carlosmn avatar Mar 14 '18 09:03 carlosmn

So what would be the proper way to ask?

pypingou avatar Mar 14 '18 09:03 pypingou

At least I can confirm that using

$ cat test_pygit2.py 
#!/usr/bin/env python

import tempfile
import time
import os

import pygit2

path = tempfile.mkdtemp(prefix='pygit2_test_')
repo = pygit2.init_repository(path)

author = pygit2.Signature('Alice Author', '[email protected]')
committer = pygit2.Signature('Cecil Committer', '[email protected]')

# Commit #1

# Create a file in that git repo
with open(os.path.join(path, 'sources'), 'w') as stream:
    stream.write('foo\n bar')
repo.index.add('sources')
repo.index.write()

parents = []
# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
    'refs/heads/master',  # the name of the reference to update
    author,
    committer,
    'Commit #1',
    # binary string representing the tree object ID
    tree,
    # list of binary strings representing parents of the new commit
    parents,
))
commit1 = repo.revparse_single('HEAD')
time.sleep(0.1)

# Commit #2
author = pygit2.Signature('Alice Author', '[email protected]')
committer = pygit2.Signature('Cecil Committer', '[email protected]')

parents = [commit1.oid.hex]

subfolder = os.path.join('folder1', 'folder2')
os.makedirs(os.path.join(path, subfolder))
# Create a file in that git repo
with open(os.path.join(path, subfolder, 'file'), 'w') as stream:
    stream.write('foo\n bar\nbaz')
repo.index.add(os.path.join(subfolder, 'file'))
repo.index.write()

# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
    'refs/heads/master',  # the name of the reference to update
    author,
    committer,
    'Commit #2',
    # binary string representing the tree object ID
    tree,
    # list of binary strings representing parents of the new commit
    parents
))
commit2 = repo.revparse_single('HEAD')
time.sleep(0.1)


# Commit #3
author = pygit2.Signature('Alice Author', '[email protected]')
committer = pygit2.Signature('Cecil Committer', '[email protected]')

parents = [commit2.oid.hex]

# Update the sources file
with open(os.path.join(path, 'sources'), 'w') as stream:
    stream.write('foo\n bar\nbaz!')
repo.index.add('sources')
repo.index.write()

# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
    'refs/heads/master',  # the name of the reference to update
    author,
    committer,
    'Commit #3',
    # binary string representing the tree object ID
    tree,
    # list of binary strings representing parents of the new commit
    parents
))
commit3 = repo.revparse_single('HEAD')


print('')
main_walker = repo.walk(commit3.oid.hex, pygit2.GIT_SORT_TIME)
commits_msgs = []
while 1:
    try:
        com = main_walker.next()
        print(com.message)
        print(com.oid.hex, com.commit_time)
    except StopIteration:
        break

I get:

$ python test_pygit2.py
d19571bf6f72d6bca1410396f76a54188bdb4624
562a32c819b059412564531e13b1fe21fa4d94c4
33c0e0a1953f412762a2540e0bc883988386fa4f

Commit #3
('33c0e0a1953f412762a2540e0bc883988386fa4f', 1521019848)
Commit #1
('d19571bf6f72d6bca1410396f76a54188bdb4624', 1521019847)
Commit #2
('562a32c819b059412564531e13b1fe21fa4d94c4', 1521019847)

using:

$ rpm -q libgit2 python2-pygit2
libgit2-0.26.0-3.fc27.x86_64
python2-pygit2-0.26.3-1.fc27.x86_64

While I get:

$ python test_pygit2.py
b089520a5e7d33b6150892671e2b16c47b300de8
138e207681079198be28e29435e01b383ce6f5f2
85d6c93832f9680e4b49ba07dc3cb8c1d644bf7d

Commit #3
('85d6c93832f9680e4b49ba07dc3cb8c1d644bf7d', 1521019766L)
Commit #2
('138e207681079198be28e29435e01b383ce6f5f2', 1521019766L)
Commit #1
('b089520a5e7d33b6150892671e2b16c47b300de8', 1521019765L)

when using:

libgit2-0.24.6-2.el7.x86_64
python-pygit2-0.24.2-1.el7.x86_64

pypingou avatar Mar 14 '18 09:03 pypingou

Seeing the commit timestamp I wondered if this could be related to changes in git itself, one host is using: git-1.8.3.1-12.el7_4.x86_64 while the other is using: git-2.14.3-3.fc27.x86_64

pypingou avatar Mar 14 '18 09:03 pypingou

Just a quick note. I seems one host is using python 2 and the other python 3. Could that have influence your test?

buhl avatar Apr 19 '20 21:04 buhl