pygit2
pygit2 copied to clipboard
Broken GIT_SORT_TIME ?
This is a script to reproduce the behavior I'm seeing:
#!/usr/bin/env python
import tempfile
import time
import os
import pygit2
path = tempfile.mkdtemp(prefix='pygit2_test_')
repo = pygit2.init_repository(path)
author = pygit2.Signature('Alice Author', '[email protected]')
committer = pygit2.Signature('Cecil Committer', '[email protected]')
# Commit #1
# Create a file in that git repo
with open(os.path.join(path, 'sources'), 'w') as stream:
stream.write('foo\n bar')
repo.index.add('sources')
repo.index.write()
parents = []
# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
'refs/heads/master', # the name of the reference to update
author,
committer,
'Commit #1',
# binary string representing the tree object ID
tree,
# list of binary strings representing parents of the new commit
parents,
))
commit1 = repo.revparse_single('HEAD')
time.sleep(0.1)
# Commit #2
parents = [commit1.oid.hex]
subfolder = os.path.join('folder1', 'folder2')
os.makedirs(os.path.join(path, subfolder))
# Create a file in that git repo
with open(os.path.join(path, subfolder, 'file'), 'w') as stream:
stream.write('foo\n bar\nbaz')
repo.index.add(os.path.join(subfolder, 'file'))
repo.index.write()
# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
'refs/heads/master', # the name of the reference to update
author,
committer,
'Commit #2',
# binary string representing the tree object ID
tree,
# list of binary strings representing parents of the new commit
parents
))
commit2 = repo.revparse_single('HEAD')
time.sleep(0.1)
# Commit #3
parents = [commit2.oid.hex]
# Update the sources file
with open(os.path.join(path, 'sources'), 'w') as stream:
stream.write('foo\n bar\nbaz!')
repo.index.add('sources')
repo.index.write()
# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
'refs/heads/master', # the name of the reference to update
author,
committer,
'Commit #3',
# binary string representing the tree object ID
tree,
# list of binary strings representing parents of the new commit
parents
))
commit3 = repo.revparse_single('HEAD')
print('')
main_walker = repo.walk(commit3.oid.hex, pygit2.GIT_SORT_TIME)
commits_msgs = []
while 1:
try:
com = main_walker.next()
print(com.message)
print(com.oid.hex)
except StopIteration:
break
It gives me the following output:
7335ffcfdb2039a9631ef5c891ec6a12feaeb342
50ea7d4691ee78b60323ac1d4174a6609190fbaf
47e61d2eaa8c5d3d1562475a43aa99c0e7fb9b8a
Commit #3
47e61d2eaa8c5d3d1562475a43aa99c0e7fb9b8a
Commit #1
7335ffcfdb2039a9631ef5c891ec6a12feaeb342
Commit #2
50ea7d4691ee78b60323ac1d4174a6609190fbaf
The first three lines are the commit as they are committed.
The following lines are the commit message and hash as repo.walk(commit3, pygit2.GIT_SORT_TIME)
is returning them.
Something looks broken to me :)
Oh, forgot to add, I'm running:
python2-pygit2-0.26.0-3.fc27.x86_64 libgit2-0.26.0-3.fc27.x86_64
And the issue doesn't occur on:
python-pygit2-0.24.2-1.el7.x86_64 libgit2-0.24.6-2.puiterwijkpatch2.el7.x86_64
Thanks
Gentle ping? I'm really curious to know if others are thing this or if I'm doing something wrong :)
I think that time granularity is 1 second, so if increasing time.sleep
to 1s it should work, but it doesn't. Looking at the repo all commits have the same time.
So I feel like GIT_SORT_TIME
is right, but the commits are created with the wrong time.
Indeed, I even increased the time.sleep
to 2 seconds but if I print print(com.oid.hex, com.commit_time)
line 99, I get:
6ecf8698c6c4cc294b5a9be4b550d10e53afdbc7
2b708aaaf85ef0f7913db61f9c14489f79844b7b
42da3d707d2d69799705174b6fed4bd6bed66aa1
Commit #3
42da3d707d2d69799705174b6fed4bd6bed66aa1 1514211183
Commit #1
6ecf8698c6c4cc294b5a9be4b550d10e53afdbc7 1514211183
Commit #2
2b708aaaf85ef0f7913db61f9c14489f79844b7b 1514211183
The time is part of the signature, it's set when the author and committer are defined. See https://github.com/libgit2/pygit2/blob/master/src/signature.c#L58
Git blame shows this line changed 5 years ago, but that behavior did not exist in:
python-pygit2-0.24.2-1.el7.x86_64 libgit2-0.24.6-2.el7.x86_64
which is from November 2016, odd :s
Ok, so thanks to your input and hint about the commit date/time I was able to find the issue in my code (well tests in this case), turns out my original code was at fault, printing the commit date/time showed it and adding some time.sleep(1)
fixed it.
Thanks for your help!
Do you mean version 0.24 has the same behaviour? Before you stated it did not.
It did not have the same behavior, adding the time.sleep(1) should make that behavior consistent, still need to confirm it though.
Would you like to keep the ticket open to compare 0.24 to 0.26?
Yes let's do that.
You are only sorting by time, so there cannot be a guaranteed order between commits with the same timestamp. Any order among them is correct and equivalent.
If you want to do what git does for --date-order
, you need to add topo-sorting to the sorting method.
You are only sorting by time, so there cannot be a guaranteed order between commits with the same timestamp. Any order among them is correct and equivalent.
But shouldn't it rely on the parent/child relationship and only use the timestamp when having to deal with multiple parent/child?
But shouldn't it rely on the parent/child relationship and only use the timestamp when having to deal with multiple parent/child?
No, because you're only asking for it to be time-sorted.
So what would be the proper way to ask?
At least I can confirm that using
$ cat test_pygit2.py
#!/usr/bin/env python
import tempfile
import time
import os
import pygit2
path = tempfile.mkdtemp(prefix='pygit2_test_')
repo = pygit2.init_repository(path)
author = pygit2.Signature('Alice Author', '[email protected]')
committer = pygit2.Signature('Cecil Committer', '[email protected]')
# Commit #1
# Create a file in that git repo
with open(os.path.join(path, 'sources'), 'w') as stream:
stream.write('foo\n bar')
repo.index.add('sources')
repo.index.write()
parents = []
# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
'refs/heads/master', # the name of the reference to update
author,
committer,
'Commit #1',
# binary string representing the tree object ID
tree,
# list of binary strings representing parents of the new commit
parents,
))
commit1 = repo.revparse_single('HEAD')
time.sleep(0.1)
# Commit #2
author = pygit2.Signature('Alice Author', '[email protected]')
committer = pygit2.Signature('Cecil Committer', '[email protected]')
parents = [commit1.oid.hex]
subfolder = os.path.join('folder1', 'folder2')
os.makedirs(os.path.join(path, subfolder))
# Create a file in that git repo
with open(os.path.join(path, subfolder, 'file'), 'w') as stream:
stream.write('foo\n bar\nbaz')
repo.index.add(os.path.join(subfolder, 'file'))
repo.index.write()
# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
'refs/heads/master', # the name of the reference to update
author,
committer,
'Commit #2',
# binary string representing the tree object ID
tree,
# list of binary strings representing parents of the new commit
parents
))
commit2 = repo.revparse_single('HEAD')
time.sleep(0.1)
# Commit #3
author = pygit2.Signature('Alice Author', '[email protected]')
committer = pygit2.Signature('Cecil Committer', '[email protected]')
parents = [commit2.oid.hex]
# Update the sources file
with open(os.path.join(path, 'sources'), 'w') as stream:
stream.write('foo\n bar\nbaz!')
repo.index.add('sources')
repo.index.write()
# Commits the files added
tree = repo.index.write_tree()
print(repo.create_commit(
'refs/heads/master', # the name of the reference to update
author,
committer,
'Commit #3',
# binary string representing the tree object ID
tree,
# list of binary strings representing parents of the new commit
parents
))
commit3 = repo.revparse_single('HEAD')
print('')
main_walker = repo.walk(commit3.oid.hex, pygit2.GIT_SORT_TIME)
commits_msgs = []
while 1:
try:
com = main_walker.next()
print(com.message)
print(com.oid.hex, com.commit_time)
except StopIteration:
break
I get:
$ python test_pygit2.py
d19571bf6f72d6bca1410396f76a54188bdb4624
562a32c819b059412564531e13b1fe21fa4d94c4
33c0e0a1953f412762a2540e0bc883988386fa4f
Commit #3
('33c0e0a1953f412762a2540e0bc883988386fa4f', 1521019848)
Commit #1
('d19571bf6f72d6bca1410396f76a54188bdb4624', 1521019847)
Commit #2
('562a32c819b059412564531e13b1fe21fa4d94c4', 1521019847)
using:
$ rpm -q libgit2 python2-pygit2
libgit2-0.26.0-3.fc27.x86_64
python2-pygit2-0.26.3-1.fc27.x86_64
While I get:
$ python test_pygit2.py
b089520a5e7d33b6150892671e2b16c47b300de8
138e207681079198be28e29435e01b383ce6f5f2
85d6c93832f9680e4b49ba07dc3cb8c1d644bf7d
Commit #3
('85d6c93832f9680e4b49ba07dc3cb8c1d644bf7d', 1521019766L)
Commit #2
('138e207681079198be28e29435e01b383ce6f5f2', 1521019766L)
Commit #1
('b089520a5e7d33b6150892671e2b16c47b300de8', 1521019765L)
when using:
libgit2-0.24.6-2.el7.x86_64
python-pygit2-0.24.2-1.el7.x86_64
Seeing the commit timestamp I wondered if this could be related to changes in git itself, one host is using: git-1.8.3.1-12.el7_4.x86_64
while the other is using: git-2.14.3-3.fc27.x86_64
Just a quick note. I seems one host is using python 2 and the other python 3. Could that have influence your test?