GitPython icon indicating copy to clipboard operation
GitPython copied to clipboard

GitPython parses RFC 2822 dates with timezone incorrectly

Open yanniszark opened this issue 3 years ago • 4 comments

Bug Report

Description

The parse_date function returns a wrong result, when using an RFC 2822 date with timezone.

In [24]: parse_date('Thu, 20 May 2021 13:38:44 +0300')
Out[24]: (1621517924, -10800)

This output is wrong. 1621517924 is supposed to be UTC time (10:38:44). But instead, it is my time (13:38:44)

Source Code Investigation

The function calls are as following:

  1. repo.index.commit takes author_date as input, which is an RFC 2822 string.
  2. repo.index.commit calls Commit.create_from_tree with author_date.
  3. Commit.create_from_tree calls parse_date(author_date) to parse the date into author_time and author_offset.
  4. Commit.create_from_treecalls cls (aka Commit.__init__) passing author_time and author_offset.
  5. Commit.__init__ saves author_time and author_offset in authored_date and author_tz_offset.

Thus, the fault should lie with the parse_date function.

Environment

Tried with versions 2.1.11 and 3.1.14.

yanniszark avatar May 20 '21 16:05 yanniszark

Thanks for the report! With the given example, I could reproduce the issue and validate that GitPython essentially does the opposite. It returns the original time and inverts the offset, so that original_time + offset yields the UTC time.

The respective source is here, and I think it's very old too. I am also very sure that a lot of code depends on this behaviour so fixing it to match the docs will be a breaking change.

Instead of that I propose to change the documentation instead. What do you think?

Byron avatar May 20 '21 23:05 Byron

Hi @Byron,

Thanks a lot for the quick response!

It returns the original time and inverts the offset, so that original_time + offset yields the UTC time.

I am using GitPython to make a commit and specify a date in RFC 2822 (date taken from another commit). The original date (as shown in git log) is: Fri May 21 12:45:30 2021 +0300 The resulting date (as shown in git log) is: Fri May 21 15:45:30 2021 +0300

Thus, when creating a commit with an RFC 2822 date, the resulting date is incorrect, it's not that the timezone has changed to balance the changed time.

The respective source is here, and I think it's very old too. I am also very sure that a lot of code depends on this behaviour so fixing it to match the docs will be a breaking change.

I'm not sure about the utctz_to_altz function and where it is used. However, I think we should fix parse_date to correctly parse RFC 2822 dates, because that is clearly broken. We can do that without changing the helper function utctz_to_altz, if you believe it will break other places.

yanniszark avatar May 21 '21 10:05 yanniszark

To make my case more clear, here is a minimum reproducer:

import os
import git
import tempfile

from pathlib import Path
from email import utils as email_utils

tmp_dir = tempfile.mkdtemp()
print("Repo folder: '%s'" % tmp_dir)
repo = git.Repo.init(tmp_dir)
Path(os.path.join(repo.working_dir, "example_file")).touch()
repo.index.add(["example_file"])
author = git.Actor("user", "[email protected]")
date = "Thu, 20 May 2021 13:38:44 +0300"
print("Input date: '%s'" % date)
repo.index.commit("Wrong date", author=author, committer=author,
                  author_date=date, commit_date=date)
print("Output date: '%s'" %
      email_utils.format_datetime(repo.head.commit.authored_datetime))

Which outputs:

Repo folder: '/tmp/tmp2x6_v43g'
Input date: 'Thu, 20 May 2021 13:38:44 +0300'
Output date: 'Thu, 20 May 2021 16:38:44 +0300'

yanniszark avatar May 21 '21 12:05 yanniszark

Thanks for making the issue easily reproducible. I have taken a look and noticed that the date in the commit arrives correctly, that's a pleasant surprise, and it can be expected that reading the same date both from the newly created commit (by re-reading it or by using the return value of commit(…)).

My initial desire to not fix such a long-standing bug in fear of breakage makes the assumption that a lot of code depends on it. A way to find it would be to fix this issue and see how many new issues show up - in the worst case the release can be yanked to find another way of introducing a fix.

Thanks a lot for your engagement here to help making this change, I do hope you would be willing to help fixing it too. Please note that if you do and no existing test breaks, it's desirable to add a new test for that.

Byron avatar May 22 '21 00:05 Byron