flatpak-external-data-checker icon indicating copy to clipboard operation
flatpak-external-data-checker copied to clipboard

Git checker doesn't get timestamps, resulting in unwanted PRs

Open wjt opened this issue 4 years ago • 10 comments

On a whim I wrote https://github.com/flathub/org.gnome.Boxes.Extension.OsinfoDb/pull/2 to automatically update the database used by GNOME Boxes when it is tagged upstream. When reviewing my own PR I noticed that the date added in the appdata is today's date, rather than the date attached to the tag.

It doesn't seem to be possible to ask git ls-remote to include more commit metadata. I guess the "easiest" thing to do would be to make a shallow clone of the repository at exactly the tag in question, then read the timestamp out of the tag; but it would be nice to avoid having to fetch the commit contents. I don't know Git well enough to know if there is an operation that can do this. (The GitHub API has a way! But that's not useful in cases, like this one, which aren't on GitHub.)

wjt avatar Mar 11 '21 09:03 wjt

Sure it would be a nice feature to have. But I believe it won't make much difference in practice in most cases, because f-e-d-c on Flathub is likely to get the new tag about 1 hour after it was tagged, so the current date is likely to match the tag date anyway.

gasinvein avatar Mar 11 '21 11:03 gasinvein

The name of the branches that f-e-d-c pushes are based on the tree hash, which will change whenever the date changes, so if a PR is not merged within the same UTC day as being opened, it'd open another one because the hash would change.

Maybe theoretical? Maybe acceptable?

wjt avatar Mar 12 '21 06:03 wjt

Didn't think about it. Yeah, that would be not great.

gasinvein avatar Mar 12 '21 08:03 gasinvein

Maybe we should add GitHub and GitLab checkers instead, that would use API? Or even add a special case for git_ls_remote() helper - if the repo is hosted on GitHub or GitLab, use their APIs to get commit timestamp. I really would like to avoid cloning whole repos. This would make checking git sources significantly slower.

gasinvein avatar Mar 12 '21 08:03 gasinvein

That could work. The GitLab API for fetching a tag's metadata is unauthenticated if the repo is public:

$ http GET https://gitlab.gnome.org/api/v4/projects/gnome%2Fgnome-initial-setup/repository/tags/3.38.2
HTTP/1.1 200 OK
Cache-Control: max-age=0, private, must-revalidate
Connection: keep-alive
Content-Length: 1699
Content-Type: application/json
Date: Fri, 12 Mar 2021 08:45:04 GMT
Etag: W/"d7088e5a56582b820054cfdcea731b6d"
Server: nginx
Vary: Origin
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Gitlab-Feature-Category: source_code_management
X-Request-Id: 01F0JSRM9EBF100F4E4E05R8R2
X-Runtime: 0.044975

{
    "commit": {
        "author_email": "[email protected]",
        "author_name": "Will Thompson",
        "authored_date": "2020-11-21T15:08:26.000+00:00",
        "committed_date": "2020-11-21T15:08:26.000+00:00",
        "committer_email": "[email protected]",
        "committer_name": "Will Thompson",
        "created_at": "2020-11-21T15:08:26.000+00:00",
        "id": "8aacbcb5913ef340fc6a78e6806177b23810766b",
        "message": "NEWS for 3.38.2\n",
        "parent_ids": [
            "099f5f52535f22ab0fb63a74440561c506e2188e"
        ],
        "short_id": "8aacbcb5",
        "title": "NEWS for 3.38.2",
        "web_url": "https://gitlab.gnome.org/GNOME/gnome-initial-setup/-/commit/8aacbcb5913ef340fc6a78e6806177b23810766b"
    },
    "message": "Git-EVTag-v0-SHA512: fab136661b6d1de47c8540773309664249fb43e4f301aaabdff7e889fb4c0ff1f9969cc61c892fd719c8439bbf3f786e4c9b1041443ebbc88689d25daa7e8281\n-----BEGIN PGP SIGNATURE-----\n\niQIzBAABCgAdFiEEHmjljPJViIMBZFtWNCLcDXrUgqcFAl+5LbAACgkQNCLcDXrU\ngqdRNA//WyWvz2XPrcLP7pCjwtV3nVVhWwnI1XEtlWUuYbFGQ3zZ2oIRZZcBwdlH\nmmgd8R8glDL1RAt7Pjor9CWSMmjqlK5eoT2IBCvNlMuQcmrTHGUoZ+uMWZKQYFXu\nGYfHkyF/1RwGWX4vDKW8P77grei7T507DQWBOwY1xEDllTlL0qAiAUzOHHTRFjNq\nHlpTahfuILyA5l/c08Wk/rPjhR+O1L3hBpZ8izdPbpd0yEUI8Bj4u/kt45jkVCRZ\nJSEtTRw1MWHbJtbkeW+Uw6IlLoHEAWVLcnFfw8YdLDXanoupzcLExmi8eUFPZMkO\n6br3s/ww3Ha19ZQuJdO1Ai/+kcsUxBIHwmhLPtbwjBx5TcpCggJIkWMfcZvvKWEB\nRlKrT3PWELAMD3eZg9zyZFUxsgPtTHwwD/LQLb7StyOJJ6Wl4O5/jbvKxoBWP+TM\nMJzB7hQlqrfhymhCkKCE7MOZV8fECreCc7BMeJpDC4vTdES4qVYgAlRkjNLa1DDB\n+jW0UDUcoyPdM3GIkb84uP/A5b7yNpGh49o6XMozQUkTWY1MqN5hSS37HZZ3xXPc\nZfhiODNQBnKf5x5ClE3xeLkpaIIt7z7DxfJenza3yFGh6tj5d/m5NXkIT603sY63\nqIEwtZtLVyJYYPh/RgP45CcW6pLY77pMiIv5ErVhfA4tzszFilA=\n=D9Dd\n-----END PGP SIGNATURE-----",
    "name": "3.38.2",
    "protected": false,
    "release": null,
    "target": "b873e05ad7c4c91ae690cdadb8f1779e42d3ee97"
}

GitHub's appears to need authentication but happily the script already knows how to do that.

(Was there a reason for shelling out to the git CLI rather than using a Python library OOI?)

wjt avatar Mar 12 '21 08:03 wjt

(Was there a reason for shelling out to the git CLI rather than using a Python library OOI?)

Dunno. git command was used previously in main.py, so I just continued using it in utils.py.

gasinvein avatar Mar 12 '21 08:03 gasinvein

if a PR is not merged within the same UTC day as being opened, it'd open another one because the hash would change

An alternate approach to prevent this would be to invent an algorithm to uniquely identify the update we made without relying on git commit hash to match.

gasinvein avatar Mar 12 '21 10:03 gasinvein

And yet another idea for workaround - maybe add timestamp-query to JSONChecker? This would allow extracting timestamps from GitHub / GitLab APIs.

gasinvein avatar Mar 12 '21 11:03 gasinvein

if a PR is not merged within the same UTC day as being opened, it'd open another one because the hash would change

An alternate approach to prevent this would be to invent an algorithm to uniquely identify the update we made without relying on git commit hash to match.

We have an in-memory representation of the manifest file(s) so I guess we could just hash those.

And yet another idea for workaround - maybe add timestamp-query to JSONChecker? This would allow extracting timestamps from GitHub / GitLab APIs.

Thanks for adding this in #155. It works nicely. I will leave this ticket open as a wishlist item – it'd be less obscure to have the Git checker do the right thing by itself. Maybe I'll have a crack when I get time.

wjt avatar Mar 24 '21 15:03 wjt

This now also affects tarballs from GitHub, which now don't have persistent timestamps in HTTP Date header.

gasinvein avatar Dec 07 '21 10:12 gasinvein