packit-service icon indicating copy to clipboard operation
packit-service copied to clipboard

Please respect `SourceN:` in spec files for upstream tarballs in release `copr_build`s

Open martinpitt opened this issue 3 years ago • 21 comments

So far we've used packit mostly for validating upstream PRs. In these cases, one always needs to specify create-archive:, as there is no existing tarball to download from anywhere. So it usually says something like make dist

This is different for release builds, both for propose_upstream: and copr_build: These are triggered on releases, which usually produce an official tarball and attach it as a release artifact (or use GitHub's autogenerated ones). It is quite important that distributions use the exact upstream tarball, for validation purposes, and avoiding subtle breakage from rebuilding them.

However, naïvely using create-archive: at the top does exactly that: It rebuilds the release tarball from the local git checkout instead of downloading it from upstream. As an effect, the official cockpit-ostree-190.1.tar.xz is different from the tarball in the srpm of the packit copr_build.

Today I spent some time pestering @TomasTomecek and doing some experiments, and Tomas pointed out that it's possible to move the create-archive: action to the job: tests:

  - job: tests
    trigger: pull_request
    metadata:
      targets:
      - fedora-35
      - fedora-36
      - fedora-development
      - centos-stream-9-x86_64
    actions:
      # tarball for upstream PRs
      post-upstream-clone: make cockpit-podman.spec
      create-archive: make dist

That then leaves the default action for all the other (release based) builds, like propose_upstream and copr_build. But I tested that, and it seems the implied default is to run

Command: git archive --output cockpit-podman-47.3.tar.gz --prefix cockpit-podman-47.3/ HEAD

This is rather bad! For many projects it will not even work (as it's e.g. missing autoconf.sh or node_modules or any other steps that happen for the official release tarball), but the bigger problem is that for some projects it will work (kinda) -- but then the packages uploaded to Fedora don't use the official upstream release, but some random untested build.

IMHO it would make much more sense to respect the .spec file, and try and download its SourceN: tarballs. As propose-upstream/copr-build etc. are triggered on release, that's not even a race condition -- by the time the release is published, all the release artifacts should exist.

If that fails, I suggest to just fail the release -- it's better to be explicit about it than trying to cobble together some tarball. If you want to keep that, then can it be a fallback if Source0: does not work?

In https://github.com/cockpit-project/cockpit-podman/pull/978 I have a hack for this:

actions:
  post-upstream-clone: make cockpit-podman.spec
  # tarball for releases (copr_build, koji, etc.)
  create-archive:
    - sh -exc "curl -L -O https://github.com/cockpit-project/cockpit-podman/releases/download/${PACKIT_PROJECT_VERSION}/${PACKIT_PROJECT_NAME_VERSION}.tar.xz"
    - sh -exc "ls ${PACKIT_PROJECT_NAME_VERSION}.tar.xz"

But this is dreadful -- it's hard to discover, hard to get right (like, missing curl's -L option or forgetting to wrap the thing in sh -ex), and duplicates information such as the download URL, which makes it harder to release from forks, for example.

martinpitt avatar May 13 '22 12:05 martinpitt

I suggest to just fail the release -- it's better to be explicit about it than trying to cobble together some tarball.

By that I meant that adding create-archive: git archive ... to packit.yml explicitly is making it much more obvious what will happen/what the developer wants. But this is really bad practice, and if someone does that for Fedora (or any other distro) builds for projects which do have official tarballs (almost all) they need to get a talk. :grin:

martinpitt avatar May 13 '22 12:05 martinpitt

It's Friday and my brain works only on 100 MHz. I believe Packit already does this because that's exactly what we do ourselves - perform official archive generation and upload to PyPI in a GHA and then pick it up via propose_downstream. Sorry I did not realize this while we spoke on chat.

https://github.com/packit/packit/blob/main/packit.spec#L11

Source0:        %pypi_source

That should expand to a URL.

I also just verified that this is what Packit does in propose_downstream:

  1. Fetch archive specified in Source0
  2. Upload it to lookaside cache

https://github.com/packit/packit/blob/85cf200a4d2410237ed562ec306f97de7ed588b2/packit/api.py#L971-L973

If that fails, I suggest to just fail the release -- it's better to be explicit about it than trying to cobble together some tarball. If you want to keep that, then can it be a fallback if Source0: does not work?

We instead retry the release a few times.

Again, sorry Pitti I only realized this just now.

Therefore I presume the create-archive action in propose_downstream is no-op. Can someone from my team confirm?

TomasTomecek avatar May 13 '22 13:05 TomasTomecek

For the cockpit projects we started slowly with copr_build -- so perhaps that does something else than propose_downstream? My cited cockpit-ostree example got a rebuilt tarball, but that still has create-archive: on the top level, instead of on the job: tests stanza, so it's possible that it behaves that way (it's a bit difficult to test..).

So if that is true, this issue would ask about using the same behaviour for copr_build.

martinpitt avatar May 13 '22 13:05 martinpitt

What I don't understand: https://github.com/packit/packit/blob/main/.packit.yaml has create-archive: at the top-level as well. Why is that not being used for propose_downstream:?

It really looks like it would. Your official upstream tarball:

❱❱❱ curl -L https://github.com/packit/packit/archive/refs/tags/0.51.0.tar.gz | sha512sum
aa0c8bd4129656fa8244e701803fd527470e1a99d8fca9c6b031425fce42097b8e41fc0194b9b09569f20fcb6762994ac404c7ac83d9c18321a0917df378a463  -

However, dist-git has a different sha, namely 7e551fbf711f4835db..

So it very much looks to me that packit itself has the exact same problem that I'm trying to solve here?

martinpitt avatar May 13 '22 13:05 martinpitt

λ curl -L https://files.pythonhosted.org/packages/63/76/87bda3ca8f4403e128b55c8a319e496175abc71a5f2a7f4b7008d29768de/packitos-0.51.0.tar.gz | sha512sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.3M  100 17.3M    0     0  9186k      0  0:00:01  0:00:01 --:--:-- 9184k
7e551fbf711f4835db0c6276653bdfeb72fbcf3330378d8d2bbc688a4c9691b13c3ab36bcde408f4aab7d9684a441a35afb7ad830c941dfb28392547c215244d  -

It is same as the archive at PyPI.

mfocko avatar May 13 '22 14:05 mfocko

I see, the github tarballs are just the automatically generated ones. Indeed packit 0.51.0's srpm build does create a tarball:

2022-05-13 12:11:19.623 base_git.py       INFO   Using user-defined script for ActionName.create_archive: [['python3', 'setup.py', 'sdist', '--dist-dir', '.'], ['sh', '-c', 'echo packitos-$(python3 setup.py --version).tar.gz']]
2022-05-13 12:11:19.623 commands.py       DEBUG  Command: python3 setup.py sdist --dist-dir .
[...]
2022-05-13 12:11:20.080 logging.py        INFO   Creating tar archive
2022-05-13 12:11:21.001 logging.py        INFO   removing 'packitos-0.51.0' (and everything under it)
2022-05-13 12:11:21.030 commands.py       DEBUG  Command: sh -c echo packitos-$(python3 setup.py --version).tar.gz

So this seems to work because your tarballs happen to be reproducible? We have some efforts of making that happen, but from packit's POV this is pretty much accidental, and nothing enforces this.

martinpitt avatar May 13 '22 14:05 martinpitt

Ahaaa, I missed that. Then your curl approach resolves the issue in the meantime.

In the longterm, is the ask here that propose_downstream does not utilize create-archive and instead fetches the SourceN archive?

Edit: now that I thought about this a bit more, we should also implement checking of the archives if create-archive will be available during propose_downstream. And verify that the upstream archive is matching the release pubkey.

TomasTomecek avatar May 16 '22 16:05 TomasTomecek

is the ask here that propose_downstream does not utilize create-archive and instead fetches the SourceN

That's what should eventually happen, yes. I am less sure if it should just ignore the top-level create-archive:, or have a more sensible default than git archive -- my gut feeling is currently that a pull_request specific explicit create-archive: in project's packit.yaml, plus and an implied default of fetching SourceN is the right thing to do. Ignoring explicit configuration feels weird.

martinpitt avatar May 16 '22 17:05 martinpitt

Thanks, Martin! I nudged Franta to look at this but he's still going through a backlog of info after coming back from vacation :)

We will for sure talk about this and start addressing it. I think we should come up with a comprehensive solution here and not do a quick & dirty fix. A lot of people actually ask us how to do upstream releases properly so this is a good opportunity to come up with best practices and make the releasing convenient with Packit and propose_downstream.

TomasTomecek avatar May 17 '22 11:05 TomasTomecek

@TomasTomecek : Fully agreed, I appreciate! Especially as there's the curl workaround, so this isn't a blocker for anyone.

martinpitt avatar May 17 '22 11:05 martinpitt

@martinpitt Hello, finally there..;)

First things first:

  • The propose-downstream and copr_build behaves differently.
  • Copr builds for all triggers behave from the implementation point mostly the same. (Just default project naming differences, ...) Not saying we can't change that.

For the propose-downstream the archive manipulation works as follows (for reference, the code is here: https://github.com/packit/packit/blob/213c789c5ec496e4471dbdb7279d9d7d61b4ed44/packit/api.py#L937)

  1. Download the upstream archive.
  2. Send it to the lookaside-cache if not present.

For copr_build, there is a create_archive action with the git archive as default.

So. I see basically two approaches:

  • Respect the Source field for copr_build on release and just download what's there. (Ideally, use some retries as we use for propose-downstream to resolve some delays on source generation.)
  • Use a different default for copr_build on release -- download the GitHub release archive but allow redefinition. (So one can still build the archive from scratch if wanted/needed.)

Just a note to Packit itself: Since there is no create_archive for propose_downstream, we can define that action on the top level without any issues: https://packit.dev/docs/actions/#propose-downstream-command

lachmanfrantisek avatar May 17 '22 12:05 lachmanfrantisek

Respect the Source field for copr_build on release and just download what's there. (Ideally, use some retries as we use for propose-downstream to resolve some delays on source generation.)

  • :heavy_plus_sign: pristine source used, no spec-file change, very transparent
  • :heavy_minus_sign: can't be easily redefined (maybe some hook to do that manually if really needed?)

Use a different default for copr_build on release -- download the GitHub release archive but allow redefinition. (So one can still build the archive from scratch if wanted/needed.)

  • :heavy_plus_sign: can be redefined
  • :heavy_plus_sign: old behaviour can be preserved
  • :heavy_minus_sign: spec-file is changed and uses a local file (can be solved)

lachmanfrantisek avatar May 17 '22 12:05 lachmanfrantisek

@lachmanfrantisek : Thanks for the details! It feels to me that releasing to a COPR or to Fedora shouldn't make much difference wrt. tarballs, they are all just slightly different release targets?

If propose_downstream really does download the tarball from the release, then why does the packit 0.51.0 release srpm call python3 setup.py sdist through the create_archive action (defined in .packit.yml)? Is that just some misleading log, i.e. it doesn't actually use that tarball? I can't see any download from github in the log.

I have no strong opinions between your two alternatives. The first one feels more robust against accidental rebuilds (when project maintainers just set a top-level create-archive:), the second one seems cleaner. I guess I looked at this for too long now to give a good opinion, maybe some poll in #packit or twitter?

martinpitt avatar May 17 '22 12:05 martinpitt

If propose_downstream really does download the tarball from the release, then why does the packit 0.51.0 release srpm call python3 setup.py sdist through the create_archive action (defined in .packit.yml)? Is that just some misleading log, i.e. it doesn't actually use that tarball? I can't see any download from github in the log.

The SRPM build log is coming from a copr_build job so it uses that action. (Until we resolve this issue...;) It uses this job config: https://github.com/packit/packit/blob/213c789c5ec496e4471dbdb7279d9d7d61b4ed44/.packit.yaml#L83 and you can see the picked job in the logs:

2022-05-13 12:11:18.666 utils.py          DEBUG  job_config_index: 6
2022-05-13 12:11:18.666 utils.py          DEBUG  Final package (job) config: JobConfig(job=JobType.copr_build, trigger=JobConfigTriggerType.release, config_file_path='.packit.yaml', specfile_path='packit.spec', files_to_sync='[SyncFilesItem(src=['packit.spec'], dest=packit.spec, mkpath=False, delete=False, filters=[]), SyncFilesItem(src=['.packit.yaml'], dest=.packit.yaml, mkpath=False, delete=False, filters=[]), SyncFilesItem(src=['plans/'], dest=plans/, mkpath=False, delete=False, filters=[]), SyncFilesItem(src=['.fmf/'], dest=.fmf/, mkpath=False, delete=False, filters=[])]', dist_git_namespace='rpms', upstream_project_url='https://github.com/packit/packit', upstream_package_name='packitos', downstream_project_url='https://src.fedoraproject.org/rpms/packit.git', downstream_package_name='packit', dist_git_base_url='https://src.fedoraproject.org/', actions='{<ActionName.create_archive: 'create-archive'>: ['python3 setup.py sdist --dist-dir .', "sh -c 'echo packitos-$(python3 setup.py --version).tar.gz'"], <ActionName.get_current_version: 'get-current-version'>: ['python3 setup.py --version'], <ActionName.pre_sync: 'pre-sync'>: ['python3 plans/git_reference.py']}', upstream_ref='None', allowed_gpg_keys='['5DE3E0509C47EA3CF04A42D34AEE18F83AFDEB23']', create_pr='True', sync_changelog='False', create_sync_note='True', spec_source_id='Source0', upstream_tag_template='{version}', patch_generation_ignore_paths='[]',patch_generation_patch_id_digits='4',copy_upstream_release_description='True',sources='[]', merge_pr_in_ci=True, srpm_build_deps=['python3-pip', 'python3-setuptools_scm'], identifier='None', packit_instances=[<Deployment.prod: 'prod'>, <Deployment.stg: 'stg'>], issue_repository='https://github.com/packit/packit', release_suffix='None', targets={'fedora-all': {}, 'epel-8': {}}, timeout=7200, owner=None, project=packit-releases, dist_git_branches=set(), branch=None, scratch=False, list_on_homepage=True, preserve_project=True, additional_packages=[], additional_repos=[], fmf_url=None, fmf_ref=None, use_internal_tf=False, skip_build=False,env={},enable_net=True)

lachmanfrantisek avatar May 17 '22 12:05 lachmanfrantisek

Ah, sorry! Indeed the rawhide propose_upstream log is this one, that looks much better. So retitling accordingly. I'm glad that this is much less of an issue than I feared, sorry for the noise!

martinpitt avatar May 17 '22 13:05 martinpitt

Some work was done on this issue and @lachmanfrantisek promised to go through again and assess what needs to be done to resolve this.

TomasTomecek avatar Jul 26 '22 12:07 TomasTomecek

I have a new proposal combining the two mentioned above (basically what Martin originally requested and a generic way of what Cockpit does currently):

For copr_build on release, we will use create_archive action if specified. If not, don't build anything and download what's present in Specfile. (Basically, changing the default for create_archive to download the SourceN.)

A few things to consider:

  • This will not change anything for users with the create_archive action defined.

  • For users not setting the create_archive action, this will change the behaviour. (!)

  • What about spec-file version? (A very related to packit/packit#1757)

    • By now, we don't change the version for release Copr builds.
    • No-bump works well for upstream-handled spec-files (=spec-file is prepared), bump is required for generated/downloaded spec-files (=spec-file is not prepared).
    • Possible solutions:
      • Introduce a bump_spec_version config option both for copr-build and propose-downstream that will be True for PRs and commits and False for releases.
        • If a user has the spec-file prepared, bump_spec_version=False should be used, if a user needs to update the spec-file, bump_spec_version=True should be used. (This will allow using a dummy value upstream without any manual change required.)
        • We need to be careful, where we get the version from -- if bump_spec_version==True, we can't load a version from spec-file (basically an issue packit/packit#1757) but use upstream_tag_template to get it from the tag.
  • What about non-URL sources? (Related to packit/packit#1665)

    • Require URL or trigger the build action we use currently?
  • What do we want to do if the source is not available?

    • Do the same thing that we do for propose-downstream => retry. But how to retry when we run the action in Copr (what will be the default in the future)? Can we check the source existence before running the action?
  • To be able to replicate this behaviour, we can implement a --no-build/--pristine-source/--download-source option for CLI. (We can also start with this.) Note it's a bit different to using --no-bump CLI option since it will still trigger the local build. @mfocko You were playing with the build CLI lately, does this make sense?

As you can see, we need to be super careful to not break more than we improve..;)

lachmanfrantisek avatar Aug 03 '22 09:08 lachmanfrantisek

What about non-URL sources? What do we want to do if the source is not available?

These would break propose_downstream as well, right? This could break if some project uses packit to only release to COPR (but not Fedora), and does not release proper upstream tarballs either, right? The solution would be to define a create-archive: action for the copr_build/release job then?

Thanks @lachmanfrantisek for taking this on!

martinpitt avatar Aug 03 '22 09:08 martinpitt

Since there is an hack, for now we don't prioritize this one.

majamassarini avatar Jan 12 '23 09:01 majamassarini

We've come across this when regularly checking the least-recently-commented cards.

I haven't seen anyone else asking for this even though this might be really handy, @martinpitt do you have any issue with your workaround? If the ~~Copr~~Cockpit team does not have any issue with this, I will leave this open but we will prioritise other tasks until there is a stronger push.

When thinking about this, we might want to reuse sources dictionary and add a new option to mention we are relying on specifle. (E.g. preserve_specfile_url, or something like that.

lachmanfrantisek avatar Jul 29 '24 07:07 lachmanfrantisek

@lachmanfrantisek no, the workaround has worked fine for the last 2 years. It's very non-obvious to come up with, but once you know it it's easy to maintain. So no urgency from our side. Thanks for checking!

martinpitt avatar Aug 01 '24 08:08 martinpitt