community-topics icon indicating copy to clipboard operation
community-topics copied to clipboard

Discuss removing unnecessary files from collections

Open gotmax23 opened this issue 3 years ago • 14 comments

Summary

Many collections install a bunch of unnecessary files. Just look at the number of files the Fedora ansible-collection-kubernetes-core has to remove:

# Exclude some files from being installed
# Last last yaml key is build_ignore: so this works, despite being a bit hacky.
cat << EOF >> galaxy.yml
  - .github
  - .gitignore
  - .yamllint
  - setup.cfg
  - codecov.yml
  - tox.ini
  - Makefile
  - tests
EOF

(The package also remove docs and licenses from the built collection, as we install them under /usr/share/doc and /usr/share/licenses, but that's not relevant here).

@dmsimard and I submitted PRs 1 2 to community.general to remove unnecessary files from that collection, but @felixfontein claimed there were some legal issues with doing so.

I really don't think there are legal issues here. The collection is a built artifact and doesn't (and already isn't given e.g. the missing galaxy.yml) need to be a full representation of the source code. As long as the collection includes a link to the source, it fulfills the GPL obligations.

I propose that the steering committee allow and encourage collections to drop unnecessary files from the built collection artifacts (i.e. include them in build_ignore).

gotmax23 avatar Aug 10 '22 16:08 gotmax23

@felixfontein said:

The legal problem still is (if I remember correctly): the collection tarball acts both as the source and the installable artefact. Excluding files used for the development from it would violate the GPL.

I don't understand why the collection tarball should be both the source and the installable artifact. It's only the artifact, the source is somewhere on Github or Gitlab or (does this still exist?) Sourceforge... or wherever.

As far as I know, GPL isn't violated as long as you provide the source code somehow. You don't have to provide it in the same package. Otherwise, all Linux distributions would violate the GPL because they typically have the compiled kernel and C libraries under GPL in packages without including the source code in the same package.

As @bcoca said:

even including a URI to those files should be sufficient, the files and any changes must be available, but it does not mean every possible form of distribution of the product must have them ... otherwise distributing binary packages would be untenable.

Yep, you have to provide the sources. A URI to them is sufficient.

mariolenz avatar Aug 10 '22 17:08 mariolenz

Explanation for RH Legal

Ansible collections contain modules and other types of Ansible plugins. These plugins are usually written in either Python or Powershell. These collections are built using ansible-galaxy collection build. This command takes the configuration from a galaxy.yml file and creates a tarball containing all of the files (including tests and other development files) in the collection's repository besides galaxy.yml and any files that are manually excluded. The tarball also contains some metadata files that are generated as part of the build process. These collection artifacts are then published to Ansible Galaxy (https://galaxy.ansible.com).

The question we have is whether we can exclude the tests and other development files from the built collection artifact and fulfill our GPL obligations by linking to the actual source code on Github (or to whatever Git forge the project is hosted on). Note that collections already link to their respective source repository, and the community collection guidelines require that all collection releases are tagged in said repositories.

gotmax23 avatar Aug 10 '22 19:08 gotmax23

This might be linked with legal questions regarding the ansible community package. Namely we need to provide the sources of that one as well, and I guess the question is what would count as sources. Right now, we assume that the Galaxy tarballs are the sources of the collections, and thus the ansible tarball is the source distribution of ansible since it contains the contents of the Galaxy tarballs for all collections included.

If the tarballs on Galaxy would not be the sources, we might have to actively engage in collecting the sources of the collections (next to the collections themselves) to be able to provide the sources of the ansible package.

(At least I'm not remembering that this was part of the discussion as well. There might have been more points which I do not remember right now.)

felixfontein avatar Aug 11 '22 05:08 felixfontein

Yet you are not providing the sources to build 'ansible' itself (aka anstibull). In any case, I think this is your choice as packager, other people are free to repackage in other ways (no docs, no tests).

It would not be the first time, ansible has had an 'ansible-doc' package (with the html docs from site) in many distros for a long time.

Another option is to have ansible-galaxy install w/o tests/docs (not extract them from package), but that would be a core feature instead.

bcoca avatar Aug 11 '22 13:08 bcoca

@bcoca the ansible tarball contains a script that will re-build the tarball using antsibull (templated from https://github.com/ansible-community/antsibull/blob/main/src/antsibull/data/build-ansible.sh.j2).

felixfontein avatar Aug 11 '22 14:08 felixfontein

yet you still don't distribute anstibull in it, my point was not to force you to do so, but to show you you don't need to distribute everything in it.

bcoca avatar Aug 11 '22 14:08 bcoca

@bcoca I would be really glad if RH legal would agree with that. That would make our life a lot easier. But for some reason they didn't wanted to do that so far (I don't even know what exactly they were asked, or if they actually read the question(s), so :shrug:).

felixfontein avatar Aug 11 '22 14:08 felixfontein

wasn't that posted above? In any case I would keep the current distribution 'as is' and if someone wants an 'ansible-lean' that moves out docs/tests ... DIY?

bcoca avatar Aug 11 '22 14:08 bcoca

Thanks for bringing up the ansible community package, @felixfontein. I did not think about that when bringing this up, as I thought about it as a separate issue.

This might be linked with legal questions regarding the ansible community package. Namely we need to provide the sources of that one as well, and I guess the question is what would count as sources. Right now, we assume that the Galaxy tarballs are the sources of the collections, and thus the ansible tarball is the source distribution of ansible since it contains the contents of the Galaxy tarballs for all collections included.

Would it be practical to just rebuild the collections in the ansible community package from the source repositories to begin with? All collections are already supposed to tag releases in the source repository[^2]. We could require that all tags exactly match (with an optional leading v) the version from the galaxy.yml.

Either way, it shouldn't be too hard to provide a separate source tarball that includes all of the tests and development files and then remove them from the sdist. So many upstreams do already do this that we have a specific guideline about handling it in the Fedora Python Packaging Guidelines. Currently, ansible uses some deprecated setuptools hackery (it's hackery in my opinion) to remove the unnecessary from the built wheel that is deprecated and still leaves a fair amount of cruft included[^1]. It would be a lot easier if we didn't have to do this.

[^1]: I know, because @dmsimard and I have taken lengths to remove the remaining unnecessary files from the Fedora ansible packages. [^2]: Notably, the fortinet collections don't do this and instead have an inane version per branch structure.

If the tarballs on Galaxy would not be the sources, we might have to actively engage in collecting the sources of the collections (next to the collections themselves) to be able to provide the sources of the ansible package.

This is basically what I said above. Even if RH Legal says we don't have to, I still think we should try to gather the full sources including the tests so upstream and downstreams can run the tests for the whole package. Currently, Fedora does not run the tests in the ansible package (we may in the future), but I know that the Debian package runs some of the unit tests. We run unit tests in the RPM build for our packaged standalone collections where we already get the source from Github and build the collections (i.e. ansible-galaxy collection build and ansible-galaxy collection install) from source.

gotmax23 avatar Aug 11 '22 18:08 gotmax23

One thing about spiting the tests/docs/etc into their own package, this does not work well for collection nor pip formats as both tend to assume that each package owns it's directory and are not friendly to 'sharing' and those files are expected to be within the collection to be functional.

bcoca avatar Aug 15 '22 14:08 bcoca

Not directly related, by related and potentially of interest: #65 and https://github.com/ansible/ansible/pull/78422

felixfontein avatar Aug 16 '22 17:08 felixfontein

If the plans are to extend the ignores in the collection_template, switching to using manifest over build_ignore would be an improvement, even without listing any explicit directives, as the defaults do a pretty good job on their own.

https://docs.ansible.com/ansible-core/devel/dev_guide/developing_collections_distributing.html#manifest-directives

The default manifest directives would already exclude what is in the build_ignore in collection_template. It would also already exclude all of the extra build_ignores referenced at the top of this issue other than tests are included by default.

See also https://github.com/ansible-collections/news-for-maintainers/issues/22

sivel avatar Aug 18 '22 22:08 sivel

It would also already exclude all of the extra build_ignores referenced at the top of this issue other than tests are included by default.

Thanks for the update, @sivel. That's definitely an improvement! I do wish we could figure out the tests issue, but I'm happy to see progress.

gotmax23 avatar Aug 19 '22 00:08 gotmax23

@gotmax23 Please close this issue if done, or open a new forum topic and then close the issue with a pointer to the new discussion: Community-topics: Archiving the repo

mariolenz avatar May 02 '24 15:05 mariolenz

as @gotmax23 is notified, closing, thanks everyone!

Andersson007 avatar May 13 '24 07:05 Andersson007