Add code for Ansible Galaxy release automation for Collections

SUMMARY

Add documentation for Ansible Galaxy release automation via GitHub Action Workflow

ISSUE TYPE

Docs Pull Request

COMPONENT NAME

ADDITIONAL INFORMATION

May 03 '20 05:05 ericsysmin

One overarching question: is this still GPLv3 compliant? Is everything used to build the collection inside the collection tarball (I mean the .github directory and the deploy playbook)?

May 03 '20 14:05 abadger

@abadger we're not uploading the tar.gz created by GitHub, we're uploading the asset made by the Ansible collection build command. But there is a tar.gz you could download, just like you currently do for roles or any release on GitHub. I just made it easier to tag the release the way you want. However, the Ansible build currently does include the molecule, build, and other directories, since they are not in the excludes (supported in 2.10).

May 03 '20 15:05 ericsysmin

<Nod>. Yeah, we have to be careful not to exclude things used to build the asset. Otherwise it no longer counts as source for the GPL.

May 03 '20 19:05 abadger

Ah, then that's a problem for Ansible Collections in general as they do exclude directories by default.

May 03 '20 20:05 ericsysmin

Yep :-(. I want to make sure we do it right, but I'm not sure what to do about other people yet. (It's a problem that stems from Galaxy only hosting the built collections, not a source tarball.)

May 03 '20 21:05 abadger

FYI there's already a promising GitHub Action for publishing collections by @artis3n here: https://github.com/artis3n/ansible_galaxy_collection

My main concern with any automation, however, is that right now for any collection where I upload my personal access token, if there's any way someone could sneak in a commit, they would immediately have access to publishing to any other namespace/collection to which I have access—that needs to be an extremely bold and highlighted warning anywhere automation is concerned. See https://github.com/ansible/galaxy/issues/2275 and https://github.com/ansible/galaxy/issues/2276.

(And the implications should be pretty apparent, but this would also mean the GitHub Action itself could be used as an attack vector, and could be used, for example, to publish a new release to any collection in the community namespace if I used that GitHub Action in any of my collection CI.)

May 04 '20 00:05 geerlingguy

@geerlingguy I'm curious whether passing the token in the command to ansible exposes it in the GitHub Action. At least with the ENV var, I don't believe it is exposed to the console.

May 04 '20 00:05 ericsysmin

This is the tested output so far, with the apikey removed from the environment.

Run ansible-playbook -i 'localhost,' build/galaxy_deploy.yml -e "github_tag=refs/tags/0.0.4" -e "ansible_galaxy_apikey=***"

PLAY [localhost] ***************************************************************

TASK [Ensure that the ansible_galaxy_apikey exists] ****************************
ok: [localhost] => {
    "changed": false,
    "msg": "All assertions passed"
}

TASK [Template out the galaxy.yml file.] ***************************************
changed: [localhost]
[DEPRECATION WARNING]: Distribution Ubuntu 18.04 on host localhost should use 
/usr/bin/python3, but is using /usr/bin/python for backward compatibility with 
prior Ansible releases. A future Ansible release will default to using the 
discovered platform python for this host. See https://docs.ansible.com/ansible/
2.9/reference_appendices/interpreter_discovery.html for more information. This 
feature will be removed in version 2.12. Deprecation warnings can be disabled 
by setting deprecation_warnings=False in ansible.cfg.

TASK [Build the collection.] ***************************************************
changed: [localhost]

TASK [Publish the collection.] *************************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost                  : ok=4    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

May 04 '20 01:05 ericsysmin

@geerlingguy I'm curious whether passing the token in the command to ansible exposes it in the GitHub Action. At least with the ENV var, I don't believe it is exposed to the console.

If you store the token in a repo secret, the output of the value is masked anywhere it is present in the logs: https://help.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets#about-encrypted-secrets

The risk with the token, as @geerlingguy points out, is that anyone with write access to the repo could push changes to the workflow file and have access to the secret to use in arbitrary bash. Forks do not get access to secrets in the original repo, but that also means the workflow does not run successfully from a fork.
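As a rough illustration of the repo-secret approach described above, here is a minimal workflow sketch. The file name, the secret name ANSIBLE_GALAXY_APIKEY, and the action version tags are assumptions, and it assumes a galaxy.yml is already committed (the PR under discussion templates it from a playbook instead).

```yaml
# .github/workflows/release.yml (hypothetical file name)
name: Release to Galaxy
on:
  release:
    types: [published]   # run for published releases only, not on every push

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - name: Install Ansible
        run: pip install ansible
      - name: Build and publish the collection
        env:
          # Assumed secret name; GitHub masks the stored value in the logs.
          ANSIBLE_GALAXY_APIKEY: ${{ secrets.ANSIBLE_GALAXY_APIKEY }}
        run: |
          ansible-galaxy collection build
          ansible-galaxy collection publish ./*.tar.gz --api-key "$ANSIBLE_GALAXY_APIKEY"
```

Masking only hides the value in log output. Anyone who can change the workflow file can still read the secret at run time, which is the risk raised in this thread.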

May 04 '20 01:05 artis3n

FWIW, once something like https://github.com/ansible/galaxy/issues/2276 exists I'd be happy to make supported changes to my Collection action that is mentioned by geerlingguy above.

May 04 '20 01:05 artis3n

That sounds almost like a fundamental flaw if you can't trust the people who manage the repo. They should also likely have access to the secret in the Settings -> Secrets right? Since the secret isn't available outside of the repo that also prevents anyone from getting that secret locally. No one should run this workflow locally. It's an instruction specifically for GitHub Actions Workflow which isn't local. However, another section could be dedicated to automating local release actions.

GitHub doesn't provide any way of preventing the exposure of secrets if the person has write access. They don't even have to touch the existing workflow; they could create a new workflow that outputs that secret anyway. I could also write a workflow that sends it on each and every push if I wanted. That's why write access to any repo should be restricted.

https://github.community/t5/GitHub-Actions/Protecting-github-workflows/td-p/30290

That link also has some information about this.

May 04 '20 02:05 ericsysmin

They should also likely have access to the secret in the Settings -> Secrets right?

Repo owners with "owner" or "admin" privileges can add or edit the values of secrets, but you can't read the secret values once saved. "Write" contributors could not get to that menu.

manage the repo. They should also likely have access to the secret in the Settings -> Secrets right? Since the secret isn't available outside of the repo that also prevents anyone from getting that secret locally

True, but if someone has write access they can add a run task on the workflow to curl the secret to their own server, or something. They could modify the on trigger so the workflow runs on their commit, no PR or release required.

GitHub doesn't provide any way of preventing the exposure of secrets if the person has write access.

Yup. Presumably someone with write access to a repo is trusted. I do think geerlingguy's point:

if there's any way someone could sneak in a commit, they would immediately have access to publishing to any other namespace/collection to which I have access

is important to keep in mind. But, for my collection action (https://github.com/artis3n/ansible_galaxy_collection), I do have users pass their galaxy token via a Secret in their repo. That is the 'right' way to do it, but there is a risk with users with write permission inherent to Workflows that should probably be highlighted if a workflow suggestion is added to the docs.

That sounds almost like a fundamental flaw if you can't trust the people who manage the repo.

Yeah, totally agree.

May 04 '20 03:05 artis3n

Ok, so maybe we should add these as details somewhere in the docs?

May 04 '20 03:05 ericsysmin

That sounds almost like a fundamental flaw if you can't trust the people who manage the repo.

Just being the devil's advocate, when it comes to the possibility of any of the dozens of people maintaining the community collections which are going to be used across most Ansible installations after 2.10 having any of their personal tokens compromised across any of their repositories (if we have them all use their single personal access token for Galaxy across them all)... I'd rather have a belt-and-suspenders approach. One token that has access to everything in Galaxy is a bit risky.

This is probably not the main place for the security discussion, but we should make sure it is highlighted that if anyone were to get access to your token, they could post to any namespace/collection in Galaxy that you maintain.

May 04 '20 15:05 geerlingguy

For information, the community.* and ansible.* collections (which are hosted in https://github.com/ansible-collections/) will use Zuul to publish to Galaxy.

This means that even if someone has admin rights on the repo, they will not be able to get the Galaxy token.

May 04 '20 15:05 gundalow

@gundalow is there a doc on how Zuul is used to publish to Galaxy?

May 04 '20 16:05 ericsysmin

@ericsysmin we have a release-ansible-collection-galaxy job that is available to all projects that use Zuul for CI. https://dashboard.zuul.ansible.com/t/ansible/build/f3b10c3ed59c4584a081e245a5900dba/console is an example job run. Much like this code, there is a service account on Galaxy whose token we use to do the publish.

Then, whenever a project tags a release (or merges a commit), we build and publish to Galaxy.

May 05 '20 19:05 pabelanger

@ericsysmin FYI if you want to explicitly hide something from the log, like secrets, you can also issue a workflow command: https://help.github.com/en/actions/reference/workflow-commands-for-github-actions#masking-a-value-in-log
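A small sketch of that workflow command, for a value that is not itself a stored secret and so would not be masked automatically. The workflow name, the secret name, and the ENCODED variable are hypothetical.

```yaml
name: Mask example
on: workflow_dispatch

jobs:
  mask:
    runs-on: ubuntu-latest
    steps:
      - name: Mask a value derived at run time
        env:
          # Assumed secret name
          ANSIBLE_GALAXY_APIKEY: ${{ secrets.ANSIBLE_GALAXY_APIKEY }}
        run: |
          # ENCODED is derived from the secret, so it is not masked by default.
          ENCODED="$(printf '%s' "$ANSIBLE_GALAXY_APIKEY" | base64)"
          # Register the derived value as masked for the rest of the job.
          echo "::add-mask::$ENCODED"
          # From here on, the value should be replaced with asterisks in the log.
          echo "Encoded token: $ENCODED"
```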

May 06 '20 11:05 webknjaz

FYI there's already a promising GitHub Action for publishing collections by @artis3n here: https://github.com/artis3n/ansible_galaxy_collection

That action also mixes up two stages that are meant to be separate: build + publish. I'm strongly against promoting such an approach. Ideally, it should be possible to test what's going to be published rather than just the source. When build and publish are squashed together, there's a possibility that what you test is not exactly what the user will get. That flow looks as follows:

1. Test the source.
2. Build a tarball and immediately publish it without testing that tarball.

What I advocate for is the following:

1. Build a tarball and store it as an artifact.
2. Have a test matrix that downloads that tarball, installs it, and tests its contents (this is most likely a separate job or a collection of jobs).
3. Have a publish step that downloads the very same tarball that's been tested and uploads it to Galaxy. This ensures that the published build is the one that was tested.
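The build, test, publish flow above could be sketched as a GitHub Actions workflow roughly like this. It is only an outline under assumptions: the artifact name, the secret name ANSIBLE_GALAXY_APIKEY, the Python versions, the action version tags, and the test command are all placeholders, and it assumes galaxy.yml is committed in the repo.

```yaml
name: Build, test, publish
on:
  release:
    types: [published]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: pip install ansible
      # Build once; every later job reuses this exact tarball.
      - run: ansible-galaxy collection build --output-path dist/
      - uses: actions/upload-artifact@v4
        with:
          name: collection-tarball   # assumed artifact name
          path: dist/*.tar.gz

  test:
    needs: build
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.12"]   # assumed versions
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install ansible
      - uses: actions/download-artifact@v4
        with:
          name: collection-tarball
          path: dist/
      # Install from the tarball, not from the source checkout.
      - run: ansible-galaxy collection install dist/*.tar.gz -p ./collections
      - name: Test the installed copy
        # my_namespace/my_collection are placeholders for the real names.
        working-directory: collections/ansible_collections/my_namespace/my_collection
        run: ansible-test sanity   # placeholder; substitute your own test command

  publish:
    needs: test   # runs only if every matrix entry passed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: pip install ansible
      - uses: actions/download-artifact@v4
        with:
          name: collection-tarball
          path: dist/
      - name: Publish the tested tarball
        env:
          ANSIBLE_GALAXY_APIKEY: ${{ secrets.ANSIBLE_GALAXY_APIKEY }}
        run: ansible-galaxy collection publish dist/*.tar.gz --api-key "$ANSIBLE_GALAXY_APIKEY"
```

Because the test and publish jobs never check out the source, the only thing they can install or upload is the artifact produced by the build job.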

May 06 '20 11:05 webknjaz

@webknjaz that's a great point, and that's a flow I'll look into supporting. Feel free to file an issue if you'd like.

May 06 '20 13:05 artis3n

This is what we do for testing network / security / content collections for the Ansible team. It works very well, and I also advocate for people to use it. One addition we are going to make is an ephemeral Galaxy server where we can store the artifact for testing, then promote it to Galaxy once the code is merged.

May 06 '20 13:05 pabelanger

One addition we are going to make is an ephemeral Galaxy server where we can store the artifact for testing, then promote it to Galaxy once the code is merged.

Yeah, ansible/ansible now has "fallaxy" for testing. Not sure if it can be used outside, though.

@webknjaz that's a great point, and that's a flow I'll look into supporting. Feel free to file an issue if you'd like.

I don't have anything else to add there, so you can copy what I wrote into an issue as you see fit. I'm currently implementing a similar workflow for a Python package distribution on GitHub Actions Workflows, so it's slightly different packaging-wise, but you may get some high-level inspiration from https://github.com/ansible/pylibssh/actions?query=workflow%3A%22%F0%9F%8F%97+%F0%9F%93%A6+%26+test+%26+publish%22 (some bits are still in progress tho).

May 06 '20 13:05 webknjaz

Why do you have to be so correct....lol Can you make changes to this PR with that order?

May 06 '20 14:05 ericsysmin

No, I don't have powers in this repo :)

May 06 '20 14:05 webknjaz

webknjaz, 😑 you can suggest edits in review, just like you did 10 times earlier in this PR lol

May 06 '20 14:05 ericsysmin

@ericsysmin you can find some inspiration on how to work with upload/download artifact @ https://github.com/ansible-community/collection_migration/blob/master/.github/workflows/collection-migration-tests.yml#L136-L188. Except that both now have v2 released and the download one can fetch all the artifacts, not just one by name.

May 06 '20 14:05 webknjaz

Build a tarball and store it as an artifact.

Have a test matrix that downloads that tarball, installs it and tests its contents (this is most likely a separate job or a collection of jobs).

Have a publish step that downloads the very same tarball that's been tested and uploads it to galaxy. This ensures that this step doesn't have a different build that's being tested.

I have a feeling we're talking about vastly different use cases at this point. For a certified content collection with Ansible modules that are organized by an enterprise vendor with a team of developers able to support it, having tooling in place for multiple layers of build, test, and deploy processes may be a worthy goal.

But I think both @ericsysmin and myself are coming at this from the approach of long-time Galaxy role authors.

We are used to having a lightweight CI system, a lightweight release system, and a lightweight package manager. We're coming from the angle of someone more used to systems like NPM, Packagist, or Rubygems, not from the angle of building and distributing RPMs, .debs, etc.

I think there's a major conflict in goals between the two use cases for collections. I just want there to be some easy way for me to maintain collections like I did roles, especially if roles are going to vanish in Galaxy next-gen... because if it's an onerous process (like it is today), I know I'm not going to do it.

I don't want to have to tie two or three different backend systems/processes into each collection I maintain, and I'm probably more willing to bend on this topic than most people who have contributed roles in the past...

May 06 '20 14:05 geerlingguy

@geerlingguy it seems maybe a middle ground is that we do a build first, then do the install, instead of doing this:

before_script:
  - cd ../
  - mkdir -p ansible_collections/$COLLECTION_NAMESPACE
  - mv ansible-collection-$COLLECTION_NAME ansible_collections/$COLLECTION_NAMESPACE/$COLLECTION_NAME
  - cd ansible_collections/$COLLECTION_NAMESPACE/$COLLECTION_NAME

May 06 '20 14:05 ericsysmin

so really we'd be doing

before_script:
  - ansible-galaxy collection build
  - ansible-galaxy collection install my_namespace-my_collection-1.0.0.tar.gz
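A possible variation on the same idea that avoids hardcoding the versioned tarball name. This is only a sketch: the build directory and install path are assumptions, and $COLLECTION_NAMESPACE and $COLLECTION_NAME are the variables from the earlier before_script.

```yaml
before_script:
  # Build into a known directory so the tarball name (which embeds the version) need not be spelled out.
  - ansible-galaxy collection build --output-path /tmp/collection-build
  # Install the freshly built tarball into an explicit collections path.
  - ansible-galaxy collection install /tmp/collection-build/*.tar.gz -p "$HOME/.ansible/collections"
  # Run the remaining steps from the installed copy rather than the source checkout.
  - cd "$HOME/.ansible/collections/ansible_collections/$COLLECTION_NAMESPACE/$COLLECTION_NAME"
```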

May 06 '20 14:05 ericsysmin

@geerlingguy yeah, I've used this approach a lot in the past, and even now in the Python ecosystem with publishing to PyPI, and I have it simplified like this in a lot of places. One reason for this was that the public CIs didn't have mature artifact and stage pipelining support. I feel like with GHA it became so much easier that virtually anyone can afford a more sophisticated approach, especially if there were some pre-baked template that anybody could use. Of course, some people may still want to opt out, and that's okay for them. OTOH we should show a more complete approach. Folks can then just remove the steps they dislike.

May 06 '20 14:05 webknjaz