
Fix GitHub Actions Docker issue

Open wiktor2200 opened this issue 1 year ago • 8 comments

wiktor2200 avatar Nov 11 '23 15:11 wiktor2200

Hi! @staticdev @aalaesar I've tried to fix a problem with the Docker Molecule test, but I've run out of ideas. Would you be able to take a look? Maybe you have another solution for this problem?

@geerlingguy Sorry for bothering you, but do you have any idea what could have gone wrong here? Have you ever seen such an error with your Ansible images? https://github.com/nextcloud/ansible-collection-nextcloud-admin/actions/runs/6836211015/job/18590869076?pr=318#step:7:110

  failed: [localhost] (item={'failed': 0, 'started': 1, 'finished': 0, 'ansible_job_id': 'j801881025068.2108', 'results_file': '/home/runner/.ansible_async/j801881025068.2108', 'changed': True, 'item': {'cgroupns_mode': 'host', 'command': '', 'image': 'docker.io/geerlingguy/docker-debian12-ansible:latest', 'name': 'instance', 'pre_build_image': True, 'privileged': True, 'volumes': ['/sys/fs/cgroup:/sys/fs/cgroup:rw']}, 'ansible_loop_var': 'item'}) => {"ansible_job_id": "j801881025068.2108", "ansible_loop_var": "item", "attempts": 8, "changed": false, "finished": 1, "item": {"ansible_job_id": "j801881025068.2108", "ansible_loop_var": "item", "changed": true, "failed": 0, "finished": 0, "item": {"cgroupns_mode": "host", "command": "", "image": "docker.io/geerlingguy/docker-debian12-ansible:latest", "name": "instance", "pre_build_image": true, "privileged": true, "volumes": ["/sys/fs/cgroup:/sys/fs/cgroup:rw"]}, "results_file": "/home/runner/.ansible_async/j801881025068.2108", "started": 1}, "msg": "Error creating container: 500 Server Error for http+docker://localhost/v1.43/containers/create?name=instance: Internal Server Error (\"symlink /proc/mounts /var/lib/docker/fuse-overlayfs/4441cd54c476cdd29d6f1ded1e93781e3c3929ca7407bbc645bd90b92c4c22e2-init/merged/etc/mtab: file exists\")", "results_file": "/home/runner/.ansible_async/j801881025068.2108", "started": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

wiktor2200 avatar Nov 11 '23 19:11 wiktor2200

Hello @wiktor2200, thank you for taking the time to fix the CI. I've also been trying to fix it on another branch, but with no success. :disappointed: I mostly suspect the issue is in our code, as the Ansible image we use is popular and I couldn't find anyone with a similar issue. Let's see if Jeff Geerling can help us :wink:

Edit: just a thought. Maybe Dependabot is upgrading Ansible too fast for us to keep up with Ansible/Molecule changes.

Regards

aalaesar avatar Nov 11 '23 21:11 aalaesar

It looks like the error is:

Error creating container: 500 Server Error for http+docker://localhost/v1.43/containers/create?name=instance: Internal Server Error (\"symlink /proc/mounts /var/lib/docker/fuse-overlayfs/4441cd54c476cdd29d6f1ded1e93781e3c3929ca7407bbc645bd90b92c4c22e2-init/merged/etc/mtab: file exists\")

I've seen similar file mount issues in GitHub Actions sometimes, but haven't in the past few months. Is this only with debian12?

geerlingguy avatar Nov 12 '23 06:11 geerlingguy
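To compare failures across the matrix jobs, it can help to check whether each log carries the same signature as the error quoted above. The following is a minimal sketch, not part of the project's CI: the function name and regular expression are hypothetical, and it assumes the directory directly under `/var/lib/docker/` in the path names Docker's storage driver.

```python
import re
from typing import Optional

# Matches the failure quoted in this thread: Docker could not create the
# /etc/mtab -> /proc/mounts symlink because the target already existed.
# Assumption: the first directory under /var/lib/docker/ is the storage driver.
MTAB_SYMLINK_ERROR = re.compile(
    r"symlink /proc/mounts /var/lib/docker/([^/\s]+)/\S+/etc/mtab: file exists"
)


def storage_driver_from_mtab_error(log_text: str) -> Optional[str]:
    """Return the storage driver directory named in the error, or None
    if the log text does not contain the mtab symlink failure."""
    match = MTAB_SYMLINK_ERROR.search(log_text)
    return match.group(1) if match else None
```

Run against the message from the failing job, this returns `fuse-overlayfs`, which is one more detail to compare between jobs that fail and jobs that pass.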

Hello Jeff! Thanks a lot for getting involved, I really appreciate it :)

When we searched for this error, we found very few reports, which is why I asked. It occurs randomly in all of our Molecule test scenarios (Debian 11 and 12, Ubuntu 20.04 and 22.04). We define the scenarios this way: https://github.com/nextcloud/ansible-collection-nextcloud-admin/blob/20ab659c9d5eeaef1d091d4571bb17623e47edb8/.github/workflows/tests.yml#L20-L23

Then we run them with: https://github.com/nextcloud/ansible-collection-nextcloud-admin/blob/20ab659c9d5eeaef1d091d4571bb17623e47edb8/.github/workflows/tests.yml#L51

Molecule itself is defined here: https://github.com/nextcloud/ansible-collection-nextcloud-admin/blob/20ab659c9d5eeaef1d091d4571bb17623e47edb8/molecule/default/molecule.yml#L7-L15

And since it's a matrix, when one job fails, the rest are cancelled. In this PR I tried cleaning the Docker cache with docker system prune (inspired by your old blog post: https://www.jeffgeerling.com/blog/2018/testing-your-ansible-roles-molecule), and then molecule reset when docker system prune didn't help.

wiktor2200 avatar Nov 12 '23 10:11 wiktor2200
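For reference, the two cleanup steps described in the comment above can be sketched as a small pre-test helper. This is only an illustration under assumptions: the helper name and its injectable `run` parameter are hypothetical, the actual PR may wire these commands into the workflow differently, and the thread does not report either step as a fix.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# The two workarounds mentioned in the thread, in the order they were tried.
CLEANUP_STEPS: List[List[str]] = [
    ["docker", "system", "prune", "--force"],  # drop unused Docker data
    ["molecule", "reset"],                     # clear Molecule's temporary state
]


def run_cleanup(
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[Tuple[str, int]]:
    """Run each cleanup command and return (command, exit code) pairs.

    `run` is injectable so the helper can be exercised without Docker
    or Molecule installed.
    """
    results = []
    for cmd in CLEANUP_STEPS:
        results.append((" ".join(cmd), run(cmd)))
    return results
```

Calling `run_cleanup()` before `molecule test` executes both commands with the real `subprocess.call`; a non-zero exit code in the result shows which step failed.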

Hello there @wiktor2200, I found this thread on the Linux Containers forum that looks a lot like our issue. Is there a way to check whether our GitHub Actions jobs are running on top of LXD?

aalaesar avatar Nov 16 '23 11:11 aalaesar
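One way to approach the LXD question is to ask systemd which container technology, if any, the runner is inside. The sketch below assumes `systemd-detect-virt` is available on the runner and that an LXD container is reported with an identifier starting with `lxc`; the function names are hypothetical.

```python
import subprocess


def container_type() -> str:
    """Return the identifier printed by `systemd-detect-virt --container`,
    or "unknown" if the tool is missing or prints nothing."""
    try:
        proc = subprocess.run(
            ["systemd-detect-virt", "--container"],
            capture_output=True,
            text=True,
            check=False,  # the tool exits non-zero when no container is detected
        )
    except FileNotFoundError:
        return "unknown"
    return proc.stdout.strip() or "unknown"


def looks_like_lxc(virt: str) -> bool:
    """True for identifiers such as "lxc" or "lxc-libvirt"."""
    return virt.startswith("lxc")
```

Printing `container_type()` in a workflow step would show `none` when the job is not inside a container, so the result is easy to read directly from the job log.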

Hello all, I have been super busy with some other Ansible issues and construction (like @geerlingguy =p), and I don't really understand why this issue is happening. Most of my roles are tested against the same images and I don't get this error. I would suggest trying Podman instead of Docker, since I have mostly replaced Docker with Podman now. It is an alternative solution.

staticdev avatar Nov 19 '23 20:11 staticdev

@wiktor2200 Thanks for trying it out. I commented on some potential issues with the current state of the PR.

staticdev avatar Dec 01 '23 17:12 staticdev

Hello there. I noticed that we are not running into this issue anymore... somehow the issue disappeared. I'll keep the PR in draft for now until we are confident the issue is gone for good. Regards

aalaesar avatar Feb 07 '24 08:02 aalaesar

The initial issue is gone now and the CI has been fixed. Closing.

aalaesar avatar Sep 12 '24 10:09 aalaesar