terraform-aws-gitlab-runner
Job error
Hello,
I just tried this module recently; I managed to create my EC2 instance and register it automatically on GitLab.
The problem is that each time I try to run a job (for now it's just a hello world job), it fails with an error:
Running with gitlab-runner 14.8.3 (16ae0625)
on uat-runner okQssjvP
Preparing the "docker+machine" executor
00:35
Using Docker executor with image docker:18.03.1-ce ...
Pulling docker image docker:18.03.1-ce ...
Using docker image sha256:7c1527e8e59b80ed43f6c425c03cd9cb46b873d7db02d0c47db01b5f58839c8d for docker:18.03.1-ce with digest docker@sha256:bdeaddc74da33d02b2e7064e9050cd1aaa43c472341688cb1402a027f3f5efa7 ...
Preparing environment
00:17
ERROR: Job failed: prepare environment: exit code 255. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
OR
Running with gitlab-runner 14.8.3 (16ae0625)
on uat-runner okQssjvP
Preparing the "docker+machine" executor
01:29
Using Docker executor with image docker:18.03.1-ce ...
ERROR: Job failed: adding cache volume: set volume permissions: running permission container "40f2ea08d78568454607935d8a5a4cd9936fc0c7ce5a2704f54a1dda5de73513" for volume "runner-okqssjvp-project-2-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70": waiting for permission container to finish: exit code 255
OR
Running with gitlab-runner 14.8.3 (16ae0625)
on uat-runner okQssjvP
Preparing the "docker+machine" executor
01:47
Using Docker executor with image docker:18.03.1-ce ...
Pulling docker image docker:18.03.1-ce ...
Using docker image sha256:7c1527e8e59b80ed43f6c425c03cd9cb46b873d7db02d0c47db01b5f58839c8d for docker:18.03.1-ce with digest docker@sha256:bdeaddc74da33d02b2e7064e9050cd1aaa43c472341688cb1402a027f3f5efa7 ...
Preparing environment
00:16
Getting source from Git repository
00:16
Executing "step_script" stage of the job script
00:01
Using docker image sha256:7c1527e8e59b80ed43f6c425c03cd9cb46b873d7db02d0c47db01b5f58839c8d for docker:18.03.1-ce with digest docker@sha256:bdeaddc74da33d02b2e7064e9050cd1aaa43c472341688cb1402a027f3f5efa7 ...
Cleaning up project directory and file based variables
00:16
ERROR: Job failed: exit code 139
Here is the config I use:
aws_region = local.common_vars.inputs.region
environment = local.common_vars.inputs.environment
name = "${local.common_vars.inputs.projet}-${replace(local.common_vars.inputs.region,"-","")}-${local.common_vars.inputs.environment}-runner"
key_name = local.common_vars.inputs.key
vpc_id = dependency.vpc.outputs.vpc_id
subnet_ids_gitlab_runner = dependency.vpc.outputs.public_subnets
subnet_id_runners = element(dependency.vpc.outputs.public_subnets, 0)
runners_name = "${local.common_vars.inputs.projet}-${replace(local.common_vars.inputs.region,"-","")}-${local.common_vars.inputs.environment}-runner"
runners_gitlab_url = "https://${dependency.gitlab.outputs.hostname}"
gitlab_runner_registration_config = {
  registration_token = "toeken"
  tag_list           = "docker"
  description        = "runner ec2 default"
  locked_to_project  = "true"
  run_untagged       = "false"
  maximum_timeout    = "3600"
}
Do you have any idea what could be causing these errors, or where I could look for logs?
Would be nice to see the Terraform plan and the definition of the hello world job.
gitlab.domain.com looks weird in the logs.
This seems to be some incompatibility between the latest versions of Docker, Docker Machine, Ubuntu (maybe) and GitLab Runner.
Details here: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/26564
This workaround seems to have fixed it for me: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/26564#note_977593368
Hey @JulianCBC
It seems to be a kernel bug. These are the logs from my docker-machine host created by the agent:
[Thu Jun 9 01:01:05 2022] Initializing XFRM netlink socket
[Thu Jun 9 01:01:12 2022] docker0: port 1(veth8a9a81d) entered blocking state
[Thu Jun 9 01:01:12 2022] docker0: port 1(veth8a9a81d) entered disabled state
[Thu Jun 9 01:01:12 2022] device veth8a9a81d entered promiscuous mode
[Thu Jun 9 01:01:13 2022] eth0: renamed from veth5d0d453
[Thu Jun 9 01:01:13 2022] IPv6: ADDRCONF(NETDEV_CHANGE): veth8a9a81d: link becomes ready
[Thu Jun 9 01:01:13 2022] docker0: port 1(veth8a9a81d) entered blocking state
[Thu Jun 9 01:01:13 2022] docker0: port 1(veth8a9a81d) entered forwarding state
[Thu Jun 9 01:01:13 2022] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
[Thu Jun 9 01:01:13 2022] ------------[ cut here ]------------
[Thu Jun 9 01:01:13 2022] kernel BUG at include/linux/fs.h:3104!
[Thu Jun 9 01:01:13 2022] invalid opcode: 0000 [#1] SMP NOPTI
[Thu Jun 9 01:01:13 2022] CPU: 1 PID: 929 Comm: gitlab-runner-h Not tainted 5.13.0-1028-aws #31~20.04.1-Ubuntu
[Thu Jun 9 01:01:13 2022] Hardware name: Amazon EC2 t3a.medium/, BIOS 1.0 10/16/2017
[Thu Jun 9 01:01:13 2022] RIP: 0010:__fput+0x247/0x250
[Thu Jun 9 01:01:13 2022] Code: 00 48 85 ff 0f 84 8b fe ff ff f6 c7 40 0f 85 82 fe ff ff e8 ab 38 00 00 e9 78 fe ff ff 4c 89 f7 e8 2e 8802 00 e9 b5 fe ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 31 db 48
[Thu Jun 9 01:01:13 2022] RSP: 0018:ffffb7f180bebe30 EFLAGS: 00010246
[Thu Jun 9 01:01:13 2022] RAX: 0000000000000000 RBX: 00000000000a801d RCX: ffffa0954341c000
[Thu Jun 9 01:01:13 2022] RDX: ffffa09545973280 RSI: 0000000000000001 RDI: 0000000000000000
[Thu Jun 9 01:01:13 2022] RBP: ffffb7f180bebe58 R08: ffffa09545b5eb40 R09: ffffa0954fbc0570
[Thu Jun 9 01:01:13 2022] R10: ffffb7f180bebe30 R11: ffffa0954fdea510 R12: ffffa0954fdea500
[Thu Jun 9 01:01:13 2022] R13: ffffa0954fbc0570 R14: ffffa095459732a0 R15: ffffa0955f481900
[Thu Jun 9 01:01:13 2022] FS: 0000000000000000(0000) GS:ffffa09578d00000(0000) knlGS:0000000000000000
[Thu Jun 9 01:01:13 2022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Jun 9 01:01:13 2022] CR2: 00007ffdea280f79 CR3: 000000011fd7a000 CR4: 00000000003506e0
[Thu Jun 9 01:01:13 2022] Call Trace:
[Thu Jun 9 01:01:13 2022] <TASK>
[Thu Jun 9 01:01:13 2022] ____fput+0xe/0x10
[Thu Jun 9 01:01:13 2022] task_work_run+0x70/0xb0
[Thu Jun 9 01:01:13 2022] exit_to_user_mode_prepare+0x1b5/0x1c0
[Thu Jun 9 01:01:13 2022] syscall_exit_to_user_mode+0x27/0x50
[Thu Jun 9 01:01:13 2022] do_syscall_64+0x6e/0xb0
[Thu Jun 9 01:01:13 2022] ? do_syscall_64+0x6e/0xb0
[Thu Jun 9 01:01:13 2022] ? irqentry_exit_to_user_mode+0x9/0x20
[Thu Jun 9 01:01:13 2022] ? irqentry_exit+0x19/0x30
[Thu Jun 9 01:01:13 2022] ? sysvec_reschedule_ipi+0x7e/0xf0
[Thu Jun 9 01:01:13 2022] ? asm_sysvec_reschedule_ipi+0xa/0x20
[Thu Jun 9 01:01:13 2022] entry_SYSCALL_64_after_hwframe+0x44/0xae
[Thu Jun 9 01:01:13 2022] RIP: 0033:0x466100
[Thu Jun 9 01:01:13 2022] Code: Unable to access opcode bytes at RIP 0x4660d6.
[Thu Jun 9 01:01:13 2022] RSP: 002b:00007ffdea280dd0 EFLAGS: 00000200 ORIG_RAX: 000000000000003b
[Thu Jun 9 01:01:13 2022] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[Thu Jun 9 01:01:13 2022] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[Thu Jun 9 01:01:13 2022] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[Thu Jun 9 01:01:13 2022] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[Thu Jun 9 01:01:13 2022] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Thu Jun 9 01:01:13 2022] </TASK>
[Thu Jun 9 01:01:13 2022] Modules linked in: veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter br_netfilter bridge stp llc aufs overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel psmouse input_leds aesni_intel crypto_simd cryptd serio_raw ena parport_pc parport sch_fq_codel ipmi_devintf ipmi_msghandler msr drm ip_tables x_tables autofs4
[Thu Jun 9 01:01:13 2022] ---[ end trace 87ca6f1d500d57c3 ]---
[Thu Jun 9 01:01:13 2022] RIP: 0010:__fput+0x247/0x250
[Thu Jun 9 01:01:13 2022] Code: 00 48 85 ff 0f 84 8b fe ff ff f6 c7 40 0f 85 82 fe ff ff e8 ab 38 00 00 e9 78 fe ff ff 4c 89 f7 e8 2e 8802 00 e9 b5 fe ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 31 db 48
[Thu Jun 9 01:01:13 2022] RSP: 0018:ffffb7f180bebe30 EFLAGS: 00010246
[Thu Jun 9 01:01:13 2022] RAX: 0000000000000000 RBX: 00000000000a801d RCX: ffffa0954341c000
[Thu Jun 9 01:01:13 2022] RDX: ffffa09545973280 RSI: 0000000000000001 RDI: 0000000000000000
[Thu Jun 9 01:01:13 2022] RBP: ffffb7f180bebe58 R08: ffffa09545b5eb40 R09: ffffa0954fbc0570
[Thu Jun 9 01:01:13 2022] R10: ffffb7f180bebe30 R11: ffffa0954fdea510 R12: ffffa0954fdea500
[Thu Jun 9 01:01:13 2022] R13: ffffa0954fbc0570 R14: ffffa095459732a0 R15: ffffa0955f481900
[Thu Jun 9 01:01:13 2022] FS: 0000000000000000(0000) GS:ffffa09578d00000(0000) knlGS:0000000000000000
[Thu Jun 9 01:01:13 2022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Jun 9 01:01:13 2022] CR2: 00000000004660d6 CR3: 000000011fd7a000 CR4: 00000000003506e0
[Thu Jun 9 01:01:13 2022] docker0: port 1(veth8a9a81d) entered disabled state
[Thu Jun 9 01:01:13 2022] veth5d0d453: renamed from eth0
[Thu Jun 9 01:01:13 2022] docker0: port 1(veth8a9a81d) entered disabled state
[Thu Jun 9 01:01:13 2022] device veth8a9a81d left promiscuous mode
[Thu Jun 9 01:01:13 2022] docker0: port 1(veth8a9a81d) entered disabled state
[Thu Jun 9 01:01:19 2022] loop5: detected capacity change from 0 to 8
Kernel and Docker versions:
$ uname -romi
5.13.0-1028-aws x86_64 x86_64 GNU/Linux
$ docker --version
Docker version 20.10.17, build 100c701
In my tests, the runner agent couldn't connect properly to the docker-machine host while a job was running:
sh-4.2$ sudo docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
runner-xqnnexbf-runner-1654735954-094efa94 - amazonec2 Running tcp://10.48.102.185:2376 Unknown Unable to query docker version: Cannot connect to the docker engine endpoint
Once the job ended, it could connect without issues:
sh-4.2$ sudo docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
runner-xqnnexbf-runner-1654735954-094efa94 - amazonec2 Running tcp://10.48.102.185:2376 v20.10.17
So before seeing your reply, I had changed the Ubuntu version to 22.04:
runner_ami_filter = {
- name = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
+ name = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
}
It worked as well 👍🏼
Thanks @hiago-miguel, that's very illuminating!
In the interests of adding to my knowledge on how to manage this, where did you get those kernel logs?
@kayman-mk @npalm should we just update the AMI used by the instances Docker Machine spins up to 22.04? Is there any specific reason why we're still using 20.04? (I do note that the default AMIs used by GitLab's Docker Machine fork are 20.04)
@JulianCBC the kernel logs are from the docker-machine host created by the runner agent.
I accessed the host using AWS SSM. This can be done from the AWS console:
- Select the EC2 instance, then click on Actions, and then Connect
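For reference, roughly the same can be done from the command line. This is a minimal sketch, assuming the Session Manager plugin is installed locally and the host runs the SSM agent with an instance profile that permits Session Manager; the instance ID is a placeholder:
$ aws ssm start-session --target i-0123456789abcdef0
Once on the host, dmesg -T prints the kernel ring buffer with human-readable timestamps, which is where traces like the one above show up:
$ sudo dmesg -T | tail -n 100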
> gitlab.domain.com looks weird in the logs.
I just changed it to hide my real domain. In the original logs I can see my real domain.
I made the change from 20.04 to 22.04 and it's working!
Thanks a lot for your help!
Been facing the same issue and solved it by updating to Ubuntu 22.04. But I don't think this issue should be closed yet, since new users will be using 20.04 (the default runner_ami_filter) and seeing the same bug.
Interesting. I just checked my configuration: I am using the defaults and everything works, no problem. I am using version 5.0.2 of the module.
@kayman-mk when was the last time you refreshed your configuration? The AMIs chosen are selected when the configuration is applied, not dynamically as jobs run.
It's also possible that the broken AMIs have been replaced with newer ones that don't have this issue.
Which AMI are you using?
Good point, Julian. The runner I checked was set up last Friday. The AMI is ami-0929b2e28d090f63f; it was created June 14th.
Weird. That's an Amazon Linux 2 AMI; is that what the spot instances the runner spins up are using?
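For anyone following along, the name and creation date of a given AMI can be checked from the CLI. A sketch, assuming the AWS CLI is configured for the account and region in question:
$ aws ec2 describe-images --image-ids ami-0929b2e28d090f63f --query 'Images[0].[Name,CreationDate]' --output text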
I relaunched yesterday and started getting TLS certificate errors with 22.04. I haven't had a chance to investigate deeply, but pinning to an earlier AMI worked:
runner_ami_filter = {
  image-id = ["ami-06bbbd4e89b66f400"]
}
I just encountered the same problem with ami-06bbbd4e89b66f400, which has worked for 6+ months.
It seems to be resolved again by allowing this to pull a newer image:
runner_ami_filter = {
  name = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
}
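If you want to see which AMI a name filter like that currently resolves to, something like this works (a sketch; 099720109477 is Canonical's owner ID, and the filter mirrors the module variable above):
$ aws ec2 describe-images --owners 099720109477 --filters 'Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*' --query 'sort_by(Images, &CreationDate)[-1].[ImageId,Name]' --output text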
Why not use the default AMI? I've never had problems with that.
I'm closing this issue due to missing feedback.
I do not remember where I read it, but you are strongly advised not to choose an AMI yourself. Stick with the default; there might be problems using other AMIs.