The tink_worker in the terraformed sandbox doesn't get provisioned
Two disclaimers:
- I am still investigating this issue
- I am new to all of this, but I did follow the guide and tried some basic troubleshooting.
When I reboot the tink_worker for the first time (after running terraform apply), it doesn't get provisioned.
~~My first intuition was a networking issue, especially since I can see a couple of "this doesn't work as it's supposed to" comments in the terraform file. I'll run a tcpdump on the server on port 67 to check. That said, the network does seem to be set up correctly when I check it on the Equinix Metal portal.~~ It's not a networking issue.
If that's correct, I'm going to tinker with the worker itself; maybe I'm hitting #130? It's a bit odd, though: I reran it a couple of times and it consistently didn't work.
Expected Behaviour
Tink-worker connects to the provisioner and one can see the worker under tink workflow events
Current Behaviour
The workflow is stuck in the PENDING state.
Steps to Reproduce (for bugs)
Run the instructions from here
Context
I was just trying to take Tinkerbell for a spin!
Your Environment
I'm running it on macOS, using the terraform sandbox with Equinix Metal.
Here are the logs from the boots container:
{"level":"info","ts":1650192223.6683865,"caller":"[email protected]/handler.go:105","msg":"","service":"github.com/tinkerbell/boots","pkg":"dhcp","pkg":"dhcp","event":"recv","mac":"0c:42:a1:97:f6:48","via":"0.0.0.0","iface":"enp2s0f1","xid":"\"3d:45:49:49\"","type":"DHCPDISCOVER","secs":4}
{"level":"info","ts":1650192223.6685946,"caller":"boots/dhcp.go:78","msg":"parsed option82/circuitid","service":"github.com/tinkerbell/boots","pkg":"main","mac":"0c:42:a1:97:f6:48","circuitID":""}
{"level":"info","ts":1650192223.6712575,"caller":"boots/dhcp.go:91","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPDISCOVER","mac":"0c:42:a1:97:f6:48","err":"discover from dhcp message: get hardware by mac from tink: rpc error: code = Unknown desc = SELECT: sql: no rows in result set","errVerbose":"rpc error: code = Unknown desc = SELECT: sql: no rows in result set\nget hardware by mac from tink\ngithub.com/tinkerbell/boots/packet.(*client).DiscoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/packet/endpoints.go:108\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:17\ngithub.com/golang/groupcache/singleflight.(*Group).Do\n\t/home/github/go/pkg/mod/github.com/golang/[email protected]/singleflight/singleflight.go:56\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:19\ngithub.com/tinkerbell/boots/job.CreateFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/job.go:106\nmain.dhcpHandler.serveDHCP\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:89\nmain.dhcpHandler.ServeDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:50\ngithub.com/gammazero/workerpool.(*WorkerPool).dispatch.func1\n\t/home/github/go/pkg/mod/github.com/gammazero/[email protected]/workerpool.go:169\nruntime.goexit\n\t/opt/actions-runner/_work/_tool/go/1.16.3/x64/src/runtime/asm_amd64.s:1371\ndiscover from dhcp message"}
{"level":"info","ts":1650192227.7058403,"caller":"[email protected]/handler.go:105","msg":"","service":"github.com/tinkerbell/boots","pkg":"dhcp","pkg":"dhcp","event":"recv","mac":"0c:42:a1:97:f6:48","via":"0.0.0.0","iface":"enp2s0f1","xid":"\"3d:45:49:49\"","type":"DHCPDISCOVER","secs":8}
{"level":"info","ts":1650192227.7061045,"caller":"boots/dhcp.go:78","msg":"parsed option82/circuitid","service":"github.com/tinkerbell/boots","pkg":"main","mac":"0c:42:a1:97:f6:48","circuitID":""}
{"level":"info","ts":1650192227.7088065,"caller":"boots/dhcp.go:91","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPDISCOVER","mac":"0c:42:a1:97:f6:48","err":"discover from dhcp message: get hardware by mac from tink: rpc error: code = Unknown desc = SELECT: sql: no rows in result set","errVerbose":"rpc error: code = Unknown desc = SELECT: sql: no rows in result set\nget hardware by mac from tink\ngithub.com/tinkerbell/boots/packet.(*client).DiscoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/packet/endpoints.go:108\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:17\ngithub.com/golang/groupcache/singleflight.(*Group).Do\n\t/home/github/go/pkg/mod/github.com/golang/[email protected]/singleflight/singleflight.go:56\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:19\ngithub.com/tinkerbell/boots/job.CreateFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/job.go:106\nmain.dhcpHandler.serveDHCP\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:89\nmain.dhcpHandler.ServeDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:50\ngithub.com/gammazero/workerpool.startWorker\n\t/home/github/go/pkg/mod/github.com/gammazero/[email protected]/workerpool.go:218\nruntime.goexit\n\t/opt/actions-runner/_work/_tool/go/1.16.3/x64/src/runtime/asm_amd64.s:1371\ndiscover from dhcp message"}
{"level":"info","ts":1650192235.779676,"caller":"[email protected]/handler.go:105","msg":"","service":"github.com/tinkerbell/boots","pkg":"dhcp","pkg":"dhcp","event":"recv","mac":"0c:42:a1:97:f6:48","via":"0.0.0.0","iface":"enp2s0f1","xid":"\"3d:45:49:49\"","type":"DHCPDISCOVER","secs":12}
{"level":"info","ts":1650192235.7798727,"caller":"boots/dhcp.go:78","msg":"parsed option82/circuitid","service":"github.com/tinkerbell/boots","pkg":"main","mac":"0c:42:a1:97:f6:48","circuitID":""}
{"level":"info","ts":1650192235.7824285,"caller":"boots/dhcp.go:91","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPDISCOVER","mac":"0c:42:a1:97:f6:48","err":"discover from dhcp message: get hardware by mac from tink: rpc error: code = Unknown desc = SELECT: sql: no rows in result set","errVerbose":"rpc error: code = Unknown desc = SELECT: sql: no rows in result set\nget hardware by mac from tink\ngithub.com/tinkerbell/boots/packet.(*client).DiscoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/packet/endpoints.go:108\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:17\ngithub.com/golang/groupcache/singleflight.(*Group).Do\n\t/home/github/go/pkg/mod/github.com/golang/[email protected]/singleflight/singleflight.go:56\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:19\ngithub.com/tinkerbell/boots/job.CreateFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/job.go:106\nmain.dhcpHandler.serveDHCP\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:89\nmain.dhcpHandler.ServeDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:50\ngithub.com/gammazero/workerpool.(*WorkerPool).dispatch.func1\n\t/home/github/go/pkg/mod/github.com/gammazero/[email protected]/workerpool.go:169\nruntime.goexit\n\t/opt/actions-runner/_work/_tool/go/1.16.3/x64/src/runtime/asm_amd64.s:1371\ndiscover from dhcp message"}
{"level":"info","ts":1650192251.8749926,"caller":"[email protected]/handler.go:105","msg":"","service":"github.com/tinkerbell/boots","pkg":"dhcp","pkg":"dhcp","event":"recv","mac":"0c:42:a1:97:f6:48","via":"0.0.0.0","iface":"enp2s0f1","xid":"\"3d:45:49:49\"","type":"DHCPDISCOVER","secs":16}
{"level":"info","ts":1650192251.8751898,"caller":"boots/dhcp.go:78","msg":"parsed option82/circuitid","service":"github.com/tinkerbell/boots","pkg":"main","mac":"0c:42:a1:97:f6:48","circuitID":""}
{"level":"info","ts":1650192251.8778894,"caller":"boots/dhcp.go:91","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPDISCOVER","mac":"0c:42:a1:97:f6:48","err":"discover from dhcp message: get hardware by mac from tink: rpc error: code = Unknown desc = SELECT: sql: no rows in result set","errVerbose":"rpc error: code = Unknown desc = SELECT: sql: no rows in result set\nget hardware by mac from tink\ngithub.com/tinkerbell/boots/packet.(*client).DiscoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/packet/endpoints.go:108\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:17\ngithub.com/golang/groupcache/singleflight.(*Group).Do\n\t/home/github/go/pkg/mod/github.com/golang/[email protected]/singleflight/singleflight.go:56\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:19\ngithub.com/tinkerbell/boots/job.CreateFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/job.go:106\nmain.dhcpHandler.serveDHCP\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:89\nmain.dhcpHandler.ServeDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:50\ngithub.com/gammazero/workerpool.(*WorkerPool).dispatch.func1\n\t/home/github/go/pkg/mod/github.com/gammazero/[email protected]/workerpool.go:169\nruntime.goexit\n\t/opt/actions-runner/_work/_tool/go/1.16.3/x64/src/runtime/asm_amd64.s:1371\ndiscover from dhcp message"}
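The pattern in these logs is consistent: boots receives each DHCPDISCOVER from 0c:42:a1:97:f6:48, asks tink for hardware with that MAC, and the SELECT returns no rows, so no job is created. A quick way to confirm which MACs are missing from tink is to filter the boots output for that message. Below is a minimal Python sketch, assuming the boots logs have been saved one JSON object per line with any docker-compose prefix stripped; `failed_lookup_macs` is a hypothetical helper, not part of boots.

```python
import json
from collections import Counter
from typing import Iterable


def failed_lookup_macs(log_lines: Iterable[str]) -> Counter:
    """Count, per MAC, the boots log entries where the hardware lookup found no record."""
    counts: Counter = Counter()
    for raw in log_lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue  # not a JSON log entry (e.g. an unstripped docker-compose prefix)
        try:
            entry = json.loads(line)  # duplicate keys such as "pkg" are tolerated; last one wins
        except json.JSONDecodeError:
            continue
        # Only the "retrieved job is empty" entries caused by an empty SELECT are counted.
        if (
            entry.get("msg") == "retrieved job is empty"
            and "no rows in result set" in str(entry.get("err", ""))
        ):
            counts[entry.get("mac")] += 1
    return counts
```

Run over the four lookups above, it should report 0c:42:a1:97:f6:48 four times; any MAC it reports is one that tink has no hardware record for.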
Ok, I'm super confused. I understand where it's coming from:
The hardware spec inserted into the database is hardcoded rather than dynamically generated based on the worker. I believe it's going to work after I insert a proper hardware definition.
However, it is super unclear from the docs that I should do that, and honestly, it could probably be terraformed, too? It's too glaring an oversight to be real; I must've overlooked something in the docs, but I'm not sure what.
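Concretely, the fix is to make the hardware record's interface MAC match the worker's real MAC before the record is pushed into tink. Below is a minimal Python sketch of that patch step, assuming the v1 hardware JSON layout used by the sandbox (`network.interfaces[].dhcp.mac`) and that the first interface is the one the worker PXE boots from; `with_worker_mac` is a hypothetical helper, not part of any Tinkerbell tool.

```python
import copy
import re

# Lowercase, colon-separated form, matching how boots prints MACs in its logs.
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def with_worker_mac(hardware: dict, worker_mac: str) -> dict:
    """Return a copy of a hardware definition whose first interface uses worker_mac.

    Assumes hardware["network"]["interfaces"][i]["dhcp"]["mac"] (sandbox v1 layout)
    and that interface 0 is the NIC the worker PXE boots from.
    """
    mac = worker_mac.strip().lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {worker_mac!r}")

    patched = copy.deepcopy(hardware)  # leave the hardcoded template untouched
    interfaces = patched.get("network", {}).get("interfaces", [])
    if not interfaces:
        raise ValueError("hardware definition has no network.interfaces")
    interfaces[0].setdefault("dhcp", {})["mac"] = mac
    return patched
```

Validating the MAC up front means a bad value from the provisioning API fails loudly instead of silently producing another record that boots can never match.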
This doesn't sound right :D. Can you retry using the code from #126?
I haven't used it; adding a correct hardware definition did work. I can't see how your PR fixes it, though: it doesn't change anything about the hardware definitions, and they are not converted into templates (as they should be).
My gut feeling is that someone got it working consistently because the MAC seems to be not-so-random: when I created and destroyed the worker multiple times, IIRC it got the same MAC.
I've used the tf setup a bunch on whatever machines EM ends up provisioning, so there's no way a MAC stays the same. It gets updated here https://github.com/tinkerbell/sandbox/blob/main/deploy/compose/create-tink-records/create.sh#L20-L28. This happens (in my branch) by way of:
- terraform creates cloud-config userdata and populates the WORKER_MAC using the data from the api (https://github.com/mmlb/tinkerbell-sandbox/blob/terraform-love/deploy/terraform/main.tf#L103)
- which then runs setup.sh which overrides the mac in the .env file https://github.com/mmlb/tinkerbell-sandbox/blob/terraform-love/deploy/terraform/setup.sh#L163 -> https://github.com/mmlb/tinkerbell-sandbox/blob/terraform-love/deploy/terraform/setup.sh#L116-L127
- so that when `docker-compose up` is run, it picks up the worker's mac address (https://github.com/mmlb/tinkerbell-sandbox/blob/terraform-love/deploy/compose/docker-compose.yml#L191-L197) and updates the hardware description before feeding it into tink https://github.com/mmlb/tinkerbell-sandbox/blob/terraform-love/deploy/compose/create-tink-records/create.sh#L20-L28
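The `.env` override step in that chain amounts to replacing one key in a dotenv file. Here is a minimal Python sketch of it, not the actual setup.sh logic; `set_env_var` is a hypothetical helper, and `TINKERBELL_CLIENT_MAC` is only an assumed key name (check the sandbox's `.env` for the real one).

```python
import re


def set_env_var(env_text: str, key: str, value: str) -> str:
    """Return env_text with every KEY=... assignment set to value, appending one if absent.

    A sketch of the ".env override" step only; not the sandbox's setup.sh.
    """
    # Match "KEY=..." or "export KEY=..." at the start of a line.
    pattern = re.compile(rf"^(export\s+)?{re.escape(key)}=.*$", re.MULTILINE)
    assignment = f"{key}={value}"
    if pattern.search(env_text):
        # A function replacement keeps backslashes in value from being treated as escapes.
        return pattern.sub(lambda m: (m.group(1) or "") + assignment, env_text)
    sep = "" if not env_text or env_text.endswith("\n") else "\n"
    return f"{env_text}{sep}{assignment}\n"
```

For example, `set_env_var(env, "TINKERBELL_CLIENT_MAC", "0c:42:a1:97:f6:48")` rewrites an existing assignment in place, so docker-compose sees only the worker's real MAC rather than a stale default.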
Indeed, my bad! Ok, it definitely didn't work on master for some reason. I hope your branch has a fix for it.
@wokalski did #126 fix things for you?
I didn't test it; I made it work with my local tweaks. I hope it does, though!
Hello @wokalski, I'm trying to do exactly the same thing: running the terraform sandbox from my macOS machine to spin up the Provisioner and Worker on Equinix Metal, but the workflow gets stuck in the PENDING state. Could you please share your tweaks? TIA.
@CAcquaviva I did make it work, but I didn't end up productizing this setup. Tinkerbell is undergoing a huge transition when it comes to internals. The issue you're hitting is most likely one of these:
- Certs don't match between the registry and the worker (if that's the case, you can see it on the worker machine in /var/log/bootkit)
- Or, if that step worked, it most likely fails because the tink worker :latest image is not compatible with the sandbox. You need to pin a correct (older) version from a couple of months ago (try one from about 1.5 months ago).
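For the first case, one way to check is to search the worker's bootkit log for TLS errors. The sketch below assumes the mismatch surfaces as a Go `crypto/x509` error such as `x509: certificate signed by unknown authority`, and that the log text has been copied off the worker; `tls_errors` is a hypothetical helper.

```python
import re

# Assumption: the cert mismatch shows up as a Go crypto/x509 error, which starts with "x509: ".
# The match stops at a double quote or end of line so quoted log messages are trimmed.
TLS_ERROR_RE = re.compile(r'x509: [^"\n]*')


def tls_errors(log_text: str) -> list:
    """Return the distinct x509 error messages found in log_text, in first-seen order."""
    seen = []
    for match in TLS_ERROR_RE.finditer(log_text):
        msg = match.group(0).strip()
        if msg not in seen:
            seen.append(msg)
    return seen
```

An empty result doesn't prove the certs are fine, only that no x509 error text was logged; in that case the version mismatch is the next thing to look at.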
If you're thinking about building a production setup with Tinkerbell and you have a small network, I'd encourage you to take a look at Matchbox from Poseidon. I really like Tinkerbell's architecture, but in my opinion it's just too much of a work in progress right now.
The project has moved on quite a bit since the issue was raised, namely we no longer use the Postgres backend and the tink CLI has been deprecated.
This may still be an issue, but it's unclear what the next steps are. We'll take an action to validate the Terraform setup separately and raise issues as needed.