playground icon indicating copy to clipboard operation
playground copied to clipboard

Remove internal registry

Open gianarb opened this issue 4 years ago • 23 comments

We are not sure if the internal registry is something we want or if it is just a complication.

We enabled #41. So now the workers have access to the internet. It means that all the registry setup and the certificate to have it secure is not strictly required anymore.

BUT we will have to figure out by ourselves how to get our actions to run in the worker. This is not a problem when the action is public, but for private repositories, it is a bit more complicated and the operating system installation environment (osie or tinkie) will have to give us a way to inject authentication (or we have to make tink-worker good enough to authenticate I suppose)

gianarb avatar Jan 26 '21 14:01 gianarb

Yep private repo/images is something we should think about but I think the way to do it is to force it? I'm in favor of dropping the registry from sandbox.

mmlb avatar Jan 26 '21 14:01 mmlb

I think the way to do it is to force it?

What do you mean?

gianarb avatar Jan 26 '21 14:01 gianarb

Drop the registry so that we can have a discussion about this. Force the discussion.

mmlb avatar Jan 26 '21 14:01 mmlb

I agree with @mmlb, just drop it for now from the sandbox. It is easy to add again later and most people/orgs that need a private repo already have a policy and implementation or know how to set it up.... Just my 2 cents... :-)

stolsma avatar Jan 26 '21 14:01 stolsma

need a private repo already have a policy and implementation or know how to set it up.... Just my 2 cents... :-)

Yeah, the problem is not how you set it up for yourself, but how you can use it in the installation environment if it has passwords or things like that. Anyway, I am always up for removing things 👍 I will leave time for more people to leave a comment, and at some point, I will move forward accordingly

gianarb avatar Jan 26 '21 15:01 gianarb

Yeah, the problem is not how you set it up for yourself, but how you can use it in the installation environment if it has passwords or things like that

If you are running a private repo you always have that problem... :-( But this problem is also related to the encryption of the metadata path from provisioner to local executor. Thats something we need to make easy and save so all secrets can be transported in a save way.

stolsma avatar Jan 26 '21 15:01 stolsma

I think I'd like to keep it for now for consistency with k8s-sandbox. @detiber how do you push freshly built dev images to k8s with tilt? Is the registry required/useful?

mrchrd avatar Jan 26 '21 15:01 mrchrd

Beware i use a private repo registry for Docker images with the Ansible Playbook ...

fransvanberckel avatar Jan 26 '21 16:01 fransvanberckel

@fransvanberckel can you tell me a bit more?! Do you rely on the private registry that sandbox provides or you just provision a registry as part of toolchain ansible-role-tinkerbell ships?

If it's the second no problem, you can keep shipping it. If it's the first removing it from the sandbox will require some work for the ansible role.

gianarb avatar Jan 26 '21 16:01 gianarb

@gianarb That's right, it's more or less the second. For more details, take a look at the role ... https://github.com/fransvanberckel/ansible-role-tinkerbell/tree/master/roles/tinkerbell

fransvanberckel avatar Jan 26 '21 16:01 fransvanberckel

I think removing it is a good idea, because it simplifies things. If you seek to do private images or work in an airgapped environment, there are other ways to do it, like a private Docker registry with credentials or pre-pulling images.

nicklasfrahm avatar Jan 26 '21 17:01 nicklasfrahm

I have started to work at this and the problem is in OSIE (and tinkie @thebsdbox ).

OSIE: https://github.com/tinkerbell/osie/blob/master/apps/workflow-helper.sh#L81 Tinkie: https://github.com/gianarb/tinkie/blob/master/bootkit/main.go#L64

We use the internal registry as a way to select the version of tink-worker we want to run as part of the installation environment.

Currently, it works this way:

  1. Sandbox has a file called ./current_versions.sh it contains the publicly available tink-worker image
  2. It gets moved to .env as part of the ./generate-envrc.sh script
  3. setup.sh pick it up and mirror it to the private registry always tagged as latest
  4. In this way the various OSIEs implementation can pull and run it.

We can make it more general and we can write a new RULE to OSIE:

"OSIE use a expect a kernel command line called tink-worker-image or whatever and it uses its content to run the tink-worker. In this way, we can even override it via metadata"

if we all agree I can open PR against osie and tinkie and sandbox to have all of them inline

(At some point we should write OSIE RULES as documentation, but we don' know them all yet! :P )

gianarb avatar Jan 27 '21 11:01 gianarb

We can easily do this now, default to lates and override in the facilities metadata!

    "facility": {
      "facility_code": "onprem tink-worker-tag=abc123",

How does that sound?

thebsdbox avatar Jan 27 '21 11:01 thebsdbox

@thebsdbox I am not too much into the details, but wouldn't it be better to create a new field in the facility object, because this could break applications, that use the facility_code and build on top of the metadata.

Maybe something like facility.cmdline?

nicklasfrahm avatar Jan 27 '21 11:01 nicklasfrahm

ah so this is an undocumented thing :| things in the facility code are passed to the cmdline. They're then presented as environment variables or in /proc/cmdline inside of OSIE and we can process them anyway we like. It won't break any existing functionality.

thebsdbox avatar Jan 27 '21 11:01 thebsdbox

Yep @thebsdbox I love that. Let's see what @mmlb says and I can proceed when we agree

@nicklasfrahm yeah it is unfriendly and undocumented :) I think at some point we will figure out a better metadata structure for all of that. I think something is part of a proposal @mmlb started but not totally sure https://github.com/tinkerbell/proposals/pull/25/files

gianarb avatar Jan 27 '21 11:01 gianarb

Yep I think this should be done in a metadata fashion but not like that @thebsdbox, boots already supports this in "cacher mode", we have "custom services version" that we use to boot different versions of osie (https://github.com/tinkerbell/boots/blob/master/packet/models_cacher.go#L319). Custom services is meant for this in the legacy EM stack and we should probably make use of that in tinkerbell too I think. I don't like the idea of injecting it through the facility param that is just layering bad UX on top of the bad stuff already :D.

I don't think this should be part of the Instance proposal @gianarb thats the wrong layer to do this I think.

The other option is to just pin it in the osie "package". Which is kind of the other side of doing things. I'm not too keen on this actually.

There are some runtime params that we should have a way to pass down to tink-worker. Its version seems like one, centralized logging config is another. I think maybe hardware.metadata.services is probably a better place? We can do something like:

{
  "tink-worker": {
    "log": {
      "driver": "..."
    },
    "tag": "some-tag",
    "repo": [
      { "url":"...", "user":"someuser", "password":"s3cr3t"}
    ]
  }
}

edited to add: and boots adds them to command line with some transformation (maybe gron like)? some base64 json is also an option but I don't quite like this :D. That does seem like it can get a bit out of hand and we could end up running up against the kernel's cmdline limit. Maybe we just embed grpccurl or similar binary fetch the services data from hegel?

mmlb avatar Jan 27 '21 16:01 mmlb

Let's summarise:

  1. One option is to build the operating system installation environment with the right binary/container. I think we do not have to discuss this scenario because it requires very low coordination. Whoever wants to do it, can do it I understand the benefit (self-contained) but I would like to offer an easy to swap mechanism because I see the benefit of swapping the version from the outside without having to recompile e distribute an OS.
  2. I proposed cmdline because right now it is the best structure and well-known way we have to inject something but I understand what you are saying @mmlb and yes we can get that information from the metadata, tinkie can extend bootkit to fetch values from there, for osie you have to tell me. Metadata are so unstructured and obscure that I try to avoid that as much as I can xD

Let's evaluate option 2. When you describe services do you mean this proposal https://github.com/tinkerbell/proposals/blob/master/proposals/0014/README.md ? Probably not

At this point we have to figure out a structure and where we want to place that configuration (not in cmdline)

gianarb avatar Jan 27 '21 16:01 gianarb

Hmm yeah we can go with option 2 cmdline, maybe actually (append_cmdline ?). We already have the osie section in

network.interfaces[].netboot.osie | OSIE details

where we can add append_cmdline. https://docs.tinkerbell.org/hardware-data/

mmlb avatar Jan 27 '21 17:01 mmlb

I am confused, you say that cmdline were a bad idea right? :D

gianarb avatar Jan 27 '21 17:01 gianarb

The services proposal (workflow services actually) is not what I meant when I mentioned services... naming x)

mmlb avatar Jan 27 '21 17:01 mmlb

cmdline is definitely an option, but feels hacky (though its my favorite of the hacks). Having a proper way to get some hardware data from tink-server (through hegel probably?) is I think the best option, but not as quick :D

mmlb avatar Jan 27 '21 17:01 mmlb

Ok, we (@mmlb) moved our conversation over prv Slack because here it was too hard and we find a common proposal!

We are gonna teach Hegel a new endpoint /worker that will expose the content of hardware.metadata.worker. That field specifies how the worker that OSIE will start looks like.

The format is not yet defied but ideally something like:

{
  "worker": {
    "log": {
      "driver": "..."
    },
    "image": "quay.io/tinkerbell/tink-worker:something",
    ....
  }
}

Unknown: tink-worker today relays on environment variables and I am not sure how and if we can make it work with this proposal.

gianarb avatar Jan 27 '21 17:01 gianarb