
Incorrect behavior in k8s.bash?

Open m10k opened this issue 2 years ago • 3 comments

Describe The Bug

Hello everyone,

I am currently trying to set up peridot on a multi-node kubernetes cluster, but I'm stuck where the instructions say to execute hack/setup_base_internal_services. The output of the command is something like the following.

[...]
parse error: Invalid literal at line 1, column 13
Error from server (NotFound): namespaces "registry-secret" not found
error: no objects passed to apply
Error from server (BadRequest): error when creating "hydra/deploy/public/003-deployment.yaml": Deployment in version "v1" cannot be handled as a Deployment: strict decoding error: unknown field "spec.template.spec.containers[0].ports[0].expose", unknown field "spec.template.spec.containers[0].ports[0].external", unknown field "spec.template.spec.containers[0].ports[1].expose", unknown field "spec.template.spec.containers[0].ports[1].external"
[...]

What caught my eye is that the parse error looks a lot like something jq or yq would print if they parse something that's not JSON or YAML, so I dug a bit deeper into the script. It seems that the output is coming from rules_resf/internal/k8s/k8s.bash, which in turn is executed by the first bazel command, bazel run --platforms @io_bazel_rules_go//go/toolchain:linux_"$ARCH" //hydra/deploy/public:public.apply. The problematic pipe is the following.

COPY_TO_NS=$(echo "{$(cat ${i} | grep "namespace" | head -n 1)}" | jq -r '.namespace' | tr -d '\n')

The value of $i is the path of one of the four YAML files in bazel-bin/hydra/deploy/public, and I'm guessing the call is attempting to parse the namespace from the YAML files. Now, grepping for "namespace" in any of those files will likely return a line like

  namespace: "foobar"

which is not valid JSON, so the jq call could not possibly succeed. I simplified the command and changed it to use yq instead, which seems to solve at least one of the problems (there should also be a cleaner solution that does not need grep).

COPY_TO_NS=$(grep -m 1 "namespace:" "$i" | yq -r '.namespace')
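For illustration, here is a rough sketch of a variant that needs neither grep nor jq/yq, using only awk. It assumes the first `namespace:` line in the file is the one the script wants, which is the same assumption the original pipe makes. The function name `first_namespace` and the sample manifest are made up for this example and are not part of k8s.bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first "namespace:" key in a YAML file.
first_namespace() {
  awk '
    /^[[:space:]]*namespace:/ {
      sub(/^[[:space:]]*namespace:[[:space:]]*/, "")  # drop the key and leading spaces
      gsub(/["\047]/, "")                              # drop any double or single quotes
      sub(/[[:space:]]+$/, "")                         # drop trailing whitespace
      print
      exit                                             # only the first match is wanted
    }
  ' "$1"
}

# Sample manifest, made up for this example.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
  namespace: "foobar"
EOF

COPY_TO_NS=$(first_namespace "$manifest")
echo "$COPY_TO_NS"
rm -f "$manifest"
```

In the script this would be called as `COPY_TO_NS=$(first_namespace "$i")`. A real YAML parser would be more robust for files that contain several documents or nested `namespace` keys.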

However, even with that line fixed, the script does not succeed because it cannot query a secret from kubectl. The problematic line is the following.

kubectl -n "registry-secret${STABLE_STAGE}" get secret registry -o json | jq ".metadata.namespace=\"${COPY_TO_NS}\"" | kubectl apply --force -f -

This command attempts to fetch the secret called registry from a namespace whose name starts with registry-secret. There is no such namespace in my cluster, and there is no secret called registry in any of the other namespaces either. I have a secret called mlbuild-secret in the default namespace. Maybe the script is supposed to query this secret instead? My username is mlbuild, and there is also a namespace called mlbuild-dev, so this would make sense. On the other hand, I can't rule out that the namespaces and secrets in my cluster have been set up incorrectly. Could anybody please shed some light on this?
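The NotFound and "no objects passed to apply" messages in the output above would be consistent with this pipeline running against a missing namespace: the first kubectl fails, jq receives no input, and the second kubectl has nothing to apply. Below is a minimal sketch of a guard that would make that failure explicit. It is not the code in k8s.bash. The function name `copy_registry_secret` and its messages are hypothetical, and it assumes kubectl and jq are on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical guard around the secret copy; not the actual code in k8s.bash.
copy_registry_secret() {
  local src_ns="registry-secret${STABLE_STAGE:-}"
  local dst_ns="$1"

  # An empty target usually means the namespace parsing step failed earlier.
  if [ -z "$dst_ns" ]; then
    echo "target namespace is empty; namespace parsing probably failed" >&2
    return 1
  fi

  # Stop early instead of piping an empty document into "kubectl apply".
  if ! kubectl get namespace "$src_ns" >/dev/null 2>&1; then
    echo "source namespace '$src_ns' not found; skipping secret copy" >&2
    return 1
  fi

  # Same pipeline as in the script, with the variables quoted.
  kubectl -n "$src_ns" get secret registry -o json \
    | jq ".metadata.namespace=\"${dst_ns}\"" \
    | kubectl apply --force -f -
}
```

It would be called as `copy_registry_secret "$COPY_TO_NS"`. On a cluster like mine it should stop at the second check and report the missing namespace rather than produce the two follow-on errors.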

Thank you!

Reproduction Steps

  1. Set up a kubernetes cluster
  2. Follow the installation instructions until the step where it says to execute hack/setup_base_internal_services

Expected Behavior

The script completes without errors.

Version and Build Information

HEAD is at 8222ab2f43a330bf200017f9f77205983f46de9c

Additional context

No response

m10k, Nov 25 '22 06:11

Hi @m10k - Thank you for the report. The setup process is a bit of a pain point right now, but we're working on porting in some changes we use on another project which allow for a single-command setup of the development environment. We're hoping to merge that change in the next couple of months.

However, for now, let's see if we can get your setup running. I think it is complaining that you don't have a secret for hydra. You can create one as follows:

kubectl -n "$USER-dev" create secret generic server --from-literal=hydra-secret="$(export LC_CTYPE=C; cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)" --from-literal=byc-secret="$(export LC_CTYPE=C; cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)"
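The two secret values in that command are generated the same way: /dev/urandom is filtered down to alphanumeric characters and the first 32 of them are kept. As a sketch, that part can be pulled out on its own to see what it produces. The function name `random_secret` is hypothetical and only used here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one random 32-character alphanumeric string,
# using the same pipeline as the kubectl command above.
random_secret() {
  # LC_ALL=C makes tr treat the input as raw bytes rather than multibyte text.
  LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 32 | head -n 1
}

secret=$(random_secret)
echo "generated a secret of length ${#secret}"
```

Each call yields a different value, so running the kubectl command twice would not recreate the same secret contents.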

NeilHanlon, Nov 30 '22 13:11

Hey @NeilHanlon, thank you for your response!

I tried running the command that you posted, but unfortunately setup_base_internal_services still fails with the same error.

I noticed that I already have a secret for hydra in my mlbuild-dev namespace, though. To be honest, I don't quite understand what the script does, but I got the feeling that it is moving secrets from one namespace to another. Is it necessary to copy this secret to the default namespace?

I have the following secrets in mlbuild-dev

mlbuild@k8s:~/peridot$ kubectl -n mlbuild-dev get secrets
NAME     TYPE     DATA   AGE
env      Opaque   1      7d17h
hydra    Opaque   2      9d
server   Opaque   2      23h

And these are in the default namespace

mlbuild@k8s:~/peridot$ kubectl get secrets
NAME                               TYPE                                  DATA   AGE
hydra                              Opaque                                2      21h
minio                              Opaque                                3      9d
mlbuild-secret                     kubernetes.io/service-account-token   3      10d
postgres-postgresql                Opaque                                1      9d
sh.helm.release.v1.localstack.v1   helm.sh/release.v1                    1      9d
sh.helm.release.v1.localstack.v2   helm.sh/release.v1                    1      9d
sh.helm.release.v1.minio.v1        helm.sh/release.v1                    1      9d
sh.helm.release.v1.postgres.v1     helm.sh/release.v1                    1      9d
sh.helm.release.v1.temporal.v1     helm.sh/release.v1                    1      9d
temporal-default-store             Opaque                                1      9d
temporal-visibility-store          Opaque                                1      9d

Is there any other information I can provide that might help figure out what's going on?

m10k, Dec 02 '22 00:12

I'll chime in that I'm hitting this as well while following the instructions for working with docker-desktop, using the latest top-of-tree peridot from git. I ran the command that was suggested in https://github.com/rocky-linux/peridot/issues/73#issuecomment-1332189196, but the resulting secret is seemingly not getting picked up by the bazel public or deploy steps.

warthog9, Feb 07 '23 20:02