Issues building Knative on PowerPC (ppc64le) architecture
I am trying to build knative on ppc64le and wanted to understand the general requirements for building it (along with all dependent images) from scratch.
I have been referring to DEVELOPMENT.md and could build the base image (build-base), ko.local/github.com/knative/build/build-base, for ppc64le. However, it looks like some of the other required images, like git-init and creds-init, are either not being built locally (i.e. they are pulled from gcr.io) or are being built with incorrect base images, as they still show amd64 as the architecture in docker image inspect.
I could not find any Dockerfiles in the repo that could be used directly to build the other images (apart from build-base), so I am trying to use ./hack/release.sh --nopublish --skip-tests --notag-release to get the images.
Is there another way to force all the images under build and serving to be built locally? Is there some setting/configuration that I might be missing?
Any guidance on this would be of great help, thanks in advance!
This is probably my fault because I haven't finished adding proper manifest list support to ko.
The container images are built using ko, which defaults to using gcr.io/distroless/base:latest here.
You can try overriding the base image to use something that works with ppc64le, but I'm not sure where the problem lies.
There may be another issue here where we build the go binaries. It looks like we don't yet hardcode GOARCH=amd64, so it may be possible to export GOARCH=ppc64le to build for the right architecture.
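A minimal sketch of that, assuming ko respects the standard Go environment variables for cross-compilation (the output file name below is just an example):

# Sketch only: build for ppc64le by exporting the standard Go cross-compile vars
export GOOS=linux
export GOARCH=ppc64le

# Then rebuild the images into the local Docker daemon with ko
# (output file name is illustrative)
ko resolve -L -f config/ > release-local.yaml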
I'd like to eventually fix this by adding proper manifest list support to ko: https://github.com/google/go-containerregistry/issues/333
We might be able to fix this use case with a smaller change in the meantime. Let me know if any of that helps or if I can help with unblocking anything.
Thanks @jonjohnsonjr for all your suggestions; this really helps with my understanding. I will try making the changes as suggested and see how it goes, and will keep you posted.
Update - I was able to build all the docker images for ppc64le locally, after overriding the base image in .ko.yaml in build and serving to use the equivalent ppc64le image (again built locally). I additionally had to update the baseImageOverrides section to use the correct ppc64le image for creds-init and git-init (after pushing it to a local registry).
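For anyone else attempting this, a rough sketch of the .ko.yaml shape involved (the image references below are placeholders for the locally built ppc64le base image, not the exact values used):

# Sketch of .ko.yaml; image references are illustrative placeholders
cat > .ko.yaml <<'EOF'
# Default base image for ko-built binaries
defaultBaseImage: localhost:5000/ko.local/github.com/knative/build/build-base

# Per-binary overrides for images that need the git-enabled base
baseImageOverrides:
  github.com/knative/build/cmd/creds-init: localhost:5000/ko.local/github.com/knative/build/build-base
  github.com/knative/build/cmd/git-init: localhost:5000/ko.local/github.com/knative/build/build-base
EOF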
I have the following ppc64le images now:
ko.local/github.com/knative/build/cmd/creds-init
ko.local/github.com/knative/serving/cmd/queue
ko.local/github.com/knative/build/cmd/nop
ko.local/github.com/knative/build/cmd/controller
ko.local/github.com/knative/serving/cmd/activator
ko.local/github.com/knative/serving/cmd/controller
ko.local/github.com/knative/build/build-base
localhost:5000/ko.local/github.com/knative/build/build-base
ko.local/github.com/knative/serving/cmd/autoscaler
ko.local/github.com/knative/serving/cmd/webhook
ko.local/github.com/knative/build/cmd/git-init
ko.local/github.com/knative/build/cmd/webhook
Next, I am planning to follow the instructions here https://github.com/knative/docs/blob/master/install/Knative-with-any-k8s.md after updating the release.yaml file locally, in an attempt to install knative. I already have the istio pods deployed and up and running...
I was trying to push the ko.local/github.com/knative/* images to a local registry using docker push, however that is failing with an error like:
open /var/lib/docker/devicemapper/mnt/07eec79cee35a5ee0b0b9509d626e699db8afb26ab7e6ac934fa810825c16193/rootfs/var/run/ko/HEAD: no such file or directory
Checking on this...
Next, I updated the release.yaml file to use the local images and could apply it; however, not all the pods are coming up correctly. Debugging the issues (could be some differences in the docker images)...
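The release.yaml rewrite was roughly of this shape (a sketch only; the registry host and repository prefix are examples, not the exact values used):

# Point image references in release.yaml at the locally pushed ppc64le images
# (registry host and repository prefix below are illustrative)
sed -i 's|gcr.io/knative-releases|localhost:5000/ko.local|g' release.yaml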
For some context, that file is a symlink that points to the current git commit so that it can be used in logs (I think?).
For example: https://github.com/knative/build/blob/master/cmd/controller/kodata/HEAD
Was added here: https://github.com/knative/pkg/pull/158
If you're building these images outside of a git repo, I could imagine that failing, but it might also be a platform difference? Not sure if that helps... maybe a clue towards fixing it :)
I am building the images from the git repo, so that does not seem to be the problem. I tried using tag v0.2.0, which does not have the changes for HEAD, however I got a LICENSE-related error with that.
Finally, I could get this (building and pushing the images to the local registry) to work on another Power system, so it looks like it is not a platform difference either, but rather something else in the environment; not sure what it could be at this stage.
The Docker version is also not an issue, as it works with the same version on Intel as well.
Setting up a remote registry to host the images on the system where it works, to move ahead with the deployment...
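For reference, the registry setup is roughly of this shape (a sketch; the host/port and image path are examples):

# Run a plain registry:2 container to serve the locally built images
docker run -d -p 5000:5000 --restart=always --name registry registry:2

# Retag one of the ko-built images for that registry and push it
docker tag ko.local/github.com/knative/build/cmd/git-init \
  localhost:5000/ko.local/github.com/knative/build/cmd/git-init
docker push localhost:5000/ko.local/github.com/knative/build/cmd/git-init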
@jonjohnsonjr some progress on this today: after setting up the remote registry, pushing the knative build/serving docker images there, updating release.yaml accordingly, and applying it, this is what we have:
kubectl get pods --namespace knative-serving
NAME                          READY   STATUS    RESTARTS   AGE
controller-66f94dbf98-rjn2l   1/1     Running   0          39m
webhook-568ff6fb94-lmmzl      1/1     Running   0          39m

kubectl get pods --namespace knative-build
NAME                                READY   STATUS    RESTARTS   AGE
build-controller-6ddc9d64cb-6znfc   1/1     Running   0          40m
build-webhook-859b8599b5-dc275      1/1     Running   0          40m

kubectl get pods --namespace knative-monitoring
NAME                                  READY   STATUS             RESTARTS   AGE
grafana-7549795fd4-4x7lj              1/1     Running            0          2h
kibana-logging-68d7697687-ssbjx       1/1     Running            0          38m
kube-state-metrics-7c7b459dfb-5vpp7   3/4     CrashLoopBackOff   12         38m
prometheus-system-0                   1/1     Running            0          38m
prometheus-system-1                   1/1     Running            0          38m
Checking on kube-state-metrics; could be due to differences in the addon-resizer images on Intel vs Power:
Error syncing pod d7882c48-14c3-11e9-83d7-525400891221 ("kube-state-metrics-7f7dd967fc-gc7gs_knative-monitoring(d7882c48-14c3-11e9-83d7-525400891221)"), skipping: failed to "StartContainer" for "addon-resizer" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=addon-resizer pod=kube-state-metrics-7f7dd967fc-gc7gs_knative-monitoring(d7882c48-14c3-11e9-83d7-525400891221)"
Just to clarify -- is the target cluster you're trying to deploy to also PowerPC?
Looks like we're failing to pull this: https://github.com/knative/serving/blob/114ee46c575df605fd38a94f2fe1c32107f30b2d/third_party/config/monitoring/metrics/prometheus/kubernetes/kube-state-metrics.yaml#L154-L155
Indeed it is amd64:
$ crane config k8s.gcr.io/addon-resizer:1.7 | jq .architecture
"amd64"
Under gcr.io/google-containers/addon-resizer-ppc64le, there's only one image, gcr.io/google-containers/addon-resizer-ppc64le:2.1, which might work if you replace k8s.gcr.io/addon-resizer:1.7 with k8s.gcr.io/addon-resizer-ppc64le:2.1, but it's hard to say :man_shrugging:.
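If you want to try it, the swap in the monitoring yaml could look something like this (sketch only; whether the 2.1 image actually runs correctly on ppc64le is unverified):

# Replace the amd64-only addon-resizer image with the ppc64le-tagged one
sed -i 's|k8s.gcr.io/addon-resizer:1.7|gcr.io/google-containers/addon-resizer-ppc64le:2.1|' release.yaml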
Interestingly, that claims to be amd64 as well:
$ crane config gcr.io/google-containers/addon-resizer-ppc64le:2.1 | jq .architecture
"amd64"
That might not matter, but I believe it's a bug in whatever is producing these images :man_facepalming:.
If you don't care about monitoring I think you can just skip it.
cc @mdemirhan any context for where the monitoring yaml comes from? We probably want to be using a newer tag (2.1) and figure out how to make that tag point to a manifest list to support more platforms.
The author doesn't seem to work at google anymore, so it's going to be a bit of a challenge to figure out what produces these images :/
$ crane config gcr.io/google-containers/addon-resizer:2.1 | jq .author -r
Quintin Lee "[email protected]"
Thanks for checking and providing your comments and feedback!!
Yes, the target cluster is PowerPC as well, and since most of the images referenced from release.yaml did not have a multi-arch manifest, I had to replace those with equivalent multi-arch images hosted here: https://cloud.docker.com/u/ibmcom/repository/docker/ibmcom.
For addon-resizer, I was using https://hub.docker.com/r/googlecontainer/addon-resizer-ppc64le/ when we got the above error. This is a ppc64le image.
Debugged today and found this in the logs
kubectl logs kube-state-metrics-595f76d67d-tj6g4 addon-resizer -n knative-monitoring
I0111 10:00:25.584493 1 pod_nanny.go:63] Invoked by [/pod_nanny --container=kube-state-metrics --cpu=100m --extra-cpu=1m --memory=100Mi --extra-memory=2Mi --threshold=5 --deployment=kube-state-metrics]
unknown flag: --threshold
In release.yaml, I commented out line 7470 (--threshold=5) and the deployment succeeded.
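For reference, commenting the flag out can also be scripted rather than done by line number (a sketch; the indentation in the actual yaml may differ):

# Comment out the unsupported --threshold flag in the kube-state-metrics args
sed -i 's|- --threshold=5|# - --threshold=5|' release.yaml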
kubectl get pods -n knative-monitoring
NAME                                  READY   STATUS    RESTARTS   AGE
grafana-7549795fd4-p2jnj              1/1     Running   0          3m
kibana-logging-68d7697687-gdmmb       1/1     Running   0          3m
kube-state-metrics-5c9d7d6499-ln9mq   4/4     Running   0          1m
prometheus-system-0                   1/1     Running   0          3m
prometheus-system-1                   1/1     Running   0          3m
So everything seems to be up and running ...
However, I am running into issues testing the sample app and am currently debugging those.
Continuing to debug the issues in deploying the same hello world app. It seems to apply correctly; however, the service is not coming up.
kubectl logs controller-66f94dbf98-s9jsx controller -n knative-serving
shows errors like these,
{"level":"warn","ts":"2019-01-14T09:29:58.742Z","logger":"controller.service-controller","caller":"service/service.go:148","msg":"Failed to update service status{error 25 0 services.serving.knative.dev "helloworld-go" is forbidden: User "system:serviceaccount:knative-serving:controller" cannot update services.serving.knative.dev/status in the namespace "default"}","knative.dev/controller":"service-controller","knative.dev/key":"default/helloworld-go"}
@jonjohnsonjr any pointers / feedback / inputs on this? looks like I am missing something in the configuration?
cc @tcnghia might have a better answer
I believe that's because the knative serving controller needs cluster-admin permissions to create resources in other namespaces?
It looks like you're missing this. I think this command would fix it:
kubectl create clusterrolebinding knative-serving-controller-admin --clusterrole=cluster-admin --serviceaccount=knative-serving:controller --namespace=knative-serving
Thanks @jonjohnsonjr, I tried the above; however, it looks like this was already applied on the system (perhaps while applying release.yaml). Got the below error:
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "knative-serving-controller-admin" already exists
Still debugging...
@seth-priya can you please check if the knative-serving-admin role has similar content to this: https://github.com/knative/serving/blob/master/config/200-clusterrole.yaml
Also, the role binding exists, but does it grant knative-serving-controller-admin the knative-serving-admin role?
If you can share kubectl get clusterrolebindings.rbac.authorization.k8s.io knative-serving-controller-admin -n knative-serving -o yaml and kubectl get ClusterRole knative-serving-admin -o yaml, that will be really useful.
Hi @tcnghia , I am working with @seth-priya on building knative on ppc64le.
As per your response, the knative-serving-admin role has similar content to this https://github.com/knative/serving/blob/master/config/200-clusterrole.yaml
Please find the output of kubectl get clusterrolebindings.rbac.authorization.k8s.io knative-serving-controller-admin -n knative-serving -o yaml as knative-serving-controller-admin.txt, and kubectl get ClusterRole knative-serving-admin -o yaml as knative-serving-admin.txt.
@jonjohnsonjr and @tcnghia We moved on to release-0.3 and deployed only the knative-serving component. We could see all 4 pods running.
Tried deploying the sample app, but we are facing a RevisionFailed issue. Do you have any pointers?
kubectl get all -n default gives output as output.txt
kubectl describe service.serving.knative.dev/helloworld-go gives output as service.txt
kubectl describe revision.serving.knative.dev/helloworld-go-00001 gives output as revision.txt
That could be an issue with how we're resolving tags to digests...
If you add "gcr.io" to the configmap here: https://github.com/knative/serving/blob/06fae8be6da29137fd55b44557572566ef69f975/config/config-controller.yaml#L30
Does that fix things?
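Something along these lines, assuming the relevant key in that configmap is registriesSkippingTagResolving, as in that config file (the value list here is illustrative):

# Add gcr.io to the registries whose tags are not resolved to digests
kubectl -n knative-serving patch configmap config-controller --type merge \
  -p '{"data":{"registriesSkippingTagResolving":"ko.local,dev.local,gcr.io"}}'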
@jonjohnsonjr the issue is that we are using locally built Power images, pushed to a local registry, so I am not sure if that will help? Do we have any workaround for that?
Is your network configured to allow pulling from gcr.io/knative-samples/helloworld-go to work?
@jonjohnsonjr pulling the image from gcr.io or docker.io was failing, so we built the image locally and tagged it with ko.local/junawaneshivani/helloworld-go. This helped resolve the fetch image issue, and the helloworld sample app seems to work fine now.
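A sketch of that local build (the source path is an example from the docs samples; the tag is the one mentioned above):

# Build the helloworld-go sample locally and tag it so the cluster can use it
# (source path is an example; the tag matches the one referenced in the service yaml)
cd serving/samples/helloworld-go
docker build -t ko.local/junawaneshivani/helloworld-go .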
Do we need to add gcr.io and docker.io in the config-controller.yaml file to be able to pull images from them? docker pull works for gcr.io and docker.io but seems to fail only in the sample app.
Also, we have moved on to building knative eventing and sources, and are facing similar ImagePull issues with the eventing code samples. Debugging further.
@jonjohnsonjr I was able to run the knative eventing sample by pulling the image by sha rather than by tag. Now we have pods under all 5 namespaces running, using locally built ppc64le images.
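One way to look up the digest to reference in the yaml (a sketch; the image path is an example, not the exact one used):

# Print the repo digest of a pushed image so the sample yaml can reference it by sha
docker inspect --format='{{index .RepoDigests 0}}' \
  localhost:5000/ko.local/github.com/knative/serving/cmd/queue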
Hi @jonjohnsonjr, now that we are able to complete the deployment and at least basic validation of all the components on Power, I was wondering if you or someone from the community would be able to help with pushing multi-arch docker images for knative and its sample apps to gcr.io that would work on Power?
Please let me know your thoughts and suggestions on how best to take this forward.
Thank you for all your support and help thus far!!
This works on ICP 3.1.1 on Power as per https://github.com/knative/docs/blob/master/install/Knative-with-ICP.md; the only significant changes required were due to the use of locally built images for the knative components.
@jonjohnsonjr any thoughts / feedback on the earlier comment?
This is on my plate but I haven't found time to get to it yet. If anyone is interested in helping, I made an issue here: https://github.com/google/go-containerregistry/issues/333
I want this to happen automatically in ko if you set the base image to a manifest list. That requires adding manifest list support to google/go-containerregistry and then consuming that support in ko.
Sorry for not responding sooner; things have been a bit busy 😅
This works on ICP 3.1.1 on Power as per https://github.com/knative/docs/blob/master/install/Knative-with-ICP.md, only significant changes required were due to use of locally built images for knative components ..
@seth-priya do the images support multi-arch?
@clyang82 - no, not at this point; as indicated by @jonjohnsonjr, this would need work on the ko side. We have based the deployment on ppc64le-specific images for now.
@seth-priya Thanks for your answer. That means @jonjohnsonjr is working on that, right?
I'm slowly working on this as part of some other work. We need to add support for pulling and pushing manifest lists to go-containerregistry for a variety of reasons, and once that lands it'll be ~easy to update ko to support building and publishing multi-platform images.
@jonjohnsonjr I see that you are doing some work to support multi-arch images here. We have some bandwidth available and can contribute in this area. I would like to know how much more work is pending and when we can expect Power/multi-arch support in ko and subsequently in knative. Let me know if I can help with anything.
@junawaneshivani I'd like to refactor that implementation to what I described in the PR. I filed an issue upstream to make that easier: https://github.com/google/go-containerregistry/issues/474 -- I'll need to fix that before landing the change in ko. If somebody could build and test that PR against power, it would validate the approach.
After refactoring and merging https://github.com/google/ko/pull/38, we'll need a multi-platform base image to use for releases. Right now we're using gcr.io/distroless/static as the base, which is amd64/linux specific (at least in the config file[1]). We could turn that into a manifest list with an entry for each platform we care about. We could try to contribute that upstream to distroless, or maintain our own image and push that somewhere.
Once we have an appropriate base image, we would just need to update the release script to point to a separate release config via KO_CONFIG_PATH.
We currently only test on amd64/linux. I'm not sure if/how we'd test other architectures and operating systems.
[1]:
$ crane manifest gcr.io/distroless/static | jq .
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 458,
    "digest": "sha256:a574914f27fd415df3951c7bba405640659ec59bbd1fa56adc08f70dd51c585d"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 654432,
      "digest": "sha256:1558143043601a425aa864511da238799b57fcf7d062d47044f6ddd0e04fe99a"
    }
  ]
}
$ crane config gcr.io/distroless/static | jq .
{
  "architecture": "amd64",
  "author": "Bazel",
  "config": {
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
    ]
  },
  "created": "1970-01-01T00:00:00Z",
  "history": [
    {
      "author": "Bazel",
      "created": "1970-01-01T00:00:00Z",
      "created_by": "bazel build ..."
    }
  ],
  "os": "linux",
  "rootfs": {
    "diff_ids": [
      "sha256:01092e5921c5543a918d54d9df752ee09a84c912a1d914b7eb37e7152f20b951"
    ],
    "type": "layers"
  }
}
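As a sketch of the manifest-list idea above (assuming per-architecture base images already exist and are pushed; the image names are placeholders, and docker manifest may require experimental CLI features to be enabled):

# Stitch per-arch base images into a single manifest list (placeholder names)
docker manifest create example.com/static-base:latest \
  example.com/static-base:amd64 \
  example.com/static-base:ppc64le

# Record the platform for the ppc64le entry, then push the list
docker manifest annotate example.com/static-base:latest \
  example.com/static-base:ppc64le --os linux --arch ppc64le
docker manifest push example.com/static-base:latest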
Hi @jonjohnsonjr, I will work on validating the PR against Power. Thank you for your efforts in providing multi-arch support. :smile: