Postgres Operator fails to start on Minikube 1.26.0 with qemu2 driver on ARM64

Open • mprimeaux opened this issue 2 years ago • 4 comments

Please answer some short questions that should help us understand your problem or question better.

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.8.2
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? Kubernetes / Minikube 1.26.0 with qemu2 driver / Apple M1 Ultra
  • Are you running Postgres Operator in production? Yes
  • Type of issue? Bug report

Some general remarks when posting a bug report:

I'm using Minikube 1.26.0 with the qemu2 driver on Apple M1 silicon and the operator fails with the following error:

postgres-operator exec /postgres-operator: exec format error

The latest Postgres Operator (1.8.2) works as expected on the same Minikube version on Apple M1 silicon when using the docker driver.

Unfortunately, the pod terminates immediately, so I've been unable to gather any logs. Does the postgres-operator support ARM64?
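An "exec format error" usually means the kernel refused to run a binary built for a different CPU architecture, which would fit an amd64-only image on an arm64 node. Since the pod exits before producing logs, one way to confirm is to copy the binary out of the image and read its ELF header. A minimal sketch, assuming a little-endian Linux binary; elf_arch is a hypothetical helper name and only recognizes amd64 and arm64:

```shell
#!/bin/sh
# elf_arch FILE (hypothetical helper): print the CPU architecture an ELF
# binary was compiled for. Only amd64 and arm64 are recognized.
elf_arch() {
  # Bytes 0-3 of every ELF file are the magic number 0x7f 'E' 'L' 'F'.
  magic=$(od -An -t x1 -N 4 "$1" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not-elf"
    return 1
  fi
  # e_machine is the 2-byte field at offset 18. In a little-endian binary
  # the low byte comes first: 0x3e = x86-64, 0xb7 = AArch64.
  machine=$(od -An -t x1 -j 18 -N 2 "$1" | tr -d ' \n')
  case "$machine" in
    3e00) echo "amd64" ;;
    b700) echo "arm64" ;;
    *)    echo "unknown($machine)" ;;
  esac
}

# Example (not run here): copy the binary out of the image without starting it.
# The /postgres-operator path comes from the error message above.
#   docker create --name op-check registry.opensource.zalan.do/acid/postgres-operator:v1.8.2
#   docker cp op-check:/postgres-operator ./postgres-operator
#   docker rm op-check
#   elf_arch ./postgres-operator
```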

mprimeaux, Jun 23 '22

Seems like official builds are only for amd64: https://github.com/zalando/postgres-operator/blob/1c80ac0acd4fb15432e46d8dadac6f1bf4817d31/Makefile#L57-L59
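If the official build targets only amd64, a local build has to pick the architecture itself. A minimal sketch that maps the output of uname -m to a Go GOARCH value instead of hard-coding amd64; goarch_for is a hypothetical helper, and the go build line in the comment simply mirrors the Makefile's linux build shown later in this thread:

```shell
#!/bin/sh
# goarch_for MACHINE (hypothetical helper): map `uname -m` output to the
# matching Go GOARCH value. Linux reports aarch64, macOS reports arm64.
goarch_for() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)
      echo "goarch_for: unsupported machine type '$1'" >&2
      return 1
      ;;
  esac
}

# Example (not run here): build the operator for the CPU this shell runs on.
#   GOOS=linux GOARCH="$(goarch_for "$(uname -m)")" CGO_ENABLED=0 \
#     go build -o build/linux/postgres-operator cmd/main.go
```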

weisdd, Jul 13 '22

@weisdd Thanks for the link.

FWIW, the operator works as expected on my Apple M1 when Minikube uses Docker Desktop (Apple Silicon) as its driver, but it does not work on the same machine when the only difference is switching Minikube to the qemu2 driver.

I'll dig into it a bit more, but perhaps Rosetta 2 is running it as AMD64 even though it's inside an ARM64 VM.
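Any emulator, Rosetta included, has to be registered with the Linux kernel inside the VM through binfmt_misc; without such a handler, an amd64 binary on an arm64 kernel fails with exactly this exec format error. Docker Desktop sets up x86-64 emulation in its own VM, which may be why the docker driver worked. A sketch for checking the qemu2 VM, assuming emulator entries are named with x86_64 (as QEMU's qemu-x86_64 usually is) or rosetta; has_amd64_handler is a hypothetical name:

```shell
#!/bin/sh
# has_amd64_handler [DIR] (hypothetical helper): succeed if binfmt_misc has an
# entry that looks like an x86-64 emulator. The entry names checked here are
# an assumption; list the directory by hand if this reports nothing.
has_amd64_handler() {
  dir=${1:-/proc/sys/fs/binfmt_misc}
  for entry in "$dir"/*x86_64* "$dir"/rosetta; do
    # An unmatched glob stays literal, so -e filters it out.
    if [ -e "$entry" ]; then
      return 0
    fi
  done
  return 1
}

# Example (not run here): list the raw entries inside the Minikube VM.
#   minikube ssh "ls /proc/sys/fs/binfmt_misc"
```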

mprimeaux, Jul 18 '22

@mprimeaux Building the operator on an aarch64 (linux/arm64) machine (a Google Cloud Tau T2A GCE instance) worked for me, i.e. customizing the Makefile and Dockerfile and overriding the operator's default image (Helm chart values). Additionally, one has to use the custom, arm64-compatible spilo image, which is already available in the Zalando registry. I'll test this tomorrow or next week on an Apple Silicon M1 processor. If the PoC works well enough, I'll file a pull request.

mmoscher, Oct 04 '22

@mmoscher Thanks much! Please let me know if I can test the PR. Happy to help.

mprimeaux, Oct 06 '22

@mmoscher Any updates on the arm64 support? Please let me know how (or if) I can help. I'll make time.

mprimeaux, Oct 23 '22

@mprimeaux spilo linux/arm64 support was merged yesterday (https://github.com/zalando/spilo/pull/790) and will be available with the next spilo tag (PostgreSQL version >= 14 only).

Now we can continue with making the operator itself linux/arm64 compatible. However, its base image, registry.opensource.zalan.do/library/alpine-3.xx, is not yet available for the linux/arm64 architecture in the Zalando registry.
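Whether a given image tag publishes an arm64 variant can be checked before building on top of it: for a multi-arch tag, docker manifest inspect prints a manifest list whose entries each carry a platform object with os and architecture fields. A crude sketch, assuming that JSON layout; manifest_has_arm64 is a hypothetical name, and a single-architecture tag (a plain manifest without platform entries) will report no arm64 even if that one image happens to be arm64:

```shell
#!/bin/sh
# manifest_has_arm64 (hypothetical helper): read `docker manifest inspect`
# JSON on stdin and succeed if any entry declares architecture arm64.
# grep is crude, but avoids depending on jq being installed.
manifest_has_arm64() {
  # Strip all whitespace so the match does not depend on pretty-printing.
  tr -d ' \t\n' | grep -q '"architecture":"arm64"'
}

# Example (not run here); IMAGE is the base image you want to check:
#   docker manifest inspect "$IMAGE" | manifest_has_arm64 && echo "arm64 available"
```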

As mentioned in #2084, two options are feasible. The second option, e.g. hosting on ghcr.io, would be my favorite one to go with. However, I haven't had time to implement it yet. Maybe I'll have some free time at the end of the week or over the weekend.

For now, you can build it yourself with some small changes: https://github.com/mmoscher/postgres-operator/pull/1/files

TL;DR: I'm still on it ;)

mmoscher, Oct 25 '22

I tried building postgres-operator with those changes:

make deps
export TAG=$(git describe --tags --always --dirty)
make docker

But I'm getting some errors.

At make deps:

GO111MODULE=on go mod tidy
github.com/zalando/postgres-operator/pkg/cluster imports
	k8s.io/client-go/rest imports
	k8s.io/client-go/plugin/pkg/client/auth/exec imports
	io/fs: malformed module path "io/fs": missing dot in first path element
make: *** [Makefile:90: tools] Error 1

At make docker:

go: extracting github.com/emicklei/go-restful v2.9.5+incompatible
github.com/zalando/postgres-operator/pkg/cluster imports
	k8s.io/client-go/rest imports
	k8s.io/client-go/plugin/pkg/client/auth/exec imports
	io/fs: malformed module path "io/fs": missing dot in first path element
make: *** [Makefile:90: tools] Error 1
echo '{\n "url": "git:https://github.com/zalando/postgres-operator.git",\n "revision": "c895e8f6",\n "author": "root",\n "status": " M Makefile  M docker/Dockerfile  M go.mod"\n}' > scm-source.json
GOOS=linux GOARCH=arm64 CGO_ENABLED=0 go build -o build/linux/postgres-operator -v -ldflags "-X=main.version=v1.8.2-34-gc895e8f6-dirty" cmd/main.go
build command-line-arguments: cannot load io/fs: malformed module path "io/fs": missing dot in first path element
make: *** [Makefile:58: linux] Error 1
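The log shows k8s.io/client-go importing io/fs, a standard-library package that first shipped in Go 1.16; when the installed toolchain is older, Go does not find it in the standard library and falls back to treating it as a module path, which produces the "malformed module path" message. An outdated local Go is therefore one likely cause, though the thread does not confirm it. A minimal guard, with go_minor and require_go_1_16 as hypothetical helper names:

```shell
#!/bin/sh
# go_minor VERSION_LINE (hypothetical helper): extract the minor version from
# `go version` output, e.g. "go version go1.15.15 linux/arm64" -> 15.
go_minor() {
  echo "$1" | sed -n 's/.*go1\.\([0-9][0-9]*\).*/\1/p'
}

# require_go_1_16 VERSION_LINE (hypothetical helper): fail unless the
# toolchain is Go 1.16 or newer, the first release that includes io/fs.
require_go_1_16() {
  minor=$(go_minor "$1")
  if [ -z "$minor" ] || [ "$minor" -lt 16 ]; then
    echo "Go 1.16 or newer is needed for io/fs; found: $1" >&2
    return 1
  fi
}

# Example (not run here):
#   require_go_1_16 "$(go version)" && make deps
```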

joepa37, Nov 06 '22

I tried it out, and I could build and push the image without any problems. :) (link to image)

@joepa37 see if that works :)

jonizen, Nov 12 '22

@jonizen I can see your image's OS/ARCH is linux/amd64, so maybe these issues are related to Go dependencies on linux/arm64 only.

joepa37, Nov 12 '22

@joepa37 In my experience with arm64 compiles, you usually can't get them to work on Windows. I did a project compiling Percona XtraBackup for arm64, and the only way to get it to work without a lot of tweaks was to use WSL2 with Ubuntu on my Windows machine and build for arm64 with BuildKit. This image was built on my laptop running Ubuntu. So the code works, but I guess maybe you're running BuildKit on a Windows computer?

jonizen, Nov 13 '22

@joepa37 I had a similar problem building on my M1 and similarly ran into the situation where @jonizen's image was still the AMD arch.

My fix was to make two small changes to the code from @mmoscher (thank you!) so that it runs a docker buildx command and builds both architectures.

  1. My new make target code was:

docker: ${DOCKERDIR}/${DOCKERFILE} docker-context
	echo `(env)`
	echo "Tag ${TAG}"
	echo "Version ${VERSION}"
	echo "CDP tag ${CDP_TAG}"
	echo "git describe $(shell git describe --tags --always --dirty)"
	if ! docker buildx ls | grep -q "zalando-builder"; then \
		docker buildx create --name zalando-builder; \
	fi;
	cd "${DOCKERDIR}" && docker buildx build \
		--rm \
		--builder zalando-builder \
		--platform linux/arm64,linux/amd64 \
		--tag $(IMAGE):$(TAG)$(CDP_TAG)$(DEBUG_FRESH)$(DEBUG_POSTFIX) \
		--push \
		--file ${DOCKERFILE} \
		.

  2. I removed the hardcoded values of the two ARGs on lines 5 and 6 of the Dockerfile so they can be passed in.
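Once those ARGs are passed in, BuildKit can supply them per platform: when a Dockerfile stage declares ARG TARGETOS and ARG TARGETARCH, buildx fills them in for each entry of --platform (for example linux and arm64). What lines 5 and 6 originally held isn't shown, so treating them as the target OS and architecture is an assumption. A sketch of the fallback logic a RUN step could use, with go_build_env as a hypothetical name:

```shell
#!/bin/sh
# go_build_env (hypothetical helper): print the GOOS/GOARCH pair for the
# platform BuildKit is currently building. Outside buildx the variables may be
# unset, so default to linux/amd64, presumably the previously hard-coded target.
go_build_env() {
  printf 'GOOS=%s GOARCH=%s\n' "${TARGETOS:-linux}" "${TARGETARCH:-amd64}"
}

# Equivalent lines inside the Dockerfile build stage (not run here):
#   ARG TARGETOS
#   ARG TARGETARCH
#   RUN GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH:-amd64} CGO_ENABLED=0 \
#       go build -o build/linux/postgres-operator cmd/main.go
```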

These changes allowed me to run the following command:

IMAGE=my-repo/zalan-do-acid-postgres-operator make docker

but I still got an error 😒

cd "docker" && docker buildx build \
	--rm \
	--builder zalando-builder \
	--platform linux/arm64,linux/amd64 \
	--tag syntasso/zalan-do-acid-postgres-operator:2880a58-dirty \
	--push \
	--file Dockerfile \
	.
[+] Building 16.3s (23/33)                                                                                                                                    
 => [internal] load build definition from Dockerfile                                                                                                     0.1s
 => => transferring dockerfile: 993B                                                                                                                     0.0s
 => [internal] load .dockerignore                                                                                                                        0.1s
 => => transferring context: 2B                                                                                                                          0.0s
 => [linux/arm64 internal] load metadata for docker.io/library/alpine:3.15                                                                               4.3s
 => [linux/amd64 internal] load metadata for docker.io/library/golang:1.17-alpine3.15                                                                    4.3s
 => [linux/amd64 internal] load metadata for docker.io/library/alpine:3.15                                                                               4.3s
 => [auth] library/alpine:pull token for registry-1.docker.io                                                                                            0.0s
 => [auth] library/golang:pull token for registry-1.docker.io                                                                                            0.0s
 => [linux/amd64 go-builder 1/8] FROM docker.io/library/golang:1.17-alpine3.15@sha256:543b0922baa147b87a568968462a9586e94b588426f51396a2666590cfba327a   0.2s
 => => resolve docker.io/library/golang:1.17-alpine3.15@sha256:543b0922baa147b87a568968462a9586e94b588426f51396a2666590cfba327a                          0.1s
 => [linux/amd64 postgres-operator 1/6] FROM docker.io/library/alpine:3.15@sha256:cf34c62ee8eb3fe8aa24c1fab45d7e9d12768d945c3f5a6fd6a63d901e898479       0.2s
 => => resolve docker.io/library/alpine:3.15@sha256:cf34c62ee8eb3fe8aa24c1fab45d7e9d12768d945c3f5a6fd6a63d901e898479                                     0.2s
 => [internal] load build context                                                                                                                       10.6s
 => => transferring context: 61.29MB                                                                                                                    10.4s
 => [linux/arm64 postgres-operator 1/6] FROM docker.io/library/alpine:3.15@sha256:cf34c62ee8eb3fe8aa24c1fab45d7e9d12768d945c3f5a6fd6a63d901e898479       0.2s
 => => resolve docker.io/library/alpine:3.15@sha256:cf34c62ee8eb3fe8aa24c1fab45d7e9d12768d945c3f5a6fd6a63d901e898479                                     0.2s
 => CACHED [linux/arm64 postgres-operator 2/6] RUN apk --no-cache add curl                                                                               0.0s
 => CACHED [linux/arm64 postgres-operator 3/6] RUN apk --no-cache add ca-certificates                                                                    0.0s
 => CACHED [linux/amd64 postgres-operator 2/6] RUN apk --no-cache add curl                                                                               0.0s
 => CACHED [linux/amd64 postgres-operator 3/6] RUN apk --no-cache add ca-certificates                                                                    0.0s
 => CACHED [linux/amd64 go-builder 2/8] WORKDIR /src                                                                                                     0.0s
 => CACHED [linux/amd64 go-builder 3/8] COPY . .                                                                                                         0.0s
 => CACHED [linux/amd64->arm64 go-builder 4/8] RUN go get -d k8s.io/[email protected]                                                          0.0s
 => CACHED [linux/amd64->arm64 go-builder 5/8] RUN go install github.com/golang/mock/[email protected]                                                      0.0s
 => ERROR [linux/amd64->arm64 go-builder 6/8] RUN go mod tidy                                                                                            1.1s
 => CACHED [linux/amd64 go-builder 4/8] RUN go get -d k8s.io/[email protected]                                                                 0.0s
 => CACHED [linux/amd64 go-builder 5/8] RUN go install github.com/golang/mock/[email protected]                                                             0.0s
 => ERROR [linux/amd64 go-builder 6/8] RUN go mod tidy                                                                                                   1.1s
------                                                                                                                                                        
 > [linux/amd64->arm64 go-builder 6/8] RUN go mod tidy:                                                                                                       
#0 0.850 go: go.mod file not found in current directory or any parent directory; see 'go help modules'
------
------
 > [linux/amd64 go-builder 6/8] RUN go mod tidy:
#0 0.897 go: go.mod file not found in current directory or any parent directory; see 'go help modules'
------
Dockerfile:15
--------------------
  13 |     RUN go get -d k8s.io/[email protected]
  14 |     RUN go install github.com/golang/mock/[email protected]
  15 | >>> RUN go mod tidy
  16 |     RUN go mod vendor
  17 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c go mod tidy" did not complete successfully: exit code: 1
make: *** [Makefile:74: docker] Error 1

This one feels more like something other than the architecture (edit: it fails the same way on the branch with the Docker changes, but not on the master branch, when building on an M1 Mac), but I may be missing how my changes affected it. I'll keep poking, but if anyone has an idea, please let me know! Thanks 🙇

abangser, Nov 17 '22

Yeah, I pushed the wrong image, but I did get the arm64 one built, so I believe it works 😊

Be aware that your output says “cached”: if a cached step is what's failing, the rest of your setup can be entirely correct. Since this image builds fast, run with --no-cache to rule that out.

I'll have a look at this, probably today after work 😊

jonizen, Nov 17 '22

Thanks for the update @jonizen! 🙇

It's interesting that it worked for you. I tracked down an issue: the Dockerfile has a COPY . ., and the Makefile runs the docker command from inside DOCKERDIR. That means the only files available in the image are the files in DOCKERDIR, which is obviously not enough and doesn't include go.mod.

I fixed this by using the following docker make target:

docker: ${DOCKERDIR}/${DOCKERFILE} docker-context
	echo `(env)`
	echo "Tag ${TAG}"
	echo "Version ${VERSION}"
	echo "CDP tag ${CDP_TAG}"
	echo "git describe $(shell git describe --tags --always --dirty)"
	if ! docker buildx ls | grep -q "zalando-builder"; then \
		docker buildx create --name zalando-builder; \
	fi;
	docker buildx build \
		--rm \
		--builder zalando-builder \
		--platform linux/arm64,linux/amd64 \
		--tag $(IMAGE):$(TAG)$(CDP_TAG)$(DEBUG_FRESH)$(DEBUG_POSTFIX) \
		--push \
		--file "${DOCKERDIR}/${DOCKERFILE}" \
		.

Which has resulted in this image (no guarantee of longevity of, or updates to, this image as we are currently only using it for a demo!).

While this worked for me, I'm intrigued how you got yours building, as I might be doing something too heavy-handed. The image took something like 20 minutes to build!
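The root cause described above is that COPY . . can only see files inside the build context, and the original target changed into DOCKERDIR first, so go.mod was never part of the context. A small pre-flight check catches that before a long build; check_build_context is a hypothetical helper, and the example paths assume it runs from the repository root:

```shell
#!/bin/sh
# check_build_context CONTEXT_DIR DOCKERFILE (hypothetical helper): fail if the
# build context lacks go.mod or the Dockerfile path does not exist, since the
# Dockerfile's `COPY . .` only sees files inside the context.
check_build_context() {
  ctx=$1
  dockerfile=$2
  for f in "$ctx/go.mod" "$dockerfile"; do
    if [ ! -f "$f" ]; then
      echo "check_build_context: missing $f" >&2
      return 1
    fi
  done
}

# Example (not run here), from the repository root:
#   check_build_context . docker/Dockerfile \
#     && docker buildx build --platform linux/arm64,linux/amd64 \
#          -f docker/Dockerfile -t "$IMAGE" --push .
```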

abangser, Nov 17 '22

@abangser Glad you found the solution yourself. As you mentioned, you were using the wrong Docker build context to build the image.

The script I use to build multi-arch images is the following (it is located in another directory):

cd "/tmp"
echo "[INFO] Building postgresql operator ..."
git clone git@github.com:mmoscher/postgres-operator.git && pushd "postgres-operator"
git checkout arm64

docker buildx build \
		--push \
		--platform=linux/amd64,linux/arm64 \
		-t <private-repo-and-image-tag> \
		-f docker/Dockerfile \
		.
popd
rm -rf postgres-operator

However, I'm not using the Makefile yet. @abangser, it would be awesome if you could file a PR with your change against my fork (https://github.com/mmoscher/postgres-operator/tree/arm64). Then we can work from there and file a PR to this repo soon.

FYI: running this script on my M1 Mac, with Colima as the Docker backend, takes roughly 5 minutes to build the multi-arch images (with base images cached). That said, 20 minutes could be fine too, depending on your hardware.

mmoscher, Nov 17 '22

@abangser Is there anything I can do to help progress this item? I'm very keen to have a version of the operator that works on ARM64.

mprimeaux, May 20 '23

Help would absolutely be appreciated. As I mentioned in this PR, it seems to be working for me, and it's as far as I can (or will) take the commit at this time, as I'm not sure where else to go with it. Please feel free to merge it or, of course, rewrite it if it isn't quite right.

https://github.com/mmoscher/postgres-operator/pull/2#issuecomment-1544079462

Thanks

abangser, May 20 '23

Thanks! I will test this out today and reply on the PR and here.

mprimeaux, May 20 '23

I think the latest release already includes these changes, but you have to specify the correct image.

Look at the latest release on the releases page. I also think the other parts (pooler and backup) are planned 😊

jonizen, May 20 '23

Quoted from release page:

We are excited to announce a new release of the Postgres Operator. A rather small one but bringing you ARM support for the operator (pooler, ui and logical backup will follow). Thanks to everyone who contributed with PRs, feedback, raising issues or providing ideas.

New features

Provide Postgres-Operator as multi-arch image that can run on arm (#2268, #2127) .....

jonizen, May 20 '23

@jonizen @abangser I can confirm the Postgres Operator successfully starts on ARM64 (Apple M1 and M2 CPUs) in Minikube with the QEMU driver. I ended up using the following stanza in the values file:

image:
  registry: ghcr.io
  repository: zalando/postgres-operator
  tag: v1.10.0
  pullPolicy: "IfNotPresent"
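The three image values above combine into one image reference. Assuming the chart joins them as registry/repository:tag, pulling that reference with an explicit platform is a quick way to confirm an arm64 variant exists before installing. image_ref is a hypothetical helper, and the chart name in the Helm comment is a placeholder:

```shell
#!/bin/sh
# image_ref REGISTRY REPOSITORY TAG (hypothetical helper): join the chart's
# image values into a pullable reference, assuming registry/repository:tag.
image_ref() {
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

# Examples (not run here):
#   docker pull --platform linux/arm64 \
#     "$(image_ref ghcr.io zalando/postgres-operator v1.10.0)"
#   helm upgrade --install postgres-operator <chart> \
#     --set image.registry=ghcr.io \
#     --set image.repository=zalando/postgres-operator \
#     --set image.tag=v1.10.0
```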

Is the intent to stick with GHCR for your container registry moving forward rather than defaulting to registry.opensource.zalan.do?

mprimeaux, May 23 '23

Yes, we want to stick with ghcr for now. Do you see any problems with this @mprimeaux ? Just curious. This issue can be closed then, right?

FxKu, May 25 '23

@FxKu No issues on my end at all. The reason I asked about GHCR was only because the values.yaml still refers to the previous registry, which I assume is for compatibility reasons (e.g. not all container images and versions are in GHCR yet).

I'll close this issue. Thanks for all your help and support.

mprimeaux, May 25 '23

Can we get the UI - postgres-operator-ui image for ARM64?

urashidmalik, Sep 18 '23