
hang with 100% cpu during preview of ConfigFile resources

cbcmg opened this issue on Jan 14, 2022 • 13 comments

Hello!

  • Vote on this issue by adding a 👍 reaction
  • To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already)

Issue details

Trying to pulumi up my prod environment after not touching it for a while, and it is hanging (100% cpu on the python process) during preview. I believe it is related to the large (alb and cert-manager) k8s.yaml.ConfigFile resources I have. Everything has been fine for many months, but since the last time I touched it, pulumi, python, pulumi-kubernetes, and my laptop (new m1) have gone through many updates (which I have just applied). I've tried logging as suggested on the troubleshooting page, but can't see anything interesting. pulumi refresh seems to work ok, and the aws cli and kubectl are connecting. If I comment out the ConfigFile resources, then preview completes normally (and offers to delete my resources).

$ pulumi version
v3.22.0

$ pip freeze
Arpeggio==1.10.2
attrs==21.4.0
certifi==2021.10.8
charset-normalizer==2.0.10
dill==0.3.4
grpcio==1.43.0
idna==3.3
parver==0.3.1
protobuf==3.19.3
pulumi==3.22.0
pulumi-aws==4.34.0
pulumi-eks==0.36.0
pulumi-kubernetes==3.14.0
PyYAML==6.0
requests==2.27.1
semver==2.13.0
six==1.16.0
urllib3==1.26.8

cbcmg avatar Jan 14 '22 13:01 cbcmg

I saw #1731 but there was no solution for me there. I tried uninstalling awscli but got this error message: Error: Could not find aws CLI for EKS.

Also, I'm on a MacBook 16 M1 Max.

$ aws --version
aws-cli/2.4.11 Python/3.9.9 Darwin/21.2.0 source/arm64 prompt/off

cbcmg avatar Jan 14 '22 13:01 cbcmg

I spun up a new debian 11 vm, installed aws v1, kubectl, pulumi, created a new python venv, installed the above list of python packages, copied the source from my machine, logged into pulumi cloud, ran pulumi up, and it works as expected.

cbcmg avatar Jan 15 '22 02:01 cbcmg

Results of pulumi up --stack xxxx/yyyy.devx --logflow --logtostderr -v=9 2> out.txt (log attached: out.txt)

This may be enough to reproduce:

import pulumi_kubernetes as k8s

cert_manager_crds = k8s.yaml.ConfigFile(
    'cert-manager',
    # opts=ResourceOptions(provider=k8s_provider),
    file='manifests/cert-manager-v1.8.0.crds.yaml',  # from: https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.crds.yaml
)

This hang occurs just trying to preview. If I comment out the above code in my project, everything runs fine.

Copy/paste from slack:

Sorry, I'm back again. This issue is still not resolved for me. I have updated pulumi and its libraries to the current releases, but python still hangs. I managed to attach a python debugger to the process, and it seems to get stuck forever in grpc/protobuf code. I tried stepping through, but the stack was 50 frames deep and all low-level serialization code.

This is the yaml I'm trying to apply: https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.crds.yaml If I comment out half of it, it seems fine. If I put some back in, it hangs. It doesn't seem to matter which part I comment out, so much as the amount.

To recap: this used to work on my Intel mac. It works now on a debian arm64 VM running on my m1 mac. It hangs on the mac itself using the python 3.9 arm64 build.

cbcmg avatar Apr 21 '22 10:04 cbcmg

I'm running into the same problem, but with KEDA. I'm also on a Mac, using the python 3.9 arm64 build.

import pulumi_kubernetes as k8s

# Install KEDA
keda = k8s.yaml.ConfigFile(
    "keda",
    file="https://github.com/kedacore/keda/releases/download/v2.2.0/keda-2.2.0.yaml",
    transformations=[remove_status],  # remove_status: sketched below
)
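For context, remove_status is a small user-defined transformation that strips the .status block some upstream manifests ship. A rough sketch of what mine looks like (from memory, not verbatim):

def remove_status(obj, opts):
    # ConfigFile transformations receive each parsed manifest object (plus
    # resource options) and may mutate the object in place before Pulumi
    # registers it as a resource.
    if "status" in obj:
        del obj["status"]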

Debug log attached: out.txt

cmarteepants avatar Apr 21 '22 23:04 cmarteepants

I've narrowed it down to grpc/protobuf...

I have attached a script to reproduce, with only the pulumi python package as a dependency: test.py.txt

The summary is that it takes 15s to serialize a small in-memory structure to a byte buffer. On Linux it takes 1ms.
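For anyone who can't open the attachment, the script is essentially this shape (a sketch from memory, not the attached file verbatim). It times parsing a nested dict into a google.protobuf.Struct and then serializing it twice, which is the path pulumi's python SDK goes through when marshalling resource inputs over grpc:

import time
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Struct

# A nested dict roughly in the size class of a large CRD manifest.
payload = {
    "spec": {
        "properties": {
            f"field{i}": {"type": "string", "description": "x" * 100}
            for i in range(2000)
        }
    }
}

t0 = time.perf_counter()
msg = Struct()
json_format.ParseDict(payload, msg)
print(f"parse={time.perf_counter() - t0:.4f}")

t0 = time.perf_counter()
msg.SerializeToString()
print(f"serialize_1={time.perf_counter() - t0:.4f}")

t0 = time.perf_counter()
msg.SerializeToString()
print(f"serialize_2={time.perf_counter() - t0:.4f}")

The parse/serialize_1/serialize_2 labels match the numbers posted below.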

No idea why. It's really tough to debug deep recursive structures in protobuf code. I suppose this issue should really be reported to them, but I would appreciate some guidance here first.

Thanks.

cbcmg avatar Apr 22 '22 06:04 cbcmg

I see https://github.com/protocolbuffers/protobuf/issues/9839 has already been raised. Our internal testing suggests that this is purely a protobuf issue; there's nothing pulumi-specific about it. We'll watch and assist with that issue, although we have few engineers with access to M1s to develop on.

Frassle avatar Apr 22 '22 15:04 Frassle

Confirming on my m1 for more data points:

parse=0.0079
serialize_1=14.2372
serialize_2=2.4117

Running on a linux VM on the same machine (Linux fedora 5.11.12-300.fc34.aarch64 #1 SMP Wed Apr 7 16:12:21 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux):

parse=0.0001
serialize_1=0.0003
serialize_2=0.0004

stevesloka avatar Apr 22 '22 15:04 stevesloka

So it was pretty simple in the end. The current protobuf python package does not ship the compiled C++ extension for M1 (arm64) macs, so it falls back to the pure python implementation, which either doesn't work at all or is far too slow with large chunks of yaml.

see: https://github.com/protocolbuffers/protobuf/issues/9839
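A quick way to confirm you're on the fallback: in protobuf 3.x the compiled extension lives at google.protobuf.pyext._message (an internal module path, so treat this as a heuristic). If this import raises ImportError, the wheel shipped without the compiled extension and you're on the pure python implementation:

$ python -c "from google.protobuf.pyext import _message"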

Forcing a local build of the extension in the venv of my pulumi project resolves the issue.

A recipe for users with brew:

$ cd my-project
$ source venv/bin/activate
$ export CFLAGS="-I$(brew --prefix protobuf)/include"; export LDFLAGS="-L$(brew --prefix protobuf)/lib"
$ pip install --force-reinstall protobuf=="$(brew list --version protobuf | awk '{print $2}')" --install-option="--cpp_implementation"
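To verify the rebuild actually took, you can ask protobuf which implementation it loaded; this should print cpp rather than python (api_implementation is an internal module, so this is also best-effort across versions):

$ python -c "from google.protobuf.internal import api_implementation; print(api_implementation.Type())"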

cbcmg avatar Apr 23 '22 03:04 cbcmg

> $ pip install --force-reinstall protobuf=="$(brew list --version protobuf | awk '{print $2}')" --install-option="--cpp_implementation"

This didn't work for me. I'm pretty new to Macs in general, but I got an error:

pip install --force-reinstall protobuf=="$(brew list --version protobuf | awk '{print $2}')" --install-option="--cpp_implementation"
WARNING: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
ERROR: Could not find a version that satisfies the requirement protobuf== (from versions: 2.0.0b0, 2.0.3, 2.3.0, 2.4.1, 2.5.0, 2.6.0, 2.6.1, 3.0.0a2, 3.0.0a3, 3.0.0b1.post2, 3.0.0b2, 3.0.0b2.post1, 3.0.0b2.post2, 3.0.0b3, 3.0.0b4, 3.0.0, 3.1.0.post1, 3.2.0rc1, 3.2.0rc1.post1, 3.2.0rc2, 3.2.0, 3.3.0, 3.4.0, 3.5.0.post1, 3.5.1, 3.5.2, 3.5.2.post1, 3.6.0, 3.6.1, 3.7.0rc2, 3.7.0rc3, 3.7.0, 3.7.1, 3.8.0rc1, 3.8.0, 3.9.0rc1, 3.9.0, 3.9.1, 3.9.2, 3.10.0rc1, 3.10.0, 3.11.0rc1, 3.11.0rc2, 3.11.0, 3.11.1, 3.11.2, 3.11.3, 3.12.2, 3.12.4, 3.13.0rc3, 3.13.0, 3.14.0rc1, 3.14.0rc2, 3.14.0rc3, 3.14.0, 3.15.0rc1, 3.15.0rc2, 3.15.0, 3.15.1, 3.15.2, 3.15.3, 3.15.4, 3.15.5, 3.15.6, 3.15.7, 3.15.8, 3.16.0rc1, 3.16.0rc2, 3.16.0, 3.17.0rc1, 3.17.0rc2, 3.17.0, 3.17.1, 3.17.2, 3.17.3, 3.18.0rc1, 3.18.0rc2, 3.18.0, 3.18.1, 3.19.0rc1, 3.19.0rc2, 3.19.0, 3.19.1, 3.19.2, 3.19.3, 3.19.4, 3.20.0rc1, 3.20.0rc2, 3.20.0, 3.20.1rc1, 3.20.1, 4.0.0rc1, 4.0.0rc2)
ERROR: No matching distribution found for protobuf==

I tried stripping it down to:

pip install --force-reinstall protobuf==3.20.1 --install-option="--cpp_implementation"

It uninstalls okay, and then:

In file included from google/protobuf/pyext/descriptor.cc:33:
      ./google/protobuf/pyext/descriptor.h:39:10: fatal error: 'google/protobuf/descriptor.h' file not found
      #include <google/protobuf/descriptor.h>
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      1 error generated.
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  WARNING: No metadata found in ./venv/lib/python3.9/site-packages
  Rolling back uninstall of protobuf
  Moving to /Users/gabrielmccoll/quickstart/venv/lib/python3.9/site-packages/google/protobuf/
   from /Users/gabrielmccoll/quickstart/venv/lib/python3.9/site-packages/google/~rotobuf
  Moving to /Users/gabrielmccoll/quickstart/venv/lib/python3.9/site-packages/protobuf-3.20.1-nspkg.pth
   from /private/var/folders/84/9sk86cr1095dwf1_yn3fxlj00000gn/T/pip-uninstall-ew65e1cj/protobuf-3.20.1-nspkg.pth
  Moving to /Users/gabrielmccoll/quickstart/venv/lib/python3.9/site-packages/protobuf-3.20.1.dist-info/
   from /Users/gabrielmccoll/quickstart/venv/lib/python3.9/site-packages/~rotobuf-3.20.1.dist-info
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> protobuf

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Any hints appreciated, @cbcmg.

DevOpsBoondoggles avatar May 02 '22 12:05 DevOpsBoondoggles

@gabrielmccoll It looks like you don't have protobuf installed with brew. Do you have brew installed?
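The fatal error: 'google/protobuf/descriptor.h' file not found in your output means clang couldn't find the protobuf C++ headers, which the recipe expects to come from the brew package via the CFLAGS export. A quick check (the exact prefix differs between Intel and Apple Silicon machines):

$ ls "$(brew --prefix protobuf)/include/google/protobuf/descriptor.h"

If that ls fails, the headers aren't there and the build will fail exactly as above.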

cbcmg avatar May 02 '22 12:05 cbcmg

> @gabrielmccoll It looks like you don't have protobuf installed with brew. Do you have brew installed?

Thank you for the fast reply. I think protobuf came in when I installed pulumi via pip?

I tried brew install protobuf, but it brought in version 3.19, and then pulumi seemed to think the sdk wasn't installed anymore.

Apologies, I'm probably just being very novice.

DevOpsBoondoggles avatar May 02 '22 12:05 DevOpsBoondoggles

Yes, installing pulumi will install protobuf-3.20.1. Brew does not yet have that version, but 3.19.4 seems to work fine with pulumi.

> Pulumi seemed to think the sdk wasn't installed anymore

Not sure what happened there. If you have a clean pulumi project, brew installed, protobuf installed with brew, and the recipe above completes without errors, it should work.

cbcmg avatar May 02 '22 13:05 cbcmg

> Yes, installing pulumi will install protobuf-3.20.1. Brew does not yet have that version, but 3.19.4 seems to work fine with pulumi.
>
> > Pulumi seemed to think the sdk wasn't installed anymore
>
> Not sure what happened there. If you have a clean pulumi project, brew installed, protobuf installed with brew, and the recipe above completes without errors, it should work.

Okay, got it, thank you. I ran pip uninstall protobuf, then brew install protobuf, then ran your recipe above, and it works now.

Thanks a lot!

DevOpsBoondoggles avatar May 02 '22 13:05 DevOpsBoondoggles