Can't install pixie to completely air gapped environment

Open blencoff opened this issue 3 years ago • 10 comments

Describe the bug

Pixie can't be installed in a completely air-gapped environment.

To Reproduce

I'm currently trying to install Pixie via the YAML method. I've already pushed all images mentioned in the manifests generated by the extract-manifests step to my local Artifactory and replaced the original image references with local ones, but during installation Pixie still tries to download some images (e.g. busybox:1.28.0-glibc and nats:1.3.0) from the internet.

Expected behavior

Be able to install Pixie on a self-hosted k8s cluster with no access to the internet.

Logs

Please attach the logs by running the following command:

[root@localhost pixie_yamls]# kubectl get pods -n pl
NAME                                      READY   STATUS                       RESTARTS   AGE
etcd-operator-6c6f8cb48d-q5t8q            1/1     Running                      0          43m
kelvin-6c67584687-pwlrg                   0/1     Init:0/1                     0          42m
nats-operator-7bbff5c756-tt2rl            1/1     Running                      0          43m
pl-etcd-zs25zbm5ln                        0/1     Init:ImagePullBackOff        0          41m
pl-nats-1                                 0/1     ImagePullBackOff             0          42m
vizier-certmgr-58d97fd6b5-8wp9n           0/1     CreateContainerConfigError   0          42m
vizier-cloud-connector-74c5c84487-m4bmq   1/1     Running                      1          42m
vizier-metadata-6bc96dd78-g9brg           0/1     Init:0/2                     0          42m
vizier-pem-bv858                          0/1     Init:0/1                     0          42m
vizier-pem-dktqv                          0/1     Init:0/1                     0          42m
vizier-pem-ftd66                          0/1     Init:0/1                     0          42m
vizier-pem-gmrfq                          0/1     Init:0/1                     0          42m
vizier-pem-j7xmx                          0/1     Init:0/1                     0          42m
vizier-pem-jxl7j                          0/1     Init:0/1                     0          42m
vizier-pem-kcfbf                          0/1     Init:0/1                     0          42m
vizier-pem-mgzgj                          0/1     Init:0/1                     0          42m
vizier-pem-v7k7q                          0/1     Init:0/1                     0          42m
vizier-proxy-8568c9bd48-fdccm             0/1     CreateContainerConfigError   0          42m
vizier-query-broker-7b74f9cbdc-265m4      0/1     Init:0/1                     0          42m

[root@localhost pixie_yamls]# kc describe pod pl-etcd-zs25zbm5ln -n pl
Name:         pl-etcd-zs25zbm5ln
Namespace:    pl
...
Events:
  Type     Reason     Age                  From                             Message
  ----     ------     ----                 ----                             -------
  Normal   Scheduled  56m                  default-scheduler                Successfully assigned pl/pl-etcd-zs25zbm5ln to xxx
  Warning  Failed     55m                  kubelet, xxx  Failed to pull image "busybox:1.28.0-glibc": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: read tcp 192.168.0.33:34516->23.23.116.141:443: read: connection reset by peer
  Warning  Failed     55m                  kubelet, xxx  Failed to pull image "busybox:1.28.0-glibc": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: read tcp 192.168.0.33:59176->54.224.119.26:443: read: connection reset by peer
  Warning  Failed     55m                  kubelet, xxx  Failed to pull image "busybox:1.28.0-glibc": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: read tcp 192.168.0.33:42888->107.23.149.57:443: read: connection reset by peer
  Warning  Failed     54m (x4 over 55m)    kubelet, xxx  Error: ErrImagePull
  Normal   Pulling    54m (x4 over 55m)    kubelet, xxx  Pulling image "busybox:1.28.0-glibc"
  Warning  Failed     54m                  kubelet, xxx  Failed to pull image "busybox:1.28.0-glibc": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: read tcp 192.168.0.33:41714->34.238.187.50:443: read: connection reset by peer
  Normal   BackOff    45m (x43 over 55m)   kubelet, xxx  Back-off pulling image "busybox:1.28.0-glibc"
  Warning  Failed     48s (x234 over 55m)  kubelet, xxx  Error: ImagePullBackOff


[root@localhost pixie_yamls]# kc describe pod pl-nats-1 -n pl
Name:         pl-nats-1
Namespace:    pl
...
Events:
  Type     Reason       Age                    From                             Message
  ----     ------       ----                   ----                             -------
  Normal   Scheduled    57m                    default-scheduler                Successfully assigned pl/pl-nats-1 to yyy
  Warning  FailedMount  57m (x6 over 57m)      kubelet, yyy  MountVolume.SetUp failed for volume "server-tls-certs" : secret "service-tls-certs" not found
  Warning  Failed       56m                    kubelet, yyy  Failed to pull image "nats:1.3.0": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: read tcp 192.168.0.18:32860->3.220.36.210:443: read: connection reset by peer
  Warning  Failed       56m                    kubelet, yyy  Failed to pull image "nats:1.3.0": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: read tcp 192.168.0.18:52026->107.23.149.57:443: read: connection reset by peer
  Warning  Failed       2m26s (x227 over 56m)  kubelet, yyy Error: ImagePullBackOff
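
As a rough aid for reading event logs like the ones above, the sketch below pulls the distinct image names out of "Failed to pull image" messages. The function name failed_images is made up for illustration, and the pattern assumes the kubelet message format shown in these logs.

```shell
# Hypothetical helper: read `kubectl describe pod` output on stdin and
# print each image named in a "Failed to pull image" event, once.
failed_images() {
  sed -nE 's/.*Failed to pull image "([^"]+)".*/\1/p' | sort -u
}
```

For example, `kubectl describe pod pl-nats-1 -n pl | failed_images` would list the images that still need to be mirrored for that pod.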

App information (please complete the following information):

  • Pixie version: Pixie CLI 0.5.8+Distribution.a09aa96.20210506210658.1
  • K8s cluster version:
    Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.1", GitCommit:"206bcadf021e76c27513500ca24182692aabd17e", GitTreeState:"clean", BuildDate:"2020-09-09T11:26:42Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:04:18Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

blencoff avatar May 12 '21 16:05 blencoff

So far, with the http_proxy config, we can successfully deploy Pixie in a private network behind a proxy. The problem is that we are not able to register a Kubernetes cluster with our local Pixie Cloud, because some services still download manifests from the internet. We will need detailed instructions from the development team to make this work.

dontmint avatar Apr 15 '22 09:04 dontmint

The latest Pixie Cloud release includes fixes that make it possible to install Pixie in an air gap environment. Please let us know if you run into any issues following the instructions below. Note that we hope to make this process much easier in subsequent releases!

Installing Pixie in an Air Gap Environment

To install Pixie in an air gap environment, follow the self-hosted install guide with the modifications listed below.

1. Deploy Pixie Cloud

Before deploying Pixie cloud + dependencies in Step 7 and Step 8, you'll need to collect and publish the required images to a private image registry. To list the images needed to deploy Pixie Cloud, run these commands:

curl https://storage.googleapis.com/pixie-dev-public/cloud/latest/pixie_cloud.tar.gz | tar xj
cd pixie_cloud
cat cloud_image_list.txt
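
One possible way to publish the listed images is sketched below. It only prints the docker commands rather than running them, so they can be reviewed first. It assumes cloud_image_list.txt holds one image reference per line and that the docker CLI is the tool used for mirroring; the function name mirror_commands and the registry host registry.example.internal are placeholders, not part of Pixie.

```shell
# Hypothetical helper: read image references on stdin (one per line, an
# assumption about cloud_image_list.txt) and print the docker commands
# that would copy each image to the private registry given as $1.
mirror_commands() {
  registry="$1"
  while IFS= read -r image || [ -n "$image" ]; do
    # Skip blank lines
    [ -n "$image" ] || continue
    echo "docker pull $image"
    echo "docker tag $image $registry/$image"
    echo "docker push $registry/$image"
  done
}
```

Running `mirror_commands registry.example.internal < cloud_image_list.txt` on a machine with internet access prints the commands; piping that output to `sh` would execute them.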

Modify the yaml files in the pixie_cloud/yamls folder to pull the images from your private image registry.
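
A minimal sketch of that rewrite, assuming the images were pushed under their original names behind a registry prefix (for example registry.example.internal/<original image>). The function name prefix_images and the registry host are hypothetical. The function prefixes every image: value it sees, so run it only once per file and review the result.

```shell
# Hypothetical helper: read YAML on stdin and prefix every `image:` value
# with the private registry given as $1. Quotes around the value are dropped.
prefix_images() {
  registry="$1"
  sed -E "s#^([[:space:]-]*image:[[:space:]]*)[\"']?([^\"'[:space:]]+)[\"']?#\\1${registry}/\\2#"
}
```

For example: `prefix_images registry.example.internal < yamls/cloud.yaml > yamls/cloud.local.yaml`.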

Modify the pixie_cloud/yamls/cloud.yaml file to remove the plugin-db-updater-job job.
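
The job can be removed by hand in an editor. As an alternative, here is a sketch that drops one document from a multi-document YAML stream. It assumes documents are separated by lines containing only --- and that the job's metadata name appears as a line indented by exactly two spaces ("  name: plugin-db-updater-job"); check both against your copy of cloud.yaml. The function name drop_yaml_doc is made up for illustration.

```shell
# Hypothetical helper: read multi-document YAML on stdin and print every
# document except the one whose two-space-indented `name:` equals $1.
drop_yaml_doc() {
  awk -v name="$1" '
    # Print the buffered document unless it was marked for dropping.
    function emit_doc(   i) {
      if (n > 0 && !drop) { for (i = 1; i <= n; i++) print buf[i] }
      n = 0; drop = 0
    }
    # A separator line ends the previous document and starts the next one.
    /^---[ \t]*$/ { emit_doc(); buf[++n] = $0; next }
    {
      buf[++n] = $0
      if ($0 ~ ("^  name: " name "[ \t]*$")) drop = 1
    }
    END { emit_doc() }
  '
}
```

For example: `drop_yaml_doc plugin-db-updater-job < yamls/cloud.yaml > yamls/cloud.nojob.yaml`, then diff the two files to confirm only the job was removed.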

Instead of using the kustomize build commands in Step 7 and Step 8, use the modified yaml files:

# Step 7
kubectl apply -f yamls/cloud_deps_elastic_operator.yaml
kubectl apply -f yamls/cloud_deps.yaml

# Step 8

kubectl apply -f yamls/cloud.yaml

Note about Step 9: if you apply the px_cloud manifest multiple times (for example, while resolving an ImagePullError), you will face the following create-admin-job and create-hydra-client-job errors:

create-admin-job time="2022-07-05T21:27:40Z" level=fatal msg="Org 'default' with domain 'default.com' already exists. Remove the org from the database or change the org name."
create-hydra-client-job {                                                                                                                                                                       
create-hydra-client-job   "error": "Unable to insert or update resource because a resource with that value exists already"                                                                    
create-hydra-client-job }

If you get these errors, delete the plc namespace and start over from Step 4 of the install guide.

Authentication using Kratos / Hydra

After completing Step 3 of this section, you will see a blank UI with the following errors in the DevTools console: [image]

To resolve these errors, you'll need to set up the script dev environment:

git clone https://github.com/pixie-io/pixie.git
cd pixie/src/pxl_scripts
make dev

Open Chrome’s DevTools console and run the following:

localStorage.setItem('px-custom-oss-bundle-path', 'http://127.0.0.1:8000/bundle-oss.json')
localStorage.setItem('px-custom-core-bundle-path', 'http://127.0.0.1:8000/bundle-oss.json')

Once you have set these variables, do a soft reload of the UI webpage (a hard reload will clear the variables you just set).

Once you're able to see Pixie's UI, use it to create a deploy key. Record this value; you'll use it in a later step.

2. Install the Pixie CLI

Skip this section of the install guide.

3. Deploy Pixie

This section of the install guide uses Pixie's CLI to deploy Pixie's Vizier component. Skip this step and follow the steps below:

  1. Download the vizier artifacts:

curl https://storage.googleapis.com/pixie-dev-public/vizier/latest/vizier_yamls.tar | tar x
cd yamls

  2. Update the deploy-key and PX_CLUSTER_NAME values in the vizier/secrets.yaml file. Remember that you previously created the deploy key in Pixie's UI.

  3. Deploy the vizier/secrets.yaml file:

kubectl apply -f vizier/secrets.yaml

  4. Determine whether you'd like to deploy Pixie with or without etcd. We recommend installing Pixie without etcd as long as your cluster supports Pixie creating and using PVs.

To deploy Pixie without etcd, use the following yamls:

  • vizier/vizier_metadata_persist_prod.yaml
  • vizier_deps/nats_prod.yaml

To deploy Pixie with etcd, use the following yamls:

  • vizier/vizier_etcd_metadata_prod.yaml
  • vizier_deps/etcd_prod.yaml

  5. Collect and publish the required Pixie Vizier images to a private registry. Note: the commands below assume you are deploying Pixie without etcd.

To list the images needed to deploy Pixie Vizier:

cat images/vizier_image_list.txt

  6. Apply the yamls. Note: the commands below assume you are deploying Pixie without etcd.

kubectl apply -f vizier_deps/nats_prod.yaml
kubectl apply -f vizier/vizier_metadata_persist_prod.yaml

  7. Wait for the pods in the pl namespace to become ready and available:

kubectl get pods -n pl
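
To make the readiness check easier to scan on a large cluster, the sketch below filters the default `kubectl get pods` table down to pods whose READY column is not fully satisfied. It assumes the default column layout (NAME, READY, STATUS, ...) shown earlier in this thread; the function name not_ready_pods is hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# the name and status of every pod whose READY count is not n/n.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1, $3 }'
}
```

For example, `kubectl get pods -n pl | not_ready_pods` prints nothing once every pod is ready.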

Ignore this Pixie UI warning

You may see a warning in the Pixie UI similar to the following: [image: Screen Shot 2022-07-12 at 16 49 33]

You can ignore this warning. The Pixie air gap install uses the non-operator version of Pixie. The default install of Pixie recently switched to using an operator, which will allow us to add self-healing features in the future. The warning is meant to encourage users to upgrade to the operator version of Pixie, which is not currently available for air gap users.

htroisi avatar Jul 06 '22 21:07 htroisi

Hi @htroisi

Thank you for updating the document. I followed your instructions and still ran into some errors, described below.

At step 7 & 8 in section 1. Deploy Pixie Cloud

  • Job plugin-db-updater-job (container updater) still needs internet access to download from GitHub.
level=fatal msg="Failed to fetch plugin repo" error="Get \"https://api.github.com/repos/pixie-io/pixie-plugin/tarball\": dial tcp 20.205.243.168:443: connect: network is unreachable"

Fix: add the http_proxy, https_proxy, and no_proxy env variables to the generated px_cloud.yaml and retry from Step 4 of the install guide.

And the Postgres issue at #417

dontmint avatar Jul 12 '22 06:07 dontmint

We should change the PostgreSQL image to 14.4 too

dontmint avatar Jul 12 '22 07:07 dontmint

Hi @dontmint - thank you for testing out the air gap install instructions and reporting back!

The plugin-db-updater-job is used for Pixie's Plugin System, which is less useful for air gap systems. We'll fix the job soon, but in the meantime I've updated the instructions to include a step to remove the plugin-db-updater-job from the cloud.yaml.

htroisi avatar Jul 12 '22 20:07 htroisi

How can I change the image pull policy of px-operator from Always to IfNotPresent?

NAME                         READY   STATUS             RESTARTS   AGE
pixie-operator-index-j4j4w   0/1     ImagePullBackOff   0          14m
pixie-operator-index-q5xm4   0/1     ImagePullBackOff   0          14m

cat 04_catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: pixie-operator-index
  namespace: px-operator
spec:
  sourceType: grpc
  image: gcr.io/pixie-oss/pixie-prod/operator/bundle_index:0.0.1
  displayName: Pixie Vizier Operator
  publisher: px.dev
  updateStrategy:
    registryPoll:
      interval: 10m

tuninger avatar Jul 13 '22 03:07 tuninger

Hi @htroisi

I have successfully installed Pixie in our private cluster and it works great. I have 2 feature requests that come from the particular needs of an air gap environment.

  1. The ability to change the Pixie Cloud domain from dev.withpixie.dev + work.dev.withpixie.dev to custom domains like pixie.my-organization.com + work.pixie.my-organization.com. Currently, I have to add these domains to /etc/hosts to access Pixie Cloud.
  2. The ability to configure the gRPC port and domain for the Pixie Cloud gRPC endpoint. I prefer to access Pixie Cloud through an ingress because I don't have a load balancer on-premises. E.g.:

from

https://work.dev.withpixie.dev:4444/px.api.vizierpb.VizierService/ExecuteScript

to

https://grpc.pixie.my-organization.com/px.api.vizierpb.VizierService/ExecuteScript

dontmint avatar Jul 13 '22 09:07 dontmint

@dontmint - These are great suggestions! Would you mind filing 2 new issues, one for each request? I don't want these ideas to get lost in this long thread.

htroisi avatar Jul 20 '22 18:07 htroisi

I tried installing Pixie again with our custom domain name and figured out how to make it work. A lot of things need to be modified. I will add those steps to your document later.

dontmint avatar Aug 18 '22 10:08 dontmint

Hi @htroisi,

To use a custom domain for Pixie Cloud, I have inserted my modifications into your comment, reproduced below.

The latest Pixie Cloud release includes fixes that make it possible to install Pixie in an air gap environment. Please let us know if you run into any issues following the instructions below. Note that we hope to make this process much easier in subsequent releases!

Installing Pixie in an Air Gap Environment

To install Pixie in an air gap environment, follow the self-hosted install guide with the modifications listed below.

Custom domain modification
* Step 5. Create the Pixie Cloud secrets.
Change the domain in the mkcert command in the script ./scripts/create_cloud_secrets.sh:
mkcert \
  -cert-file "${PROXY_CERT_FILE}" \
  -key-file "${PROXY_KEY_FILE}" \
  pixie.example.com "*.pixie.example.com" localhost 127.0.0.1 ::1

1. Deploy Pixie Cloud

Before deploying Pixie cloud + dependencies in Step 7 and Step 8, you'll need to collect and publish the required images to a private image registry. To list the images needed to deploy Pixie Cloud, run these commands:

curl https://storage.googleapis.com/pixie-dev-public/cloud/latest/pixie_cloud.tar.gz | tar xj
cd pixie_cloud
cat cloud_image_list.txt

Modify the yaml files in the pixie_cloud/yamls folder to pull the images from your private image registry.

Modify the pixie_cloud/yamls/cloud.yaml file to remove the plugin-db-updater-job job.

Instead of using the kustomize build commands in Step 7 and Step 8, use the modified yaml files:

# Step 7
kubectl apply -f yamls/cloud_deps_elastic_operator.yaml
kubectl apply -f yamls/cloud_deps.yaml


# Step 8

kubectl apply -f yamls/cloud.yaml
Custom domain modification

* Replace the default domain in file  yamls/cloud.yaml
dev.withpixie.dev
-->
example.com
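
A sketch of that replacement using sed, under the assumption that every occurrence of dev.withpixie.dev in the file should change, including subdomains such as work.dev.withpixie.dev. The function name replace_domain is made up, and example.com stands in for your own domain.

```shell
# Hypothetical helper: read YAML on stdin and replace every occurrence of
# the default Pixie Cloud domain with the domain given as $1.
replace_domain() {
  new_domain="$1"
  sed "s/dev\.withpixie\.dev/${new_domain}/g"
}
```

For example: `replace_domain example.com < yamls/cloud.yaml > yamls/cloud.custom.yaml`. Diff the result before applying it, since the replacement is purely textual.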

* Edit the proxy-envoy-config ConfigMap in the yamls/cloud.yaml file.
Find the following lines:

cors:
  allow_origin_string_match:
    - suffix: "dev.withpixie.dev"

and change them to:

cors:
  allow_origin_string_match:
    - prefix: "*"

Note about Step 9: if you apply the px_cloud manifest multiple times (for example, while resolving an ImagePullError), you will face the following create-admin-job and create-hydra-client-job errors:

create-admin-job time="2022-07-05T21:27:40Z" level=fatal msg="Org 'default' with domain 'default.com' already exists. Remove the org from the database or change the org name."
create-hydra-client-job {                                                                                                                                                                       
create-hydra-client-job   "error": "Unable to insert or update resource because a resource with that value exists already"                                                                    
create-hydra-client-job }

If you get these errors, delete the plc namespace and start over from Step 4 of the install guide.

Authentication using Kratos / Hydra

After completing Step 3 of this section, you will see a blank UI with the following errors in the DevTools console: [image]

To resolve these errors, you'll need to set up the script dev environment:

git clone https://github.com/pixie-io/pixie.git
cd pixie/src/pxl_scripts
make dev

Open Chrome’s DevTools console and run the following:

localStorage.setItem('px-custom-oss-bundle-path', 'http://127.0.0.1:8000/bundle-oss.json')
localStorage.setItem('px-custom-core-bundle-path', 'http://127.0.0.1:8000/bundle-oss.json')

Once you have set these variables, do a soft reload of the UI webpage (a hard reload will clear the variables you just set).

Once you're able to see Pixie's UI, use it to create a deploy key. Record this value; you'll use it in a later step.

2. Install the Pixie CLI

Skip this section of the install guide.

3. Deploy Pixie

This section of the install guide uses Pixie's CLI to deploy Pixie's Vizier component. Skip this step and follow the steps below:

  1. Download the vizier artifacts:

curl https://storage.googleapis.com/pixie-dev-public/vizier/latest/vizier_yamls.tar | tar x
cd yamls

  2. Update the deploy-key and PX_CLUSTER_NAME values in the vizier/secrets.yaml file. Remember that you previously created the deploy key in Pixie's UI.

  3. Deploy the vizier/secrets.yaml file:

kubectl apply -f vizier/secrets.yaml

  4. Determine whether you'd like to deploy Pixie with or without etcd. We recommend installing Pixie without etcd as long as your cluster supports Pixie creating and using PVs.

To deploy Pixie without etcd, use the following yamls:

  • vizier/vizier_metadata_persist_prod.yaml
  • vizier_deps/nats_prod.yaml

To deploy Pixie with etcd, use the following yamls:

  • vizier/vizier_etcd_metadata_prod.yaml
  • vizier_deps/etcd_prod.yaml

  5. Collect and publish the required Pixie Vizier images to a private registry. Note: the commands below assume you are deploying Pixie without etcd.

To list the images needed to deploy Pixie Vizier:

cat images/vizier_image_list.txt

  6. Apply the yamls. Note: the commands below assume you are deploying Pixie without etcd.

kubectl apply -f vizier_deps/nats_prod.yaml
kubectl apply -f vizier/vizier_metadata_persist_prod.yaml

  7. Wait for the pods in the pl namespace to become ready and available:

kubectl get pods -n pl

Ignore this Pixie UI warning

You may see a warning in the Pixie UI similar to the following: [image: Screen Shot 2022-07-12 at 16 49 33]

You can ignore this warning. The Pixie air gap install uses the non-operator version of Pixie. The default install of Pixie recently switched to using an operator, which will allow us to add self-healing features in the future. The warning is meant to encourage users to upgrade to the operator version of Pixie, which is not currently available for air gap users.

This is how I set up Pixie with my custom domain name. I hope this helps. After these changes, all you need to do is configure your DNS or /etc/hosts file to access Pixie Cloud.

dontmint avatar Aug 22 '22 06:08 dontmint

This is now implemented: https://docs.px.dev/installing-pixie/install-guides

zasgar avatar Sep 22 '22 18:09 zasgar

Below is the log for the vizier-pem pod created by the vizier-cloud-connector deployment.

Why does the SSL handshake fail when it connects to the pl-nats service?

I20221209 07:09:18.493793 2354094 manager.cc:156] Hostname: iZbp11uldizj3s4hzhgd73Z
Error: 29 - SSL Error - (conn.c:737): SSL handshake error: 18:self signed certificate:depth=0:cert=/O=*/CN=*:issuer=/O=*/CN=*
Stack: (library version: 3.3.0)
  01 - _makeTLSConn
  02 - _checkForSecure
  03 - _processExpectedInfo
  04 - _processConnInit
  05 - _connect
  06 - natsConnection_Connect
F20221209 07:09:19.833931 2354094 statusor.h:148] Check failed: _s.ok() Bad Status: Unknown : Failed to connect to NATS, nats_status=29
*** Check failure stack trace: ***
E20221209 07:09:19.834019 2354094 signal_action.cc:63] Caught Aborted, suspect faulting address 0x23ebae. Trace:

dragonTour avatar Dec 09 '22 09:12 dragonTour