autoscaling 0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 Insufficient neonvm/kvm.

Open william-lbn opened this issue 4 months ago • 4 comments

I followed the setup steps at https://github.com/neondatabase/autoscaling?tab=readme-ov-file#building-and-running

The postgres16-disk-test pod stays Pending indefinitely:

root@iZbp19lce9chqq1glegm26Z:~/serverless/neon/autoscaling# kubectl get pod 
NAME                         READY   STATUS    RESTARTS   AGE
postgres16-disk-test-b88qv   0/2     Pending   0          6m1s

root@iZbp19lce9chqq1glegm26Z:~/serverless/neon/autoscaling# kubectl describe pod postgres16-disk-test-b88qv
Name:             postgres16-disk-test-b88qv
...
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From                 Message
  ----     ------            ----   ----                 -------
  Warning  FailedScheduling  6m10s  autoscale-scheduler  0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 Insufficient neonvm/kvm.
  Warning  FailedScheduling  45s    autoscale-scheduler  0/3 nodes are available: 1 node(s) had untolerated taint {node

The autoscale-scheduler logs are as follows:

{"level":"error","ts":1713319685.30246,"logger":"autoscale-scheduler.plugin","caller":"plugin/plugin.go:343","msg":"Pod rejected by all Filter method calls","method":"Filter","virtualmachine":{"namespace":"default","name":"postgres16-disk-test"},"pod":{"namespace":"default","name":"postgres16-disk-test-b88qv"},"stacktrace":"github.com/neondatabase/autoscaling/pkg/plugin.(*AutoscaleEnforcer).PostFilter\n\t/workspace/pkg/plugin/plugin.go:343\nk8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).runPostFilterPlugin\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/scheduler/framework/runtime/framework.go:776\nk8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).RunPostFilterPlugins\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/scheduler/framework/runtime/framework.go:759\nk8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/scheduler/schedule_one.go:110\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:190\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:190\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:101"}

root@iZbp19lce9chqq1glegm26Z:~/serverless/neon/autoscaling# more vm-deploy.yaml 
---
apiVersion: vm.neon.tech/v1
kind: VirtualMachine
metadata:
  name: postgres16-disk-test
  annotations:
    # In this example, these bounds aren't necessary. So... here's what they look like :)
    autoscaling.neon.tech/bounds: '{ "min": { "cpu": 0.25, "mem": "1Gi" }, "max": { "cpu": 1.25, "mem": "1Gi" } }'
  labels:
    autoscaling.neon.tech/enabled: "true"
    # Set to "true" to continuously migrate the VM (TESTING ONLY)
    autoscaling.neon.tech/testing-only-always-migrate: "false"
spec:
  schedulerName: autoscale-scheduler
  enableSSH: true
  guest:
    cpus: { min: 0.25, use: 0.25, max: 0.25 }
    memorySlotSize: 1Gi
    memorySlots: { min: 1, use: 1, max: 1 }
    rootDisk:
      image: pg16-disk-test:dev
      size: 1Gi
    ports:
      - port: 5432 # postgres
      - port: 9100 # metrics
      - port: 10301 # monitor

root@iZbp19lce9chqq1glegm26Z:~/serverless/neon/autoscaling# kubectl get neonvm
NAME                   CPUS   MEMORY   POD                          EXTRAIP   STATUS    RESTARTS   AGE
postgres16-disk-test                   postgres16-disk-test-b88qv             Pending              5h10m


root@iZbp19lce9chqq1glegm26Z:~/serverless/neon/autoscaling# kubectl get pod postgres16-disk-test-b88qv -ojson
...
    "status": {
        "conditions": [
            {
                "lastProbeTime": null,
                "lastTransitionTime": "2024-04-17T02:08:05Z",
                "message": "0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 Insufficient neonvm/kvm.",
                "reason": "Unschedulable",
                "status": "False",
                "type": "PodScheduled"
            }
        ],
        "phase": "Pending",
        "qosClass": "Burstable"
    }
}
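
For anyone reading the scheduler message above: the counts in it add up to the total number of nodes, so here 1 node is excluded by the control-plane taint and the other 2 are excluded by "Insufficient neonvm/kvm". A minimal sketch that splits such a message into one line per reason is below. The function name `explain_unschedulable` is made up for illustration, and the sketch assumes no individual reason contains a comma followed by a space.

```shell
# Split a FailedScheduling message into "count<TAB>reason" lines.
# Hypothetical helper; assumes reasons are separated by ", " and the
# message ends with a period, as in the message quoted in this issue.
explain_unschedulable() {
  printf '%s\n' "$1" |
    sed -e 's/^[^:]*: //' -e 's/\.$//' |
    awk -F', ' '{
      for (i = 1; i <= NF; i++) {
        count = $i;  sub(/ .*/, "", count)       # leading number of nodes
        reason = $i; sub(/^[0-9]+ /, "", reason) # text after the number
        printf "%s\t%s\n", count, reason
      }
    }'
}
```

Calling it with the message from the pod status prints one line for the taint and one line for the missing `neonvm/kvm` resource, which makes it clearer that only the two worker nodes matter for this pod.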

root@iZbp19lce9chqq1glegm26Z:~/serverless/neon/autoscaling# kubectl get node
NAME                        STATUS   ROLES           AGE   VERSION
neonvm-root-control-plane   Ready    control-plane   23h   v1.25.11
neonvm-root-worker          Ready    <none>          23h   v1.25.11
neonvm-root-worker2         Ready    <none>          23h   v1.25.11

root@iZbp19lce9chqq1glegm26Z:~/serverless/neon/autoscaling# kubectl describe node neonvm-root-worker
Name:               neonvm-root-worker
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                5560m (34%)   18360m (114%)
  memory             3118Mi (10%)  6030Mi (19%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
  neonvm/kvm         0             0
  neonvm/vhost-net   0             0
Events:              <none>
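
The "Allocated resources" table above shows zero requests for `neonvm/kvm`, so other pods do not seem to be using it up. In general the scheduler reports "Insufficient <resource>" when a node's allocatable amount of that resource is less than what the pod requests, so it may be worth checking whether the workers advertise `neonvm/kvm` at all. A rough sketch for that check is below. It assumes `neonvm/kvm` is an extended resource published by a device plugin that needs `/dev/kvm` on the node, which is a guess and not confirmed anywhere in this report. The function names are made up for illustration.

```shell
# Read "NAME KVM" rows on stdin (header skipped) and flag nodes that
# advertise no neonvm/kvm. kubectl prints "<none>" for a missing field.
flag_missing_kvm() {
  awk 'NR > 1 {
    status = ($2 == "<none>" || $2 == "0") ? "MISSING" : "ok"
    printf "%s %s %s\n", $1, $2, status
  }'
}

# Hypothetical wrapper: query allocatable neonvm/kvm for every node.
# Requires kubectl access to the cluster; not invoked automatically.
check_cluster_kvm() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,KVM:.status.allocatable.neonvm/kvm' |
    flag_missing_kvm
}
```

If both workers come back as MISSING, a reasonable next step would be to check on the host whether `/dev/kvm` exists (for example with `ls -l /dev/kvm`), since a machine without hardware or nested virtualization would have nothing to advertise.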

How can I get postgres16-disk-test to schedule successfully?
I'm a beginner, so thank you very much for your help.

In addition, which Neon service does postgres16-disk-test correspond to? Is there a way to simulate load so I can test autoscaling?

Running pgbench

root@iZbp19lce9chqq1glegm26Z:~/serverless/neon/autoscaling# scripts/run-bench.sh
If you don't see a command prompt, try pressing enter.
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/community/x86_64/APKINDEX.tar.gz
(1/8) Installing postgresql-common (1.2-r1)
Executing postgresql-common-1.2-r1.pre-install
(2/8) Installing lz4-libs (1.9.4-r5)
(3/8) Installing libpq (16.2-r0)
(4/8) Installing ncurses-terminfo-base (6.4_p20231125-r0)
(5/8) Installing libncursesw (6.4_p20231125-r0)
(6/8) Installing readline (8.2.1-r2)
(7/8) Installing zstd-libs (1.5.5-r8)
(8/8) Installing postgresql16-client (16.2-r0)
Executing busybox-1.36.1-r15.trigger
Executing postgresql-common-1.2-r1.trigger
* Setting postgresql16 as the default version
OK: 12 MiB in 23 packages
Running pgbench. Query:
   select length(factorial(length(factorial(1223)::text)/2)::text);
pgbench: error: too many command-line arguments (first is "postgres")
pgbench: hint: Try "pgbench --help" for more information.
pod "pgbench-postgres16-disk-test" deleted
pod default/pgbench-postgres16-disk-test terminated (Error)


root@iZbp19lce9chqq1glegm26Z:~# kubectl get pod -w
NAME                         READY   STATUS    RESTARTS   AGE
postgres16-disk-test-b88qv   0/2     Pending   0          14m


pgbench-postgres16-disk-test   0/1     Pending   0          0s
pgbench-postgres16-disk-test   0/1     Pending   0          0s
pgbench-postgres16-disk-test   0/1     ContainerCreating   0          0s
pgbench-postgres16-disk-test   0/1     ContainerCreating   0          1s
pgbench-postgres16-disk-test   1/1     Running             0          16s
pgbench-postgres16-disk-test   0/1     Error               0          18s
pgbench-postgres16-disk-test   0/1     Error               0          20s
pgbench-postgres16-disk-test   0/1     Terminating         0          20s
pgbench-postgres16-disk-test   0/1     Terminating         0          20s

While pgbench was running, the neonvm resource was still Pending with no CPU or memory assigned, and pgbench exited with an error.

root@iZbp19lce9chqq1glegm26Z:~# kubectl get neonvm
NAME                   CPUS   MEMORY   POD                          EXTRAIP   STATUS    RESTARTS   AGE
postgres16-disk-test                   postgres16-disk-test-b88qv             Pending              17m
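
Since the VM is still Pending here, the pgbench failure may just be a downstream symptom of the VM never starting. That is a guess, because the contents of `scripts/run-bench.sh` are not shown in this report. A small sketch of a guard that only runs the benchmark once the VM reports Running is below. It assumes the VirtualMachine object exposes its phase at `.status.phase` (suggested by the STATUS column, but not verified), and the function names are made up for illustration.

```shell
# Succeed only when the reported VM phase is exactly "Running".
vm_is_running() {
  [ "$1" = "Running" ]
}

# Hypothetical wrapper: look up the VM phase, then run the benchmark
# only if the VM is up. Assumes the phase lives at .status.phase.
bench_if_running() {
  vm_name=$1
  phase=$(kubectl get neonvm "$vm_name" -o jsonpath='{.status.phase}')
  if vm_is_running "$phase"; then
    scripts/run-bench.sh
  else
    echo "VM $vm_name is '$phase', not Running; skipping pgbench" >&2
    return 1
  fi
}
```

With the state shown above, `bench_if_running postgres16-disk-test` would skip the benchmark and report the Pending phase instead of letting pgbench fail with a confusing argument error.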

Apr 17 '24 02:04 william-lbn
