sun3book
After installing tuned and restarting default-host, the node status returned to normal: sudo apt install tuned tuned-utils tuned-utils-systemtap
After the compute node status recovered, default-host reported new errors and the VMs on it cannot be started. PCI passthrough of the GPU is configured on the compute node.

[error 2024-02-06 09:53:38 fileutils2.GetAllBlkdevsIoSchedulers(fileutils.go:170)] no block device avaiable
[error 2024-02-06 09:53:38 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:746)] exit status 1
[error 2024-02-06 09:53:38 hostinfo.(*SHostInfo).detectOsDist(hostinfo.go:758)] Failed to detect distribution info
[error...
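> @sun3book It looks like binding the vfio driver failed. Has the host been rebooted since deployment? Is any other driver bound to the GPU?

The host has been rebooted, and there is no other driver. It looks somewhat like a conflict between the integrated GPU and the discrete GPU.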
The log shows that the integrated GPU is the one being passed through:

[info 2024-02-06 10:32:07 isolated_device.(*isolatedDeviceManager).probeCustomPCIDevs(isolated_device.go:184)] Add general pci device: 0 => &isolated_device.sGeneralPCIDevice{sBaseDevice:(*isolated_device.sBaseDevice)(0xc001524640)}
[info 2024-02-06 10:32:07 isolated_device.getPassthroughGPUS(gpu.go:75)] filter address [01:00.0]
[info 2024-02-06 10:32:07 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:307)] PCI address 00:02.0 is boot_vga: /sys/devices/pci0000:00/0000:00:02.0/boot_vga
[info...
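The log reports 00:02.0 as the boot VGA device while the filter address is 01:00.0. On Intel platforms 00:02.0 is usually the integrated GPU, so it helps to check which display device the firmware actually booted with. The kernel exposes a boot_vga attribute for VGA-class PCI devices; the minimal sketch below lists it for each one. The function name and its optional root-directory argument are hypothetical; the argument exists only so the function can be exercised against a fake sysfs tree.

```shell
# Sketch (hypothetical helper): print "<pci address> boot_vga=<0|1>" for every
# PCI device that exposes a boot_vga attribute. The optional first argument is
# a directory prefix standing in for "/" (for testing); leave it empty on a real host.
list_boot_vga() {
    sysfs_root="${1:-}"
    for dev in "$sysfs_root"/sys/bus/pci/devices/*; do
        # Only VGA-class display devices have a boot_vga file; skip the rest.
        [ -f "$dev/boot_vga" ] || continue
        addr=$(basename "$dev")
        printf '%s boot_vga=%s\n' "$addr" "$(cat "$dev/boot_vga")"
    done
}
```

On the host itself, running `list_boot_vga` with no argument reads the real /sys.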
> /proc/cmdline

root@zhcx-cloudpods-worker01:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.15.0-92-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro systemd.unified_cgroup_hierarchy=0 hugepagesz=1G default_hugepagesz=1G
> > vfio_iommu_type1.allow_unsafe_interrupts=1 intel_iommu=on quiet iommu=pt nouveau.modeset=0
> >
> > Try adding these parameters in grub and then restart the VM.

vfio_iommu_type1.allow_unsafe_interrupts=1 intel_iommu=on quiet iommu=pt nouveau.modeset=0

`nano /etc/default/grub`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
`update-grub`
`echo -e "vfio\nvfio_iommu_type1\nvfio_pci\nvfio_virqfd" >> /etc/modules`
`echo "blacklist...
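The /proc/cmdline shown earlier contains none of the suggested parameters (for example, neither intel_iommu=on nor iommu=pt). After editing grub and rebooting, one way to confirm that the new parameters actually took effect is to compare /proc/cmdline against the expected list. This is a minimal sketch; the function name is hypothetical, and each parameter is matched as a whole space-separated word so that, say, hugepagesz=1G is not confused with default_hugepagesz=1G.

```shell
# Sketch (hypothetical helper): print each expected kernel parameter that is
# NOT present in the given command line, one per line.
missing_cmdline_params() {
    cmdline="$1"
    shift
    for param in "$@"; do
        # Pad with spaces so only whole words match.
        case " $cmdline " in
            *" $param "*) ;;                # present, nothing to report
            *) printf '%s\n' "$param" ;;    # missing, report it
        esac
    done
}

# Example on the host:
#   missing_cmdline_params "$(cat /proc/cmdline)" intel_iommu=on iommu=pt \
#       vfio_iommu_type1.allow_unsafe_interrupts=1 nouveau.modeset=0
```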
systemctl show --property=Environment kubelet | cat
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cgroup-driver=systemd" KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml

To change the kubelet cgroup driver, edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and add --cgroup-driver=systemd (systemd is the officially recommended driver):

Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cgroup-driver=systemd"

systemctl daemon-reload
systemctl restart kubelet

These steps did not solve the problem either. The error message is:

Feb 07 02:54:29 zhcx-cloudpods-worker01 systemd[1]: kubelet.service:...
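kubelet and the container runtime must use the same cgroup driver, so it is worth confirming which driver kubelet is actually started with. The sketch below extracts the --cgroup-driver value from `systemctl show --property=Environment kubelet` output like the one quoted above; the function name is hypothetical. kubelet can also take cgroupDriver from the file passed with --config (here /var/lib/kubelet/config.yaml), which this sketch does not read.

```shell
# Sketch (hypothetical helper): read `systemctl show --property=Environment kubelet`
# output on stdin and print the value of the first --cgroup-driver= flag found.
kubelet_cgroup_driver() {
    # One whitespace-separated token per line, pull the value out of any token
    # containing the flag (drops a trailing quote), and keep the first hit.
    tr -s ' \t' '\n\n' \
        | sed -n 's/.*--cgroup-driver=\([A-Za-z0-9_-]*\).*/\1/p' \
        | head -n 1
}

# Example:
#   systemctl show --property=Environment kubelet | kubelet_cgroup_driver
```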
After adding cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 swapaccount=1 systemd.unified_cgroup_hierarchy=0 to GRUB_CMDLINE_LINUX in /etc/default/grub, all services start normally, but the node status is still unknown.
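Both the IOMMU parameters in the earlier reply and the cgroup parameters here are added by hand-editing /etc/default/grub, which easily produces duplicates when the edit is repeated. A small helper that appends only the missing parameters is sketched below. The function name is hypothetical; it assumes the variable sits on one line as VAR="..." (the usual layout of /etc/default/grub) and that the parameters contain no '|', '&' or '\' characters. update-grub still has to be run and the machine rebooted afterwards.

```shell
# Sketch (hypothetical helper): append kernel parameters to a variable such as
# GRUB_CMDLINE_LINUX in a grub defaults file, skipping any already present.
# Assumes VAR="..." on a single line; parameters must not contain '|', '&' or '\'
# because those are special in the sed replacement below.
add_grub_params() {
    file="$1"; var="$2"; shift 2
    if ! grep -q "^$var=\"" "$file"; then
        echo "add_grub_params: $var not found in $file" >&2
        return 1
    fi
    # Current value between the double quotes, e.g. quiet splash
    current=$(sed -n "s/^$var=\"\(.*\)\"\$/\1/p" "$file")
    new="$current"
    for param in "$@"; do
        case " $new " in
            *" $param "*) ;;                    # already present
            *) new="${new:+$new }$param" ;;     # append, single-space separated
        esac
    done
    # Write through a temporary file, then copy back so the file keeps its permissions.
    tmp="$file.tmp.$$"
    sed "s|^$var=\".*\"\$|$var=\"$new\"|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example:
#   add_grub_params /etc/default/grub GRUB_CMDLINE_LINUX cgroup_enable=cpuset \
#       cgroup_enable=memory cgroup_memory=1 swapaccount=1 systemd.unified_cgroup_hierarchy=0
#   update-grub
```

Per the GRUB manual, GRUB_CMDLINE_LINUX is applied to every Linux menu entry (including recovery mode), while GRUB_CMDLINE_LINUX_DEFAULT is applied only to the default, non-recovery entries; the helper works with either name.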
> kubectl logs -n onecloud $(kubectl get pods -n onecloud |grep region |grep -v dns |awk '{print $1}') --tail 20 -f |grep error
> [error 2023-08-09 08:49:18 models.SyncCloudaccountResources(cloudsync.go:2476)] Sync project for...
> 1. Open the region log: kubectl logs -n onecloud $(kubectl get pods -n onecloud |grep region |grep -v dns |awk '{print $1}') --tail 20 -f |grep error
> 2. On the VMware cloud account, click full sync.
> 3. Check whether the log from step 1 shows any error messages.

kubectl logs -n onecloud...
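The log command above relies on `grep region | grep -v dns` matching exactly one pod. If several lines match, several names end up inside $(...), and kubectl logs would treat the second name as a container name. A slightly more defensive variant that keeps only the first match is sketched below; the function name is hypothetical and the filter is otherwise the same as the original pipeline.

```shell
# Sketch (hypothetical helper): read `kubectl get pods -n onecloud` output on
# stdin and print the name of the first pod whose line contains "region" but
# not "dns", i.e. the same filter as the grep/awk pipeline above.
region_pod() {
    awk '/region/ && !/dns/ { print $1; exit }'
}

# Example on a node with kubectl access:
#   kubectl logs -n onecloud "$(kubectl get pods -n onecloud | region_pod)" --tail 20 -f | grep error
```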