
Install Windows csi-proxy drivers via helm chart?

Open jackfrancis opened this issue 3 years ago • 7 comments

Is your feature request related to a problem?/Why is this needed

If you install the canonical azurefile-csi helm chart onto a cluster that has Windows nodes, the Windows node pods fail: they can't connect to csi-proxy on the node (see logs below).

It seems we solved this in AKS Engine by appending a csi-proxy installation script to the VM provisioning PowerShell. That isn't a very user-friendly approach.

Describe the solution you'd like in detail

Rather than having to install the Windows csi-proxy binaries separately, could we deliver that installation via a DaemonSet shipped in the helm chart, so we have a one-step E2E solution?
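One possible shape for this (a rough sketch only, not the chart's actual contents): a chart-templated DaemonSet pinned to Windows nodes that runs csi-proxy as a HostProcess container. The image tag, labels, and binary path below are illustrative assumptions.

```yaml
# Sketch: DaemonSet the helm chart could ship to run csi-proxy on Windows nodes.
# Image tag, labels, and binary path are illustrative assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-proxy
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: csi-proxy
  template:
    metadata:
      labels:
        k8s-app: csi-proxy
    spec:
      nodeSelector:
        kubernetes.io/os: windows
      hostNetwork: true
      securityContext:
        windowsOptions:
          hostProcess: true
          runAsUserName: "NT AUTHORITY\\SYSTEM"
      containers:
        - name: csi-proxy
          image: ghcr.io/kubernetes-sigs/sig-windows/csi-proxy:v1.1.2  # example tag
          command:
            - "%CONTAINER_SANDBOX_MOUNT_POINT%/csi-proxy.exe"  # path depends on the image layout
```

HostProcess containers require containerd on the Windows nodes; the alternative (what the CAPZ addon manifest does) is a hostPath-based DaemonSet that copies and runs the binary on the host.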

Describe alternatives you've considered

Additional context

Errors here:

$ k logs csi-azurefile-node-win-n624x -c azurefile
I0701 08:39:41.616151    5596 main.go:111] set up prometheus server on [::]:29615
I0701 08:39:41.863988    5596 azurefile.go:279] 
DRIVER INFORMATION:
-------------------
Build Date: "2022-06-22T06:13:20Z"
Compiler: gc
Driver Name: file.csi.azure.com
Driver Version: v1.20.0
Git Commit: e38436385ceca271d8d9023d6ff155b4de4c924a
Go Version: go1.18.3
Platform: windows/amd64

Streaming logs below:
I0701 08:39:41.863988    5596 azurefile.go:282] driver userAgent: file.csi.azure.com/v1.20.0 gc/go1.18.3 (amd64-windows) OSS-helm
I0701 08:39:41.898994    5596 azure.go:71] reading cloud config from secret kube-system/azure-cloud-provider
I0701 08:39:41.942234    5596 azure.go:78] InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" not found
I0701 08:39:41.942234    5596 azure.go:83] could not read cloud config from secret kube-system/azure-cloud-provider
I0701 08:39:41.942234    5596 azure.go:93] use default AZURE_CREDENTIAL_FILE env var: C:\k\azure.json
I0701 08:39:41.942234    5596 azure.go:101] read cloud config from file: C:\k\azure.json successfully
I0701 08:39:41.958406    5596 azure_auth.go:245] Using AzurePublicCloud environment
I0701 08:39:41.958406    5596 azure_auth.go:130] azure: using client_id+client_secret to retrieve access token
I0701 08:39:41.958406    5596 azure_diskclient.go:68] Azure DisksClient using API version: 2021-04-01
I0701 08:39:41.958406    5596 azure.go:997] attach/detach disk operation rate limit QPS: 6.000000, Bucket: 10
I0701 08:39:41.958406    5596 azure.go:136] starting node server on node(win-p-win000001)
I0701 08:39:41.958406    5596 azurefile.go:287] cloud: AzurePublicCloud, location: westus3, rg: capz-e2e-2r8as9-vmss, VnetName: capz-e2e-2r8as9-vmss-vnet, VnetResourceGroup: capz-e2e-2r8as9-vmss, SubnetName: node-subnet
I0701 08:39:41.958406    5596 safe_mounter_windows.go:300] failed to connect to csi-proxy v1 with error: open \\.\\pipe\\csi-proxy-filesystem-v1: The system cannot find the file specified., will try with v1Beta
E0701 08:39:41.963974    5596 safe_mounter_windows.go:310] failed to connect to csi-proxy v1beta with error: open \\.\\pipe\\csi-proxy-filesystem-v1beta1: The system cannot find the file specified.
F0701 08:39:41.963974    5596 azurefile.go:294] Failed to get safe mounter. Error: open \\.\\pipe\\csi-proxy-filesystem-v1beta1: The system cannot find the file specified.
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/k8s.io/klog/v2/klog.go:860 +0x8a
k8s.io/klog/v2.(*loggingT).output(0x38f8fa0, 0x3, 0x0, 0xc000119b20, 0x1, {0x2fd8b5c?, 0x1?}, 0xc000060800?, 0x0)
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/k8s.io/klog/v2/klog.go:825 +0x686
k8s.io/klog/v2.(*loggingT).printfDepth(0x38f8fa0, 0x3a129?, 0x0, {0x0, 0x0}, 0xc000034470?, {0x28f4220, 0x25}, {0xc00025d340, 0x1, ...})
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/k8s.io/klog/v2/klog.go:630 +0x1f2
k8s.io/klog/v2.(*loggingT).printf(...)
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/k8s.io/klog/v2/klog.go:612
k8s.io/klog/v2.Fatalf(...)
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/k8s.io/klog/v2/klog.go:1516
sigs.k8s.io/azurefile-csi-driver/pkg/azurefile.(*Driver).Run(0xc0000fe780, {0xc0000343fb, 0x18}, {0xc000046102, 0x0}, 0x1e?)
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/pkg/azurefile/azurefile.go:294 +0x7d4
main.handle()
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/pkg/azurefileplugin/main.go:97 +0x1ee
main.main()
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/pkg/azurefileplugin/main.go:74 +0x1da

goroutine 15 [IO wait]:
internal/poll.runtime_pollWait(0x1954911bfa8, 0x72)
	/usr/local/go/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0x23?, 0xc000137b40?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.execIO(0xc00044ea18, 0xc000137b48)
	/usr/local/go/src/internal/poll/fd_windows.go:175 +0xe5
internal/poll.(*FD).acceptOne(0xc00044ea00, 0x240, {0xc000166000?, 0x8?, 0x19549005ca8?}, 0xd0?)
	/usr/local/go/src/internal/poll/fd_windows.go:942 +0x6d
internal/poll.(*FD).Accept(0xc00044ea00, 0xc000137d20)
	/usr/local/go/src/internal/poll/fd_windows.go:976 +0x1d6
net.(*netFD).accept(0xc00044ea00)
	/usr/local/go/src/net/fd_windows.go:139 +0x65
net.(*TCPListener).accept(0xc00036c240)
	/usr/local/go/src/net/tcpsock_posix.go:139 +0x28
net.(*TCPListener).Accept(0xc00036c240)
	/usr/local/go/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc0001521c0, {0x2b7a560, 0xc00036c240})
	/usr/local/go/src/net/http/server.go:3039 +0x385
net/http.Serve(...)
	/usr/local/go/src/net/http/server.go:2543
main.serveMetrics({0x2b7a560, 0xc00036c240})
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/pkg/azurefileplugin/main.go:123 +0xa5
main.serve.func1()
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/pkg/azurefileplugin/main.go:114 +0xa8
created by main.serve
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/pkg/azurefileplugin/main.go:112 +0x1aa

goroutine 36 [IO wait]:
internal/poll.runtime_pollWait(0x1954911beb8, 0x72)
	/usr/local/go/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc0004f0420?, 0x2c?, 0x0)
	/usr/local/go/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.execIO(0xc00044f698, 0x29b9c48)
	/usr/local/go/src/internal/poll/fd_windows.go:175 +0xe5
internal/poll.(*FD).Read(0xc00044f680, {0xc000341500, 0x937, 0x937})
	/usr/local/go/src/internal/poll/fd_windows.go:441 +0x25f
net.(*netFD).Read(0xc00044f680, {0xc000341500?, 0xc00041b520?, 0xc000341613?})
	/usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc00038c890, {0xc000341500?, 0x7fffe3007fffe2?, 0x1fffdc007fffe4?})
	/usr/local/go/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc0000048a0, {0xc000341500?, 0x0?, 0xfffff0003fffde?})
	/usr/local/go/src/crypto/tls/conn.go:785 +0x3d
bytes.(*Buffer).ReadFrom(0xc000092278, {0x2b61e40, 0xc0000048a0})
	/usr/local/go/src/bytes/buffer.go:204 +0x98
crypto/tls.(*Conn).readFromUntil(0xc000092000, {0x1954906a8d8?, 0xc00038c890}, 0x829?)
	/usr/local/go/src/crypto/tls/conn.go:807 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc000092000, 0x0)
	/usr/local/go/src/crypto/tls/conn.go:614 +0x116
crypto/tls.(*Conn).readRecord(...)
	/usr/local/go/src/crypto/tls/conn.go:582
crypto/tls.(*Conn).Read(0xc000092000, {0xc000247000, 0x1000, 0x17616e0?})
	/usr/local/go/src/crypto/tls/conn.go:1285 +0x16f
bufio.(*Reader).Read(0xc000238480, {0xc0002303c0, 0x9, 0x176f9c2?})
	/usr/local/go/src/bufio/bufio.go:236 +0x1b4
io.ReadAtLeast({0x2b61ca0, 0xc000238480}, {0xc0002303c0, 0x9, 0x9}, 0x9)
	/usr/local/go/src/io/io.go:331 +0x9a
io.ReadFull(...)
	/usr/local/go/src/io/io.go:350
golang.org/x/net/http2.readFrameHeader({0xc0002303c0?, 0x9?, 0xc00138f920?}, {0x2b61ca0?, 0xc000238480?})
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/golang.org/x/net/http2/frame.go:237 +0x6e
golang.org/x/net/http2.(*Framer).ReadFrame(0xc000230380)
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/golang.org/x/net/http2/frame.go:498 +0x95
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc000317f98)
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/golang.org/x/net/http2/transport.go:2118 +0x130
golang.org/x/net/http2.(*ClientConn).readLoop(0xc0000fed80)
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/golang.org/x/net/http2/transport.go:2014 +0x6f
created by golang.org/x/net/http2.(*Transport).newClientConn
	/root/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/golang.org/x/net/http2/transport.go:725 +0xa65

jackfrancis avatar Jul 01 '22 08:07 jackfrancis

cc @jsturtevant @marosset @sonasingh46

jackfrancis avatar Jul 01 '22 08:07 jackfrancis

Running the following command should work; I think we could integrate it with the helm chart:

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/cluster-api-provider-azure/main/templates/addons/windows/csi-proxy/csi-proxy.yaml
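After applying that manifest, it may be worth confirming the proxy pods actually landed on the Windows nodes before restarting the azurefile node pods. The DaemonSet name/namespace and pod labels below are assumptions based on the defaults in those manifests:

```shell
# Apply the CAPZ csi-proxy addon, then verify it scheduled onto the Windows nodes.
# DaemonSet name/namespace and labels assume the manifests' defaults.
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/cluster-api-provider-azure/main/templates/addons/windows/csi-proxy/csi-proxy.yaml
kubectl -n kube-system rollout status daemonset/csi-proxy --timeout=120s
kubectl -n kube-system get pods -o wide | grep csi-proxy

# Restart the Windows azurefile node pods so they reconnect to the csi-proxy named pipes.
kubectl -n kube-system delete pod -l app=csi-azurefile-node-win
```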

andyzhangx avatar Jul 01 '22 11:07 andyzhangx

Tested the above locally and it worked fine. I'll submit a PR to integrate that DaemonSet into the azurefile-csi helm chart, thanks!

jackfrancis avatar Jul 01 '22 11:07 jackfrancis

/assign

jackfrancis avatar Jul 01 '22 11:07 jackfrancis

@jackfrancis I started this work with https://github.com/kubernetes-sigs/azurefile-csi-driver/pull/936/files. I ran into some issues with e2e tests running in CAPZ on Windows nodes, which I believe are unrelated to how the CSI drivers get installed.

marosset avatar Jul 05 '22 17:07 marosset

@marosset awesome!

/assign @marosset

:)

jackfrancis avatar Jul 05 '22 20:07 jackfrancis

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 03 '22 20:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 02 '22 20:11 k8s-triage-robot

@marosset should we re-open the stale PR, and mark this issue as not stale?

jackfrancis avatar Nov 04 '22 18:11 jackfrancis

> @marosset should we re-open the stale PR, and mark this issue as not stale?

I need to take a closer look at https://github.com/kubernetes/enhancements/issues/3636. That work might supersede my old PR here.

marosset avatar Nov 04 '22 18:11 marosset

We won't need csi-proxy in the near future, so I'm closing this issue.

andyzhangx avatar Nov 30 '22 12:11 andyzhangx