
Host assisted clone failing from Block->Filesystem with azure-disk

0xFelix opened this issue 2 years ago · 5 comments

What happened: I tried to clone a DV/PVC with volumeMode: Block to a new DV/PVC with volumeMode: Filesystem.

The cloning failed shortly before reaching 100%.

Logs of the cloner pod:

I0502 12:46:47.115723       8 prometheus.go:72] 98.18
F0502 12:46:47.416836       8 clone-source.go:234] Unexpected status code 500
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
        /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1038 +0x8a
k8s.io/klog/v2.(*loggingT).output(0x13ee300, 0x3, 0x0, 0xc000293880, 0x0, {0xf84bb9, 0x1}, 0xc000368e30, 0x0)
        /remote-source/app/vendor/k8s.io/klog/v2/klog.go:987 +0x5fd
k8s.io/klog/v2.(*loggingT).printf(0x12, 0x4000b, 0x0, {0x0, 0x0}, {0xcfba52, 0x19}, {0xc000368e30, 0x1, 0x1})
        /remote-source/app/vendor/k8s.io/klog/v2/klog.go:753 +0x1c5
k8s.io/klog/v2.Fatalf(...)
        /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1532
main.main()
        /remote-source/app/cmd/cdi-cloner/clone-source.go:234 +0xc05

goroutine 3 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0xc000112060)
        /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1181 +0x6a
created by k8s.io/klog/v2.init.0
        /remote-source/app/vendor/k8s.io/klog/v2/klog.go:420 +0xfb

goroutine 5 [sleep]:
time.Sleep(0x3b9aca00)
        /usr/lib/golang/src/runtime/time.go:193 +0x12e
kubevirt.io/containerized-data-importer/pkg/util/prometheus.(*ProgressReader).timedUpdateProgress(0xc000088cc0)
        /remote-source/app/pkg/util/prometheus/prometheus.go:55 +0x2c
created by kubevirt.io/containerized-data-importer/pkg/util/prometheus.(*ProgressReader).StartTimedUpdate
        /remote-source/app/pkg/util/prometheus/prometheus.go:48 +0x5b

goroutine 6 [select]:
io.(*pipe).Write(0xc000088d20, {0xc00030000a, 0xc0d, 0x12ad2})
        /usr/lib/golang/src/io/pipe.go:94 +0x21a
io.(*PipeWriter).Write(0xc000300012, {0xc00030000a, 0xc0002ee000, 0xc0002ee000})
        /usr/lib/golang/src/io/pipe.go:163 +0x25
github.com/golang/snappy.(*Writer).write(0xc000088d80, {0xc0002ee000, 0xb70272, 0xc0000103f0})
        /remote-source/app/vendor/github.com/golang/snappy/encode.go:253 +0x347
github.com/golang/snappy.(*Writer).Flush(0xc000088d80)
        /remote-source/app/vendor/github.com/golang/snappy/encode.go:276 +0x4e
github.com/golang/snappy.(*Writer).Write(0xc000088d80, {0xc00013c000, 0x8000, 0x8000})
        /remote-source/app/vendor/github.com/golang/snappy/encode.go:197 +0x112
io.copyBuffer({0xe087e0, 0xc000088d80}, {0x7fa528cff008, 0xc0000c4870}, {0x0, 0x0, 0x0})
        /usr/lib/golang/src/io/io.go:425 +0x204
io.Copy(...)
        /usr/lib/golang/src/io/io.go:382
main.pipeToSnappy.func1()
        /remote-source/app/cmd/cdi-cloner/clone-source.go:117 +0x6b
created by main.pipeToSnappy
        /remote-source/app/cmd/cdi-cloner/clone-source.go:116 +0x287

goroutine 10 [IO wait, 6 minutes]:
internal/poll.runtime_pollWait(0x7fa50015eeb8, 0x72)
        /usr/lib/golang/src/runtime/netpoll.go:234 +0x89
internal/poll.(*pollDesc).wait(0xc00012a080, 0x0, 0x0)
        /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
        /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00012a080)
        /usr/lib/golang/src/internal/poll/fd_unix.go:402 +0x22c
net.(*netFD).accept(0xc00012a080)
        /usr/lib/golang/src/net/fd_unix.go:173 +0x35
net.(*TCPListener).accept(0xc00019e390)
        /usr/lib/golang/src/net/tcpsock_posix.go:140 +0x28
net.(*TCPListener).Accept(0xc00019e390)
        /usr/lib/golang/src/net/tcpsock.go:262 +0x3d
crypto/tls.(*listener).Accept(0xc00019e450)
        /usr/lib/golang/src/crypto/tls/tls.go:66 +0x2d
net/http.(*Server).Serve(0xc000504000, {0xe15e68, 0xc00019e450})
        /usr/lib/golang/src/net/http/server.go:3002 +0x394
net/http.(*Server).ServeTLS(0xc000504000, {0xe196a8, 0xc00019e390}, {0xc000046220, 0xe0}, {0xc0000463a0, 0x1f})
        /usr/lib/golang/src/net/http/server.go:3074 +0x3fe
net/http.(*Server).ListenAndServeTLS(0xc000504000, {0xc000046220, 0x1f}, {0xc0000463a0, 0x1f})
        /usr/lib/golang/src/net/http/server.go:3229 +0x12f
net/http.ListenAndServeTLS(...)
        /usr/lib/golang/src/net/http/server.go:3195
kubevirt.io/containerized-data-importer/pkg/util/prometheus.StartPrometheusEndpoint.func1()
        /remote-source/app/pkg/util/prometheus/prometheus.go:113 +0xa5
created by kubevirt.io/containerized-data-importer/pkg/util/prometheus.StartPrometheusEndpoint
        /remote-source/app/pkg/util/prometheus/prometheus.go:111 +0x245

goroutine 71 [IO wait]:
internal/poll.runtime_pollWait(0x7fa50015ece8, 0x72)
        /usr/lib/golang/src/runtime/netpoll.go:234 +0x89
internal/poll.(*pollDesc).wait(0xc0000fa180, 0xc000672000, 0x0)
        /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
        /usr/lib/golang/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0000fa180, {0xc000672000, 0x205, 0x205})
        /usr/lib/golang/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc0000fa180, {0xc000672000, 0x400000000, 0x4})
        /usr/lib/golang/src/net/fd_posix.go:56 +0x29
net.(*conn).Read(0xc0005b6008, {0xc000672000, 0x600000004, 0xc00051f258})
        /usr/lib/golang/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc0004c9230, {0xc000672000, 0x0, 0x40fcad})
        /usr/lib/golang/src/crypto/tls/conn.go:777 +0x3d
bytes.(*Buffer).ReadFrom(0xc0002ea978, {0xe08580, 0xc0004c9230})
        /usr/lib/golang/src/bytes/buffer.go:204 +0x98
crypto/tls.(*Conn).readFromUntil(0xc0002ea700, {0xe095c0, 0xc0005b6008}, 0x0)
        /usr/lib/golang/src/crypto/tls/conn.go:799 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc0002ea700, 0x0)
        /usr/lib/golang/src/crypto/tls/conn.go:606 +0x112
crypto/tls.(*Conn).readRecord(...)
        /usr/lib/golang/src/crypto/tls/conn.go:574
crypto/tls.(*Conn).Read(0xc0002ea700, {0xc000291000, 0x1000, 0x6ca})
        /usr/lib/golang/src/crypto/tls/conn.go:1277 +0x16f
net/http.(*connReader).Read(0xc000193680, {0xc000291000, 0x1000, 0x1000})
        /usr/lib/golang/src/net/http/server.go:780 +0x16d
bufio.(*Reader).fill(0xc0003d4720)
        /usr/lib/golang/src/bufio/bufio.go:101 +0x103
bufio.(*Reader).ReadSlice(0xc0003d4720, 0xb8)
        /usr/lib/golang/src/bufio/bufio.go:360 +0x2f
bufio.(*Reader).ReadLine(0xc0003d4720)
        /usr/lib/golang/src/bufio/bufio.go:389 +0x27
net/textproto.(*Reader).readLineSlice(0xc0006a6450)
        /usr/lib/golang/src/net/textproto/reader.go:57 +0x99
net/textproto.(*Reader).ReadLine(...)
        /usr/lib/golang/src/net/textproto/reader.go:38
net/http.readRequest(0xc000504504)
        /usr/lib/golang/src/net/http/request.go:1029 +0x79
net/http.(*conn).readRequest(0xc0000c6000, {0xe1d270, 0xc0006cb100})
        /usr/lib/golang/src/net/http/server.go:966 +0x225
net/http.(*conn).serve(0xc0000c6000, {0xe1d318, 0xc0002940f0})
        /usr/lib/golang/src/net/http/server.go:1856 +0x865
created by net/http.(*Server).Serve
        /usr/lib/golang/src/net/http/server.go:3034 +0x4e8

goroutine 69 [semacquire]:
internal/poll.runtime_Semacquire(0x7fa50015edd0)
        /usr/lib/golang/src/runtime/sema.go:61 +0x25
internal/poll.(*FD).Close(0xc0000fa800)
        /usr/lib/golang/src/internal/poll/fd_unix.go:116 +0x6d
net.(*netFD).Close(0xc0000fa800)
        /usr/lib/golang/src/net/fd_posix.go:38 +0x38
net.(*conn).Close(0xc000010130)
        /usr/lib/golang/src/net/net.go:207 +0x45
crypto/tls.(*Conn).Close(0x5ee3339d00000001)
        /usr/lib/golang/src/crypto/tls/conn.go:1326 +0xfd
net/http.(*persistConn).closeLocked(0xc00009a5a0, {0xe086e0, 0xc000190bf0})
        /usr/lib/golang/src/net/http/transport.go:2708 +0x130
net/http.(*persistConn).close(0xc000590058, {0xe086e0, 0xc000190bf0})
        /usr/lib/golang/src/net/http/transport.go:2693 +0xb0
net/http.(*persistConn).readLoop.func1()
        /usr/lib/golang/src/net/http/transport.go:2055 +0x2d
net/http.(*persistConn).readLoop(0xc00009a5a0)
        /usr/lib/golang/src/net/http/transport.go:2230 +0xaac
created by net/http.(*Transport).dialConn
        /usr/lib/golang/src/net/http/transport.go:1747 +0x1e05

goroutine 70 [runnable]:
syscall.Syscall(0x3, 0x7, 0x0, 0x0)
        /usr/lib/golang/src/syscall/asm_linux_amd64.s:20 +0x5
syscall.Close(0x7fa50015edd0)
        /usr/lib/golang/src/syscall/zsyscall_linux_amd64.go:286 +0x30
internal/poll.(*FD).destroy(0xc0000fa800)
        /usr/lib/golang/src/internal/poll/fd_unix.go:84 +0x51
internal/poll.(*FD).writeUnlock(0xc0000fa800)
        /usr/lib/golang/src/internal/poll/fd_mutex.go:250 +0x33
internal/poll.(*FD).Write(0xc0000fa800, {0xc0005ec000, 0xc2a, 0x1000})
        /usr/lib/golang/src/internal/poll/fd_unix.go:288 +0x404
net.(*netFD).Write(0xc0000fa800, {0xc0005ec000, 0xe12ee8, 0xc536e0})
        /usr/lib/golang/src/net/fd_posix.go:74 +0x29
net.(*conn).Write(0xc000010130, {0xc0005ec000, 0x0, 0x0})
        /usr/lib/golang/src/net/net.go:195 +0x45
crypto/tls.(*Conn).write(0xc0002ea380, {0xc0005ec000, 0x5, 0x1000})
        /usr/lib/golang/src/crypto/tls/conn.go:912 +0x108
crypto/tls.(*Conn).writeRecordLocked(0xc0002ea380, 0x17, {0xc0000f0000, 0xc14, 0x1000})
        /usr/lib/golang/src/crypto/tls/conn.go:980 +0x351
crypto/tls.(*Conn).Write(0xcc8de0, {0xc0000f0000, 0xc14, 0x1000})
        /usr/lib/golang/src/crypto/tls/conn.go:1151 +0x405
net/http.persistConnWriter.Write({0xc000525aa0}, {0xc0000f0000, 0xc000010240, 0xced44a})
        /usr/lib/golang/src/net/http/transport.go:1763 +0x2f
bufio.(*Writer).Flush(0xc0000a9040)
        /usr/lib/golang/src/bufio/bufio.go:607 +0x62
net/http/internal.(*chunkedWriter).Write(0xc0001919b0, {0xc0003f2000, 0xc0d, 0x8000})
        /usr/lib/golang/src/net/http/internal/chunked.go:217 +0x10d
io.copyBuffer({0x7fa50007c0b0, 0xc0001919b0}, {0x7fa500076658, 0xc000010400}, {0x0, 0x0, 0x0})
        /usr/lib/golang/src/io/io.go:425 +0x204
io.Copy(...)
        /usr/lib/golang/src/io/io.go:382
net/http.(*transferWriter).doBodyCopy(0xc0000c68c0, {0x7fa50007c0b0, 0xc0001919b0}, {0x7fa500076658, 0xc000010400})
        /usr/lib/golang/src/net/http/transfer.go:410 +0x4d
net/http.(*transferWriter).writeBody(0xc0000c68c0, {0xe083e0, 0xc0000a9040})
        /usr/lib/golang/src/net/http/transfer.go:357 +0x225
net/http.(*Request).write(0xc0002e8200, {0xe083e0, 0xc0000a9040}, 0x0, 0xc0003e0600, 0x0)
        /usr/lib/golang/src/net/http/request.go:698 +0xb4e
net/http.(*persistConn).writeLoop(0xc00009a5a0)
        /usr/lib/golang/src/net/http/transport.go:2389 +0x189
created by net/http.(*Transport).dialConn
        /usr/lib/golang/src/net/http/transport.go:1748 +0x1e65

What you expected to happen: I expected the clone to complete successfully.

How to reproduce it (as minimally and precisely as possible):

  1. Create a source DV with volumeMode: Block
  2. Create a destination DV with volumeMode: Filesystem whose source references the PVC of the source DV

Example source DV:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  generateName: source-dv
spec:
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 60Gi
    volumeMode: Block
  source:
    registry:
      url: quay.io/containerdisks/fedora:35

Example destination DV:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  generateName: destination-
  annotations:
    cdi.kubevirt.io/storage.bind.immediate.requested: "true"
spec:
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 60Gi
    volumeMode: Filesystem
  source:
    pvc:
      name: source-dv
      namespace: kubevirt

Additional context: When leaving out volumeMode and accessModes on the destination DV, the cloning completes successfully.

Environment:

  • CDI version (use kubectl get deployments cdi-deployment -o yaml): 4.10.0
  • Kubernetes version (use kubectl version): Openshift 4.10
  • DV specification: N/A
  • Cloud provider or hardware configuration: Azure (installed with openshift-install IPI)
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Others: N/A

0xFelix avatar May 02 '22 15:05 0xFelix

The most likely issue is that you ran out of storage space on the target (due to filesystem overhead). Can you try replacing pvc with storage in your DV spec? This should take file system overhead into account and pad the created PVC a bit.
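
For illustration, a minimal sketch of that change against the destination DV above (only the spec fragment is shown; unverified against the reporter's cluster):

spec:
  storage:            # storage instead of pvc, so CDI accounts for filesystem overhead
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 60Gi
    volumeMode: Filesystem
  source:
    pvc:
      name: source-dv
      namespace: kubevirt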

awels avatar May 02 '22 15:05 awels

Sorry, that example is not 100% correct. I also tried it with storage instead of pvc, but it only worked when leaving out accessModes and volumeMode.
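
For reference, a sketch of the variant that reportedly worked, using storage and omitting accessModes and volumeMode so CDI fills them in from the StorageProfile of the storage class (illustrative only):

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  generateName: destination-
  annotations:
    cdi.kubevirt.io/storage.bind.immediate.requested: "true"
spec:
  storage:
    # accessModes and volumeMode intentionally omitted
    resources:
      requests:
        storage: 60Gi
  source:
    pvc:
      name: source-dv
      namespace: kubevirt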

0xFelix avatar May 03 '22 06:05 0xFelix

So the point of storage is that the user doesn't need to know access modes and volume modes; we attempt to detect the optimal settings and populate them for the user. It also works perfectly fine if the user specifies those fields, in which case the ones provided by the user are used. But back to the issue: does it work if you make the target PVC about 5% larger than the source when going from block -> filesystem? Like I said, I suspect the Azure provisioner just makes a block device of the requested size and then puts a filesystem on it, which means the actual available size in the PVC might not be large enough for the contents of the source block device.
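
A sketch of that experiment, assuming the storage form and an illustrative padding of roughly 5% over the 60Gi block source:

spec:
  storage:
    resources:
      requests:
        storage: 63Gi   # illustrative value, about 5% larger than the 60Gi source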

awels avatar May 03 '22 12:05 awels

> So the point of storage is that the user doesn't need to know access modes and volume modes; we attempt to detect the optimal settings and populate them for the user. It also works perfectly fine if the user specifies those fields, in which case the ones provided by the user are used. But back to the issue: does it work if you make the target PVC about 5% larger than the source when going from block -> filesystem? Like I said, I suspect the Azure provisioner just makes a block device of the requested size and then puts a filesystem on it, which means the actual available size in the PVC might not be large enough for the contents of the source block device.

> Sorry, that example is not 100% correct. I also tried it with storage instead of pvc, but it only worked when leaving out accessModes and volumeMode.

storage would have inflated the size of the underlying PVC, though it is definitely worth adding a few extra gigs manually to rule that possibility out.

Also, I would try to take the block source DV out of the equation and just go fs->fs.
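
A sketch of the source DV for that fs->fs test, mirroring the reporter's source example with only the volume mode changed (illustrative only):

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  generateName: source-dv
spec:
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 60Gi
    volumeMode: Filesystem   # Filesystem instead of Block for the fs->fs test
  source:
    registry:
      url: quay.io/containerdisks/fedora:35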

akalenyu avatar May 03 '22 13:05 akalenyu

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot avatar Aug 01 '22 14:08 kubevirt-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubevirt-bot avatar Aug 31 '22 15:08 kubevirt-bot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kubevirt-bot avatar Sep 30 '22 15:09 kubevirt-bot

@kubevirt-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

kubevirt-bot avatar Sep 30 '22 15:09 kubevirt-bot