containerized-data-importer

tar extraction fails when tarfile has relative links

Open · dropte opened this issue 2 years ago · 13 comments

What happened: When extracting a tarball with top-level relative links, extraction fails. Changing the tarball so that its entries carry no leading "./" (and there is no entry for the top-level directory itself) makes it succeed.

What you expected to happen: The DataVolume should be created and the import should succeed.

How to reproduce it (as minimally and precisely as possible): Create two tarballs from the same directory, one with "./"-prefixed entries and one without:

mkdir example
touch example/example
tar -cf example.tar -C example .
cd example
tar -cf ../example_norel.tar  *
cd ..

Host both files over HTTP and create one DataVolume for each:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-archive-datavolume-rel
spec:
  source:
    http:
      url: "https://webhost/example.tar"
  contentType: archive
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Mi
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-archive-datavolume-norel
spec:
  source:
    http:
      url: "https://webhost/example_norel.tar"
  contentType: archive
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Mi

Importer output for the rel DataVolume:

I1113 14:07:31.425317       1 importer.go:103] Starting importer
I1113 14:07:31.425360       1 importer.go:172] begin import process
I1113 14:07:31.895947       1 data-processor.go:356] Calculating available size
I1113 14:07:31.895972       1 data-processor.go:368] Checking out file system volume size.
I1113 14:07:31.895987       1 data-processor.go:376] Request image size not empty.
I1113 14:07:31.895998       1 data-processor.go:381] Target size 96112640.
I1113 14:07:31.896031       1 data-processor.go:255] New phase: TransferDataDir
I1113 14:07:31.896045       1 util.go:207] begin untar to /data...
I1113 14:07:31.896050       1 util.go:213] running untar cmd: [/usr/bin/tar --preserve-permissions --no-same-owner -xvC /data]
E1113 14:07:31.897677       1 util.go:222] exit status 2
E1113 14:07:31.897695       1 data-processor.go:251] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
	pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
	pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	pkg/importer/data-processor.go:157
main.handleImport
	cmd/cdi-importer/importer.go:178
main.main
	cmd/cdi-importer/importer.go:144
runtime.main
	GOROOT/src/runtime/proc.go:250
runtime.goexit
	GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
	pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	pkg/importer/data-processor.go:157
main.handleImport
	cmd/cdi-importer/importer.go:178
main.main
	cmd/cdi-importer/importer.go:144
runtime.main
	GOROOT/src/runtime/proc.go:250
runtime.goexit
	GOROOT/src/runtime/asm_amd64.s:1594
E1113 14:07:31.897773       1 importer.go:181] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
	pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
	pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	pkg/importer/data-processor.go:157
main.handleImport
	cmd/cdi-importer/importer.go:178
main.main
	cmd/cdi-importer/importer.go:144
runtime.main
	GOROOT/src/runtime/proc.go:250
runtime.goexit
	GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
	pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	pkg/importer/data-processor.go:157
main.handleImport
	cmd/cdi-importer/importer.go:178
main.main
	cmd/cdi-importer/importer.go:144
runtime.main
	GOROOT/src/runtime/proc.go:250
runtime.goexit
	GOROOT/src/runtime/asm_amd64.s:1594
kubectl get DataVolume

NAME                              PHASE              PROGRESS   RESTARTS   AGE
import-archive-datavolume-norel   Succeeded          100.0%                3m33s
import-archive-datavolume-rel     ImportInProgress   N/A        5          3m25s
kubectl describe DataVolume
Name:         import-archive-datavolume-norel
Namespace:    default
Labels:       <none>
Annotations:  cdi.kubevirt.io/storage.usePopulator: true
API Version:  cdi.kubevirt.io/v1beta1
Kind:         DataVolume
Metadata:
  Creation Timestamp:  2023-11-13T14:05:41Z
  Generation:          1
  Managed Fields:
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:cdi.kubevirt.io/storage.usePopulator:
    Manager:      cdi-controller
    Operation:    Update
    Time:         2023-11-13T14:05:41Z
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:contentType:
        f:pvc:
          .:
          f:accessModes:
          f:resources:
            .:
            f:requests:
              .:
              f:storage:
        f:source:
          .:
          f:http:
            .:
            f:url:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-11-13T14:05:41Z
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:claimName:
        f:conditions:
        f:phase:
        f:progress:
    Manager:         cdi-controller
    Operation:       Update
    Subresource:     status
    Time:            2023-11-13T14:06:26Z
  Resource Version:  8338299
  UID:               46d5dea5-8a3e-425b-af40-8150244423da
Spec:
  Content Type:  archive
  Pvc:
    Access Modes:
      ReadWriteOnce
    Resources:
      Requests:
        Storage:  100Mi
  Source:
    Http:
      URL:  https://webhost/example_norel.tar
Status:
  Claim Name:  import-archive-datavolume-norel
  Conditions:
    Last Heartbeat Time:   2023-11-13T14:06:26Z
    Last Transition Time:  2023-11-13T14:06:26Z
    Message:               PVC import-archive-datavolume-norel Bound
    Reason:                Bound
    Status:                True
    Type:                  Bound
    Last Heartbeat Time:   2023-11-13T14:06:26Z
    Last Transition Time:  2023-11-13T14:06:26Z
    Status:                True
    Type:                  Ready
    Last Heartbeat Time:   2023-11-13T14:06:26Z
    Last Transition Time:  2023-11-13T14:06:26Z
    Message:               Import Complete
    Reason:                Completed
    Status:                False
    Type:                  Running
  Phase:                   Succeeded
  Progress:                100.0%
Events:
  Type    Reason            Age    From                          Message
  ----    ------            ----   ----                          -------
  Normal  Pending           4m8s   datavolume-import-controller  PVC import-archive-datavolume-norel Pending
  Normal  ImportInProgress  3m23s  datavolume-import-controller  Import into import-archive-datavolume-norel in progress
  Normal  ImportSucceeded   3m23s  datavolume-import-controller  Successfully imported into PVC import-archive-datavolume-norel
  Normal  Bound             3m23s  datavolume-import-controller  PVC import-archive-datavolume-norel Bound


Name:         import-archive-datavolume-rel
Namespace:    default
Labels:       <none>
Annotations:  cdi.kubevirt.io/storage.usePopulator: true
API Version:  cdi.kubevirt.io/v1beta1
Kind:         DataVolume
Metadata:
  Creation Timestamp:  2023-11-13T14:05:49Z
  Generation:          1
  Managed Fields:
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:cdi.kubevirt.io/storage.usePopulator:
    Manager:      cdi-controller
    Operation:    Update
    Time:         2023-11-13T14:05:49Z
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:contentType:
        f:pvc:
          .:
          f:accessModes:
          f:resources:
            .:
            f:requests:
              .:
              f:storage:
        f:source:
          .:
          f:http:
            .:
            f:url:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-11-13T14:05:49Z
    API Version:  cdi.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:claimName:
        f:conditions:
        f:phase:
        f:progress:
        f:restartCount:
    Manager:         cdi-controller
    Operation:       Update
    Subresource:     status
    Time:            2023-11-13T14:08:58Z
  Resource Version:  8351981
  UID:               1c970368-e14b-4132-ba77-31b3a3ee70f6
Spec:
  Content Type:  archive
  Pvc:
    Access Modes:
      ReadWriteOnce
    Resources:
      Requests:
        Storage:  100Mi
  Source:
    Http:
      URL:  https://webhost/example.tar
Status:
  Claim Name:  import-archive-datavolume-rel
  Conditions:
    Last Heartbeat Time:   2023-11-13T14:05:49Z
    Last Transition Time:  2023-11-13T14:05:49Z
    Message:               PVC import-archive-datavolume-rel Pending
    Reason:                Pending
    Status:                False
    Type:                  Bound
    Last Heartbeat Time:   2023-11-13T14:08:58Z
    Last Transition Time:  2023-11-13T14:05:49Z
    Status:                False
    Type:                  Ready
    Last Heartbeat Time:   2023-11-13T14:08:58Z
    Last Transition Time:  2023-11-13T14:08:58Z
    Message:               Unable to process data: Unable to transfer source data to target directory: unable to untar files from endpoint: exit status 2
    Reason:                Error
    Status:                False
    Type:                  Running
  Phase:                   ImportInProgress
  Progress:                N/A
  Restart Count:           5
Events:
  Type     Reason            Age                  From                          Message
  ----     ------            ----                 ----                          -------
  Normal   Pending           4m1s                 datavolume-import-controller  PVC import-archive-datavolume-rel Pending
  Normal   ImportInProgress  3m24s                datavolume-import-controller  Import into import-archive-datavolume-rel in progress
  Warning  Error             52s (x5 over 3m24s)  datavolume-import-controller  Unable to process data: Unable to transfer source data to target directory: unable to untar files from endpoint: exit status 2

Additional context: We use Rook-Ceph as the storage provider, but the failure also reproduces when running the importer container on local storage via Docker. It seems to have worked in 1.52.0. It may be related to permissions, since the importer runs as a non-root user in the container.

Environment:

  • CDI version (use kubectl get deployments cdi-deployment -o yaml): 1.57.0
  • Kubernetes version (use kubectl version): v1.26.8
  • DV specification: Provided above
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.1 LTS
  • Kernel (e.g. uname -a): Linux e8451be7-6ce0-4581-8d7b-4ecff846abde-jx84w-pool3-f873b2ad-cxwds 5.15.0-1017-aws #21-Ubuntu SMP Fri Aug 5 11:10:45 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: N/A
  • Others: N/A

dropte, Nov 13 '23 14:11

Hey, thanks for reporting this!

I think if you increase the CDI log verbosity with something like the following, we should get the actual stdout/stderr of the untar command:

kubectl set env deployment cdi-operator \
        --namespace="${cdi_namespace}" \
        --containers='cdi-operator' \
        VERBOSITY="3"

akalenyu, Nov 14 '23 10:11

Lightly redacted output:

I1114 14:05:16.309950       1 importer.go:103] Starting importer
I1114 14:05:16.309997       1 importer.go:172] begin import process
I1114 14:05:16.310038       1 http-datasource.go:392] Attempting to HEAD "https://<url>/example.tar" via http client
I1114 14:05:16.598432       1 http-datasource.go:424] Content length: 2048
I1114 14:05:16.598446       1 http-datasource.go:327] Attempting to get object "https://<url>/example.tar" via http client
I1114 14:05:16.671028       1 data-processor.go:356] Calculating available size
I1114 14:05:16.671075       1 data-processor.go:368] Checking out file system volume size.
I1114 14:05:16.671103       1 data-processor.go:376] Request image size not empty.
I1114 14:05:16.671114       1 data-processor.go:381] Target size 96112640.
I1114 14:05:16.671149       1 format-readers.go:112] constructReaders: checking compression and archive formats
I1114 14:05:16.671163       1 format-readers.go:121] found header of type "tar"
I1114 14:05:16.671171       1 data-processor.go:255] New phase: TransferDataDir
I1114 14:05:16.671180       1 util.go:207] begin untar to /data...
I1114 14:05:16.671187       1 util.go:213] running untar cmd: [/usr/bin/tar --preserve-permissions --no-same-owner -xvC /data]
I1114 14:05:16.672780       1 util.go:220] STDOUT
./
./example

I1114 14:05:16.672787       1 util.go:221] STDERR
/usr/bin/tar: .: Cannot utime: Operation not permitted
/usr/bin/tar: .: Cannot change mode to rwxr-xr-x: Operation not permitted
/usr/bin/tar: Exiting with failure status due to previous errors

E1114 14:05:16.672793       1 util.go:222] exit status 2
E1114 14:05:16.672806       1 data-processor.go:251] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
	pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
	pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	pkg/importer/data-processor.go:157
main.handleImport
	cmd/cdi-importer/importer.go:178
main.main
	cmd/cdi-importer/importer.go:144
runtime.main
	GOROOT/src/runtime/proc.go:250
runtime.goexit
	GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
	pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	pkg/importer/data-processor.go:157
main.handleImport
	cmd/cdi-importer/importer.go:178
main.main
	cmd/cdi-importer/importer.go:144
runtime.main
	GOROOT/src/runtime/proc.go:250
runtime.goexit
	GOROOT/src/runtime/asm_amd64.s:1594
E1114 14:05:16.672884       1 importer.go:181] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
	pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
	pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	pkg/importer/data-processor.go:157
main.handleImport
	cmd/cdi-importer/importer.go:178
main.main
	cmd/cdi-importer/importer.go:144
runtime.main
	GOROOT/src/runtime/proc.go:250
runtime.goexit
	GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
	pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	pkg/importer/data-processor.go:157
main.handleImport
	cmd/cdi-importer/importer.go:178
main.main
	cmd/cdi-importer/importer.go:144
runtime.main
	GOROOT/src/runtime/proc.go:250
runtime.goexit
	GOROOT/src/runtime/asm_amd64.s:1594

Note that these nodes have device_ownership_from_security_context set to true in the containerd configuration.

dropte, Nov 14 '23 14:11

I see. When running as non-root, it might make sense for us to use these tar options:

-m, --touch
    Don't extract file modified time.

--no-overwrite-dir
    Preserve metadata of existing directories.

akalenyu, Nov 14 '23 15:11

/assign akalenyu

akalenyu, Nov 20 '23 13:11

Is this still an issue for you?

aglitke, Dec 18 '23 13:12

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot, Mar 17 '24 14:03

/remove-lifecycle stale

akalenyu, Mar 17 '24 14:03

I've encountered the same issue.

Contents of the TAR that causes the error:

$ tar -tv --numeric-owner -f archive.tar 
drwxr-xr-x 2009/2000         0 2024-04-05 05:33 ./
drwxr-xr-x 2009/2000         0 2024-04-05 05:33 ./blah/
-rw-r--r-- 2009/2000        12 2024-04-05 05:33 ./blah/README
drwxr-xr-x 2009/2000         0 2024-04-05 05:33 ./foo/

Contents of the TAR that doesn't cause an error:

$ tar -tv --numeric-owner -f archive2.tar 
drwxr-xr-x 2009/2000         0 2024-04-05 05:33 blah/
-rw-r--r-- 2009/2000        12 2024-04-05 05:33 blah/README
drwxr-xr-x 2009/2000         0 2024-04-05 05:33 foo/

ianb-mp, Apr 08 '24 01:04

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot, Jul 07 '24 02:07

/remove-lifecycle stale issue definitely still around

akalenyu, Jul 15 '24 12:07

This issue isn't limited to archives containing links. Because of file ownership, the flags CDI passes to tar won't work with every combination of PVC/StorageClass and Pod security context. The importer Pod seems tailored to importing disk images meant to be consumed by libvirt/QEMU, which may be a limiting factor when you just want to import an archive of arbitrary files to be mounted with a VirtIO disk rather than used as a VM disk.

tux-o-matic, Oct 10 '24 12:10

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot, Jan 08 '25 13:01

/lifecycle frozen

awels, Jan 13 '25 14:01