containerized-data-importer
tar extraction fails when tarfile has relative links
What happened: When extracting a tarball with top-level relative links, extraction fails. Changing the tarball to contain only absolute links makes it succeed.
What you expected to happen: The DataVolume should be created and report success.
How to reproduce it (as minimally and precisely as possible): Steps to reproduce the behavior. Create a tarball from a directory:
mkdir example
touch example/example
tar -cf example.tar -C example .
cd example
tar -cf ../example_norel.tar *
cd ..
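To see how the two archives differ, it may help to list their members side by side. The following is a minimal sketch that repeats the steps above in a scratch directory; the workdir, rel_listing and norel_listing variables are only illustrative names.

```shell
# Recreate both archives in a throwaway directory and list their members.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir example
touch example/example
tar -cf example.tar -C example .
(cd example && tar -cf ../example_norel.tar *)

# The archive built from "." starts with a "./" directory entry.
rel_listing="$(tar -tf example.tar)"
# The archive built from "*" only contains the file names themselves.
norel_listing="$(tar -tf example_norel.tar)"

printf 'example.tar:\n%s\n\nexample_norel.tar:\n%s\n' "$rel_listing" "$norel_listing"
```

The "./" member is the extraction target directory itself, so tar will try to restore its mode and timestamps when unpacking.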
Host the files via HTTP, then create the DataVolumes:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-archive-datavolume-rel
spec:
  source:
    http:
      url: "https://webhost/example.tar"
  contentType: archive
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Mi
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-archive-datavolume-norel
spec:
  source:
    http:
      url: "https://webhost/example_norel.tar"
  contentType: archive
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Mi
Output from the importer for import-archive-datavolume-rel:
I1113 14:07:31.425317 1 importer.go:103] Starting importer
I1113 14:07:31.425360 1 importer.go:172] begin import process
I1113 14:07:31.895947 1 data-processor.go:356] Calculating available size
I1113 14:07:31.895972 1 data-processor.go:368] Checking out file system volume size.
I1113 14:07:31.895987 1 data-processor.go:376] Request image size not empty.
I1113 14:07:31.895998 1 data-processor.go:381] Target size 96112640.
I1113 14:07:31.896031 1 data-processor.go:255] New phase: TransferDataDir
I1113 14:07:31.896045 1 util.go:207] begin untar to /data...
I1113 14:07:31.896050 1 util.go:213] running untar cmd: [/usr/bin/tar --preserve-permissions --no-same-owner -xvC /data]
E1113 14:07:31.897677 1 util.go:222] exit status 2
E1113 14:07:31.897695 1 data-processor.go:251] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
pkg/importer/data-processor.go:157
main.handleImport
cmd/cdi-importer/importer.go:178
main.main
cmd/cdi-importer/importer.go:144
runtime.main
GOROOT/src/runtime/proc.go:250
runtime.goexit
GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
pkg/importer/data-processor.go:157
main.handleImport
cmd/cdi-importer/importer.go:178
main.main
cmd/cdi-importer/importer.go:144
runtime.main
GOROOT/src/runtime/proc.go:250
runtime.goexit
GOROOT/src/runtime/asm_amd64.s:1594
E1113 14:07:31.897773 1 importer.go:181] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
pkg/importer/data-processor.go:157
main.handleImport
cmd/cdi-importer/importer.go:178
main.main
cmd/cdi-importer/importer.go:144
runtime.main
GOROOT/src/runtime/proc.go:250
runtime.goexit
GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
pkg/importer/data-processor.go:157
main.handleImport
cmd/cdi-importer/importer.go:178
main.main
cmd/cdi-importer/importer.go:144
runtime.main
GOROOT/src/runtime/proc.go:250
runtime.goexit
GOROOT/src/runtime/asm_amd64.s:1594
kubectl get DataVolume
NAME                              PHASE              PROGRESS   RESTARTS   AGE
import-archive-datavolume-norel   Succeeded          100.0%                3m33s
import-archive-datavolume-rel     ImportInProgress   N/A        5          3m25s
kubectl describe DataVolume
Name: import-archive-datavolume-norel
Namespace: default
Labels: <none>
Annotations: cdi.kubevirt.io/storage.usePopulator: true
API Version: cdi.kubevirt.io/v1beta1
Kind: DataVolume
Metadata:
Creation Timestamp: 2023-11-13T14:05:41Z
Generation: 1
Managed Fields:
API Version: cdi.kubevirt.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:cdi.kubevirt.io/storage.usePopulator:
Manager: cdi-controller
Operation: Update
Time: 2023-11-13T14:05:41Z
API Version: cdi.kubevirt.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:contentType:
f:pvc:
.:
f:accessModes:
f:resources:
.:
f:requests:
.:
f:storage:
f:source:
.:
f:http:
.:
f:url:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2023-11-13T14:05:41Z
API Version: cdi.kubevirt.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:claimName:
f:conditions:
f:phase:
f:progress:
Manager: cdi-controller
Operation: Update
Subresource: status
Time: 2023-11-13T14:06:26Z
Resource Version: 8338299
UID: 46d5dea5-8a3e-425b-af40-8150244423da
Spec:
Content Type: archive
Pvc:
Access Modes:
ReadWriteOnce
Resources:
Requests:
Storage: 100Mi
Source:
Http:
URL: https://webhost/example_norel.tar
Status:
Claim Name: import-archive-datavolume-norel
Conditions:
Last Heartbeat Time: 2023-11-13T14:06:26Z
Last Transition Time: 2023-11-13T14:06:26Z
Message: PVC import-archive-datavolume-norel Bound
Reason: Bound
Status: True
Type: Bound
Last Heartbeat Time: 2023-11-13T14:06:26Z
Last Transition Time: 2023-11-13T14:06:26Z
Status: True
Type: Ready
Last Heartbeat Time: 2023-11-13T14:06:26Z
Last Transition Time: 2023-11-13T14:06:26Z
Message: Import Complete
Reason: Completed
Status: False
Type: Running
Phase: Succeeded
Progress: 100.0%
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pending 4m8s datavolume-import-controller PVC import-archive-datavolume-norel Pending
Normal ImportInProgress 3m23s datavolume-import-controller Import into import-archive-datavolume-norel in progress
Normal ImportSucceeded 3m23s datavolume-import-controller Successfully imported into PVC import-archive-datavolume-norel
Normal Bound 3m23s datavolume-import-controller PVC import-archive-datavolume-norel Bound
Name: import-archive-datavolume-rel
Namespace: default
Labels: <none>
Annotations: cdi.kubevirt.io/storage.usePopulator: true
API Version: cdi.kubevirt.io/v1beta1
Kind: DataVolume
Metadata:
Creation Timestamp: 2023-11-13T14:05:49Z
Generation: 1
Managed Fields:
API Version: cdi.kubevirt.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:cdi.kubevirt.io/storage.usePopulator:
Manager: cdi-controller
Operation: Update
Time: 2023-11-13T14:05:49Z
API Version: cdi.kubevirt.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:contentType:
f:pvc:
.:
f:accessModes:
f:resources:
.:
f:requests:
.:
f:storage:
f:source:
.:
f:http:
.:
f:url:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2023-11-13T14:05:49Z
API Version: cdi.kubevirt.io/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:claimName:
f:conditions:
f:phase:
f:progress:
f:restartCount:
Manager: cdi-controller
Operation: Update
Subresource: status
Time: 2023-11-13T14:08:58Z
Resource Version: 8351981
UID: 1c970368-e14b-4132-ba77-31b3a3ee70f6
Spec:
Content Type: archive
Pvc:
Access Modes:
ReadWriteOnce
Resources:
Requests:
Storage: 100Mi
Source:
Http:
URL: https://webhost/example.tar
Status:
Claim Name: import-archive-datavolume-rel
Conditions:
Last Heartbeat Time: 2023-11-13T14:05:49Z
Last Transition Time: 2023-11-13T14:05:49Z
Message: PVC import-archive-datavolume-rel Pending
Reason: Pending
Status: False
Type: Bound
Last Heartbeat Time: 2023-11-13T14:08:58Z
Last Transition Time: 2023-11-13T14:05:49Z
Status: False
Type: Ready
Last Heartbeat Time: 2023-11-13T14:08:58Z
Last Transition Time: 2023-11-13T14:08:58Z
Message: Unable to process data: Unable to transfer source data to target directory: unable to untar files from endpoint: exit status 2
Reason: Error
Status: False
Type: Running
Phase: ImportInProgress
Progress: N/A
Restart Count: 5
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pending 4m1s datavolume-import-controller PVC import-archive-datavolume-rel Pending
Normal ImportInProgress 3m24s datavolume-import-controller Import into import-archive-datavolume-rel in progress
Warning Error 52s (x5 over 3m24s) datavolume-import-controller Unable to process data: Unable to transfer source data to target directory: unable to untar files from endpoint: exit status 2
Additional context: Using Rook-Ceph as the storage provider, but the failure can also be reproduced by running the importer container on local storage via Docker. It seems to have worked OK in 1.52.0. Possibly related to permissions when running as a non-root user in the container.
Environment:
- CDI version (use kubectl get deployments cdi-deployment -o yaml): 1.57.0
- Kubernetes version (use kubectl version): v1.26.8
- DV specification: Provided above
- Cloud provider or hardware configuration: AWS
- OS (e.g. from /etc/os-release): Ubuntu 22.04.1 LTS
- Kernel (e.g. uname -a): Linux e8451be7-6ce0-4581-8d7b-4ecff846abde-jx84w-pool3-f873b2ad-cxwds 5.15.0-1017-aws #21-Ubuntu SMP Fri Aug 5 11:10:45 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: N/A
- Others: N/A
Hey, thanks for reporting this!
I think if you increase the CDI log verbosity with something like
kubectl set env deployment cdi-operator \
--namespace="${cdi_namespace}" \
--containers='cdi-operator' \
VERBOSITY="3"
we should get the actual stdout/stderr of the untar command.
Lightly redacted output:
I1114 14:05:16.309950 1 importer.go:103] Starting importer
I1114 14:05:16.309997 1 importer.go:172] begin import process
I1114 14:05:16.310038 1 http-datasource.go:392] Attempting to HEAD "https://<url>/example.tar" via http client
I1114 14:05:16.598432 1 http-datasource.go:424] Content length: 2048
I1114 14:05:16.598446 1 http-datasource.go:327] Attempting to get object "https://<url>/example.tar" via http client
I1114 14:05:16.671028 1 data-processor.go:356] Calculating available size
I1114 14:05:16.671075 1 data-processor.go:368] Checking out file system volume size.
I1114 14:05:16.671103 1 data-processor.go:376] Request image size not empty.
I1114 14:05:16.671114 1 data-processor.go:381] Target size 96112640.
I1114 14:05:16.671149 1 format-readers.go:112] constructReaders: checking compression and archive formats
I1114 14:05:16.671163 1 format-readers.go:121] found header of type "tar"
I1114 14:05:16.671171 1 data-processor.go:255] New phase: TransferDataDir
I1114 14:05:16.671180 1 util.go:207] begin untar to /data...
I1114 14:05:16.671187 1 util.go:213] running untar cmd: [/usr/bin/tar --preserve-permissions --no-same-owner -xvC /data]
I1114 14:05:16.672780 1 util.go:220] STDOUT
./
./example
I1114 14:05:16.672787 1 util.go:221] STDERR
/usr/bin/tar: .: Cannot utime: Operation not permitted
/usr/bin/tar: .: Cannot change mode to rwxr-xr-x: Operation not permitted
/usr/bin/tar: Exiting with failure status due to previous errors
E1114 14:05:16.672793 1 util.go:222] exit status 2
E1114 14:05:16.672806 1 data-processor.go:251] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
pkg/importer/data-processor.go:157
main.handleImport
cmd/cdi-importer/importer.go:178
main.main
cmd/cdi-importer/importer.go:144
runtime.main
GOROOT/src/runtime/proc.go:250
runtime.goexit
GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
pkg/importer/data-processor.go:157
main.handleImport
cmd/cdi-importer/importer.go:178
main.main
cmd/cdi-importer/importer.go:144
runtime.main
GOROOT/src/runtime/proc.go:250
runtime.goexit
GOROOT/src/runtime/asm_amd64.s:1594
E1114 14:05:16.672884 1 importer.go:181] exit status 2
unable to untar files from endpoint
kubevirt.io/containerized-data-importer/pkg/importer.(*HTTPDataSource).Transfer
pkg/importer/http-datasource.go:169
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
pkg/importer/data-processor.go:191
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
pkg/importer/data-processor.go:157
main.handleImport
cmd/cdi-importer/importer.go:178
main.main
cmd/cdi-importer/importer.go:144
runtime.main
GOROOT/src/runtime/proc.go:250
runtime.goexit
GOROOT/src/runtime/asm_amd64.s:1594
Unable to transfer source data to target directory
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func3
pkg/importer/data-processor.go:193
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
pkg/importer/data-processor.go:248
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
pkg/importer/data-processor.go:157
main.handleImport
cmd/cdi-importer/importer.go:178
main.main
cmd/cdi-importer/importer.go:144
runtime.main
GOROOT/src/runtime/proc.go:250
runtime.goexit
GOROOT/src/runtime/asm_amd64.s:1594
Note that this is on nodes with the device_ownership_from_security_context set to true at the containerd level.
I see. Maybe as non-root it would make sense for us to use these tar options:
-m, --touch
    Don't extract file modified time.
--no-overwrite-dir
    Preserve metadata of existing directories.
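For illustration, this is roughly what the invocation from the log would look like with those two options added. It is only a sketch, assuming GNU tar: the scratch paths stand in for /data, and because the scratch directory is owned by the current user it does not reproduce the "Operation not permitted" failure. Whether the options resolve the failure on an affected volume is not verified here.

```shell
# Build an archive whose first member is the "./" directory entry.
workdir="$(mktemp -d)"
mkdir "$workdir/src" "$workdir/data"
touch "$workdir/src/example"
tar -cf "$workdir/example.tar" -C "$workdir/src" .

# Same flags as in the importer log, plus the two suggested options.
# "-f -" reads the archive from stdin, similar to a streamed import.
tar --preserve-permissions --no-same-owner --touch --no-overwrite-dir \
    -xvf - -C "$workdir/data" < "$workdir/example.tar"
status=$?
echo "tar exit status: $status"
```

With --no-overwrite-dir, tar is asked to leave the metadata of the already existing target directory alone, which is the step that fails in the STDERR output above.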
/assign akalenyu
Is this still an issue for you?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
I've encountered the same issue.
Content of TAR that causes the error:
$ tar -tv --numeric-owner -f archive.tar
drwxr-xr-x 2009/2000 0 2024-04-05 05:33 ./
drwxr-xr-x 2009/2000 0 2024-04-05 05:33 ./blah/
-rw-r--r-- 2009/2000 12 2024-04-05 05:33 ./blah/README
drwxr-xr-x 2009/2000 0 2024-04-05 05:33 ./foo/
Content of the TAR that doesn't cause an error:
$ tar -tv --numeric-owner -f archive2.tar
drwxr-xr-x 2009/2000 0 2024-04-05 05:33 blah/
-rw-r--r-- 2009/2000 12 2024-04-05 05:33 blah/README
drwxr-xr-x 2009/2000 0 2024-04-05 05:33 foo/
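As a possible workaround until the importer handles this, an archive of the first kind can be repacked so that it no longer contains the top-level "./" entry. The sketch below rebuilds a small archive with the same layout and repacks it; the paths and the repacked variable are hypothetical, and file names containing newlines are not handled. Note that unpacking and repacking as an unprivileged user records that user as the owner instead of the original 2009/2000.

```shell
# Build a sample archive with the failing layout ("./" as first member).
workdir="$(mktemp -d)"
mkdir -p "$workdir/src/blah" "$workdir/src/foo" "$workdir/unpacked"
echo "hello world" > "$workdir/src/blah/README"
tar -cf "$workdir/archive.tar" -C "$workdir/src" .

# Unpack it, then re-create it by naming each top-level entry
# (including dotfiles) instead of archiving "." itself.
tar -xf "$workdir/archive.tar" -C "$workdir/unpacked"
(cd "$workdir/unpacked" && ls -A | tar -cf "$workdir/archive2.tar" -T -)

# The repacked archive should list blah/, blah/README and foo/ only.
repacked="$(tar -tf "$workdir/archive2.tar")"
printf '%s\n' "$repacked"
```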
/remove-lifecycle stale
Issue definitely still around.
The issue isn't limited to archives containing links.
The parameters CDI uses when calling tar won't work with every combination of PVC/StorageClass and Pod security context, because of ownership.
It looks like the current importer Pod is tailored for importing disk images meant to be consumed by libvirt/qemu. But maybe that's a limiting factor when just trying to import an archive of arbitrary files to be mounted as a VirtIO disk and not as a VM disk.
/lifecycle frozen