goofys icon indicating copy to clipboard operation
goofys copied to clipboard

Data loss copying from compressed ZFS volume into goofys

Open jtwang83 opened this issue 3 years ago • 2 comments

I have a file that is mostly NULs. Copying it into goofys from a lz4 compressed ZFS dataset results in a truncated file in S3. Copying the same file from ext2 or ext4 does not have this problem. It does not seem to be entirely based on the compressibility of the file in ZFS; I have many similar files and only a few examples result in this data loss.

I am using goofys 0.24.0 but the same problem occurs in goofys 0.23.1.

The file is attached, the correct length is 315616.

Here is the bad copy, from ZFS.

2022/08/07 19:58:21.382950 fuse.DEBUG Create 5 archive/2021/20210407/rdx-trading [tse.positions]
2022/08/07 19:58:21.383107 fuse.DEBUG <-- CreateFile 7 archive/2021/20210407/rdx-trading/tse.positions []
2022/08/07 19:58:21.383132 fuse.DEBUG Op 0x0000000e        connection.go:491] -> OK (inode 7)
2022/08/07 19:58:21.383413 fuse.DEBUG Op 0x0000000f        connection.go:408] <- GetXattr (inode 7, name "security.capability", name security.capability)
2022/08/07 19:58:21.383475 fuse.DEBUG GetXattr 7 archive/2021/20210407/rdx-trading/tse.positions [security.capability]
2022/08/07 19:58:21.383510 fuse.DEBUG Op 0x0000000f        connection.go:493] -> Error: "no data available"
2022/08/07 19:58:21.383624 fuse.DEBUG Op 0x00000010        connection.go:408] <- WriteFile (inode 7, handle 1, offset 0, 65536 bytes)
2022/08/07 19:58:21.383677 fuse.DEBUG WriteFile 7 archive/2021/20210407/rdx-trading/tse.positions [0 65536]
2022/08/07 19:58:21.383897 main.DEBUG the memory cgroup path for the current process is /sys/fs/cgroup/memory
2022/08/07 19:58:21.383928 main.DEBUG amount of available memory: 415
2022/08/07 19:58:21.383967 main.DEBUG amount of allocated memory: 68 3
2022/08/07 19:58:21.383974 main.DEBUG using up to 48 5MB buffers, now is 0
2022/08/07 19:58:21.383982 buffer.DEBUG requesting 5242880
2022/08/07 19:58:21.384529 fuse.DEBUG Op 0x00000010        connection.go:491] -> OK ()
2022/08/07 19:58:21.385185 fuse.DEBUG Op 0x00000011        connection.go:408] <- GetXattr (inode 7, name "security.capability", name security.capability)
2022/08/07 19:58:21.385399 fuse.DEBUG GetXattr 7 archive/2021/20210407/rdx-trading/tse.positions [security.capability]
2022/08/07 19:58:21.385430 fuse.DEBUG Op 0x00000011        connection.go:493] -> Error: "no data available"
2022/08/07 19:58:21.385611 fuse.DEBUG Op 0x00000012        connection.go:408] <- WriteFile (inode 7, handle 1, offset 65536, 65536 bytes)
2022/08/07 19:58:21.385658 fuse.DEBUG WriteFile 7 archive/2021/20210407/rdx-trading/tse.positions [65536 65536]
2022/08/07 19:58:21.385722 fuse.DEBUG Op 0x00000012        connection.go:491] -> OK ()
2022/08/07 19:58:21.386263 fuse.DEBUG Op 0x00000013        connection.go:408] <- GetXattr (inode 7, name "security.capability", name security.capability)
2022/08/07 19:58:21.386318 fuse.DEBUG GetXattr 7 archive/2021/20210407/rdx-trading/tse.positions [security.capability]
2022/08/07 19:58:21.386347 fuse.DEBUG Op 0x00000013        connection.go:493] -> Error: "no data available"
2022/08/07 19:58:21.386435 fuse.DEBUG Op 0x00000014        connection.go:408] <- SetInodeAttributes (inode 7, size 315616)
2022/08/07 19:58:21.386470 fuse.DEBUG GetAttributes 7 archive/2021/20210407/rdx-trading/tse.positions []
2022/08/07 19:58:21.386492 fuse.DEBUG Op 0x00000014        connection.go:491] -> OK ()
2022/08/07 19:58:21.386573 fuse.DEBUG Op 0x00000015        connection.go:408] <- RemoveXattr (inode 7, name "security.ima", name security.ima)
2022/08/07 19:58:21.386627 fuse.DEBUG RemoveXattr 7 archive/2021/20210407/rdx-trading/tse.positions [security.ima]
2022/08/07 19:58:21.386643 fuse.DEBUG Op 0x00000015        connection.go:493] -> Error: "operation not permitted"
2022/08/07 19:58:21.386655 fuse.ERROR *fuseops.RemoveXattrOp error: operation not permitted
2022/08/07 19:58:21.386738 fuse.DEBUG Op 0x00000016        connection.go:408] <- FlushFile (inode 7)
2022/08/07 19:58:21.386859 fuse.DEBUG FlushFile 7 archive/2021/20210407/rdx-trading/tse.positions []
2022/08/07 19:58:21.388169 s3.DEBUG DEBUG: Request s3/PutObject Details:
---[ REQUEST POST-SIGN ]-----------------------------
PUT /ergosphere-refdata/archive/2021/20210407/rdx-trading/tse.positions HTTP/1.1
Host: s3.us-east-2.amazonaws.com
User-Agent: aws-sdk-go/1.17.13 (go1.13.1; linux; amd64)
Content-Length: 131072
Authorization: AWS4-HMAC-SHA256 Credential=ASIATUDHQYDCE3PY46HT/20220807/us-east-2/s3/aws4_request, SignedHeaders=content-length;content-md5;host;x-amz-content-sha256;x-amz-date;x-amz-security-token;x-amz-server-side-encryption;x-amz-storage-class, Signature=390c79b1a2a1eddbdb47f7d766039b632f375b574df5441658bc0f2f0e4221df
Content-Md5: 1wvd8J8n0Sw4Aob+XP2SfQ==
X-Amz-Content-Sha256: 6623386bcfad79084ca122584a654508df103ea8926a0cf8b88eb464e9b6eda3
X-Amz-Date: 20220807T195821Z
X-Amz-Security-Token: redacted
X-Amz-Server-Side-Encryption: AES256
X-Amz-Storage-Class: STANDARD
Accept-Encoding: gzip


-----------------------------------------------------
2022/08/07 19:58:21.444390 s3.DEBUG DEBUG: Response s3/PutObject Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Content-Length: 0
Date: Sun, 07 Aug 2022 19:58:22 GMT
Etag: "d70bddf09f27d12c380286fe5cfd927d"
Server: AmazonS3
X-Amz-Id-2: i/sykiKuluXB/6b45zER4rcjhbx06aOruOKUOf4Razx6C7NED/fVaAzVtPG0NIDoMQbn5g29wWM=
X-Amz-Request-Id: E3298RXF1A6R67DF
X-Amz-Server-Side-Encryption: AES256


-----------------------------------------------------
2022/08/07 19:58:21.444440 buffer.DEBUG returning 131072
2022/08/07 19:58:21.444461 fuse.DEBUG <-- FlushFile 7 archive/2021/20210407/rdx-trading/tse.positions [<nil> 1 7]
2022/08/07 19:58:21.444510 fuse.DEBUG Op 0x00000016        connection.go:491] -> OK ()
2022/08/07 19:58:21.444659 fuse.DEBUG Op 0x00000017        connection.go:408] <- ReleaseFileHandle
2022/08/07 19:58:21.444729 fuse.DEBUG ReleaseFileHandle archive/2021/20210407/rdx-trading/tse.positions 1 7
2022/08/07 19:58:21.444748 fuse.DEBUG Op 0x00000017        connection.go:491] -> OK ()

And here is the copy from ext2:


2022/08/07 20:14:33.839728 fuse.DEBUG Op 0x00000048        connection.go:408] <- CreateFile (parent 5, name "tse.positions")
2022/08/07 20:14:33.839785 fuse.DEBUG Create 5 archive/2021/20210407/rdx-trading [tse.positions]
2022/08/07 20:14:33.839924 fuse.DEBUG <-- CreateFile 52 archive/2021/20210407/rdx-trading/tse.positions []
2022/08/07 20:14:33.839947 fuse.DEBUG Op 0x00000048        connection.go:491] -> OK (inode 52)
2022/08/07 20:14:33.840114 fuse.DEBUG Op 0x00000049        connection.go:408] <- GetXattr (inode 52, name "security.capability", name security.capability)
2022/08/07 20:14:33.840175 fuse.DEBUG GetXattr 52 archive/2021/20210407/rdx-trading/tse.positions [security.capability]
2022/08/07 20:14:33.840195 fuse.DEBUG Op 0x00000049        connection.go:493] -> Error: "no data available"
2022/08/07 20:14:33.840402 fuse.DEBUG Op 0x0000004a        connection.go:408] <- WriteFile (inode 52, handle 6, offset 0, 65536 bytes)
2022/08/07 20:14:33.840444 fuse.DEBUG WriteFile 52 archive/2021/20210407/rdx-trading/tse.positions [0 65536]
2022/08/07 20:14:33.840456 buffer.DEBUG requesting 5242880
2022/08/07 20:14:33.843428 fuse.DEBUG Op 0x0000004a        connection.go:491] -> OK ()
2022/08/07 20:14:33.844236 fuse.DEBUG Op 0x0000004b        connection.go:408] <- GetXattr (inode 52, name "security.capability", name security.capability)
2022/08/07 20:14:33.844501 fuse.DEBUG GetXattr 52 archive/2021/20210407/rdx-trading/tse.positions [security.capability]
2022/08/07 20:14:33.844535 fuse.DEBUG Op 0x0000004b        connection.go:493] -> Error: "no data available"
2022/08/07 20:14:33.844669 fuse.DEBUG Op 0x0000004c        connection.go:408] <- WriteFile (inode 52, handle 6, offset 65536, 65536 bytes)
2022/08/07 20:14:33.844861 fuse.DEBUG WriteFile 52 archive/2021/20210407/rdx-trading/tse.positions [65536 65536]
2022/08/07 20:14:33.844892 fuse.DEBUG Op 0x0000004c        connection.go:491] -> OK ()
2022/08/07 20:14:33.844978 fuse.DEBUG Op 0x0000004d        connection.go:408] <- GetXattr (inode 52, name "security.capability", name security.capability)
2022/08/07 20:14:33.845023 fuse.DEBUG GetXattr 52 archive/2021/20210407/rdx-trading/tse.positions [security.capability]
2022/08/07 20:14:33.845048 fuse.DEBUG Op 0x0000004d        connection.go:493] -> Error: "no data available"
2022/08/07 20:14:33.845196 fuse.DEBUG Op 0x0000004e        connection.go:408] <- WriteFile (inode 52, handle 6, offset 131072, 65536 bytes)
2022/08/07 20:14:33.845241 fuse.DEBUG WriteFile 52 archive/2021/20210407/rdx-trading/tse.positions [131072 65536]
2022/08/07 20:14:33.845274 fuse.DEBUG Op 0x0000004e        connection.go:491] -> OK ()
2022/08/07 20:14:33.845351 fuse.DEBUG Op 0x0000004f        connection.go:408] <- GetXattr (inode 52, name "security.capability", name security.capability)
2022/08/07 20:14:33.845530 fuse.DEBUG GetXattr 52 archive/2021/20210407/rdx-trading/tse.positions [security.capability]
2022/08/07 20:14:33.845549 fuse.DEBUG Op 0x0000004f        connection.go:493] -> Error: "no data available"
2022/08/07 20:14:33.845670 fuse.DEBUG Op 0x00000050        connection.go:408] <- WriteFile (inode 52, handle 6, offset 196608, 65536 bytes)
2022/08/07 20:14:33.845709 fuse.DEBUG WriteFile 52 archive/2021/20210407/rdx-trading/tse.positions [196608 65536]
2022/08/07 20:14:33.845742 fuse.DEBUG Op 0x00000050        connection.go:491] -> OK ()
2022/08/07 20:14:33.845809 fuse.DEBUG Op 0x00000051        connection.go:408] <- GetXattr (inode 52, name "security.capability", name security.capability)
2022/08/07 20:14:33.845895 fuse.DEBUG GetXattr 52 archive/2021/20210407/rdx-trading/tse.positions [security.capability]
2022/08/07 20:14:33.845916 fuse.DEBUG Op 0x00000051        connection.go:493] -> Error: "no data available"
2022/08/07 20:14:33.845997 fuse.DEBUG Op 0x00000052        connection.go:408] <- WriteFile (inode 52, handle 6, offset 262144, 53472 bytes)
2022/08/07 20:14:33.846205 fuse.DEBUG WriteFile 52 archive/2021/20210407/rdx-trading/tse.positions [262144 53472]
2022/08/07 20:14:33.846258 fuse.DEBUG Op 0x00000052        connection.go:491] -> OK ()
2022/08/07 20:14:33.846309 fuse.DEBUG Op 0x00000053        connection.go:408] <- FlushFile (inode 52)
2022/08/07 20:14:33.846470 fuse.DEBUG FlushFile 52 archive/2021/20210407/rdx-trading/tse.positions []
2022/08/07 20:14:33.849182 s3.DEBUG DEBUG: Request s3/PutObject Details:
---[ REQUEST POST-SIGN ]----------------------------- 
PUT /ergosphere-refdata/archive/2021/20210407/rdx-trading/tse.positions HTTP/1.1
Host: s3.us-east-2.amazonaws.com
User-Agent: aws-sdk-go/1.17.13 (go1.13.1; linux; amd64)
Content-Length: 315616
Authorization: AWS4-HMAC-SHA256 Credential=ASIATUDHQYDCE3PY46HT/20220807/us-east-2/s3/aws4_request, SignedHeaders=content-length;content-md5;host;x-amz-content-sha256;x-amz-date;x-amz-security-token;x-amz-server-side-encryption;x-amz-storage-class, Signature=bd752ec93a466bba97e1e2c0c9a514d7bfee794e5fd1fcc4b7dd172eaff49d3c
Content-Md5: Nfrzrl0S7lNz6W5MORoBIw==
X-Amz-Content-Sha256: 837c58a3441f74cb384f615c823ba77f98dbde88aaecc8c6ffb367b9296891fa
X-Amz-Date: 20220807T201433Z
X-Amz-Security-Token: (redacted)
X-Amz-Server-Side-Encryption: AES256
X-Amz-Storage-Class: STANDARD
Accept-Encoding: gzip


-----------------------------------------------------
2022/08/07 20:14:33.934644 s3.DEBUG DEBUG: Response s3/PutObject Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Content-Length: 0
Date: Sun, 07 Aug 2022 20:14:34 GMT
Etag: "35faf3ae5d12ee5373e96e4c391a0123"
Server: AmazonS3
X-Amz-Id-2: 2JBxnn9SPnn5ecKdXy7BRhPs65lbANJNy+gD0oe36iT8SEixiCs2gcpVplqJx2eGsFAd5OE184A=
X-Amz-Request-Id: 98BHB17DQ0953D3E
X-Amz-Server-Side-Encryption: AES256


-----------------------------------------------------
2022/08/07 20:14:33.934691 buffer.DEBUG returning 315616
2022/08/07 20:14:33.934713 fuse.DEBUG <-- FlushFile 52 archive/2021/20210407/rdx-trading/tse.positions [<nil> 6 52]
2022/08/07 20:14:33.934749 fuse.DEBUG Op 0x00000053        connection.go:491] -> OK ()
2022/08/07 20:14:33.934904 fuse.DEBUG Op 0x00000054        connection.go:408] <- ReleaseFileHandle
2022/08/07 20:14:33.934987 fuse.DEBUG ReleaseFileHandle archive/2021/20210407/rdx-trading/tse.positions 6 52
2022/08/07 20:14:33.935009 fuse.DEBUG Op 0x00000054        connection.go:491] -> OK ()

Output of stat of the file in ZFS:

$ stat tse.positions
  File: ‘tse.positions’
  Size: 315616          Blocks: 53         IO Block: 131072 regular file
Device: 2ah/42d Inode: 213930      Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  500/  jtwang)   Gid: (  500/  jtwang)
Access: 2017-03-08 19:42:13.506164300 +0000
Modify: 2022-08-06 20:36:14.293264487 +0000
Change: 2022-08-06 20:36:14.293264487 +0000
 Birth: -

Output of stat of the file in ext2:

[jtwang@prod-aws ops]$ stat /tmp/tse.positions 
  File: ‘/tmp/tse.positions’
  Size: 315616          Blocks: 624        IO Block: 4096   regular file
Device: ca01h/51713d    Inode: 16854880    Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  500/  jtwang)   Gid: (  500/  jtwang)
Access: 2022-08-07 20:14:33.838827389 +0000
Modify: 2022-08-07 20:14:20.861190224 +0000
Change: 2022-08-07 20:14:20.861190224 +0000
 Birth: -

tse.positions.gz

jtwang83 avatar Aug 07 '22 20:08 jtwang83

how are you doing the copy?

kahing avatar Nov 03 '22 23:11 kahing

When I originally encountered the problem, I think I was using rsync -a. But when debugging with the single file example, I believe I had the same issue with cp.

jtwang83 avatar Nov 03 '22 23:11 jtwang83