
File in alluxio cache is inconsistent with ufs file under large file number

Open flyhighzy opened this issue 2 years ago • 1 comments

Alluxio Version: 2.7

Describe the bug We found that some files in the Alluxio cache are inconsistent with the remote S3-compatible file system. Only some files are affected; most work fine. File count: about 61 million.

This is the md5 of the file in the Alluxio cache: WeChatWorkScreenshot_d2881171-e2e0-442b-ae4d-ddcdb2ecb417

This is the file downloaded from the origin S3 storage: 企业微信截图_47a4f6f4-c0eb-4531-85ca-9c42aad32470

The two files clearly have different md5 checksums, but the checkConsistency command returns true: image

The file in Alluxio and the remote file also have the same size, which is strange.
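For anyone trying to confirm the same symptom, here is a minimal sketch of the comparison described above: same size, different md5. It assumes both copies are readable as local paths (for example, one read through an Alluxio FUSE mount and one downloaded directly from S3). The function names and the paths in the usage note are hypothetical and not part of Alluxio.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files are not loaded whole


def file_md5(path):
    """Return the hex md5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_diff_offset(path_a, path_b):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(CHUNK_SIZE)
            b = fb.read(CHUNK_SIZE)
            if a != b:
                # Locate the exact position inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # One chunk is a prefix of the other, so the sizes differ here.
                return offset + min(len(a), len(b))
            if not a:  # both reads hit EOF without finding a difference
                return None
            offset += len(a)


def compare_files(cached_path, ufs_copy_path):
    """Summarize size, md5, and first differing offset for two local files.

    Both arguments are hypothetical local paths (assumption): the cached copy
    and a copy fetched directly from the under file system.
    """
    return {
        "same_size": os.path.getsize(cached_path) == os.path.getsize(ufs_copy_path),
        "cached_md5": file_md5(cached_path),
        "ufs_md5": file_md5(ufs_copy_path),
        "first_diff_offset": first_diff_offset(cached_path, ufs_copy_path),
    }
```

Calling `compare_files("/mnt/alluxio-fuse/some/file", "/tmp/s3-download/some/file")` (placeholder paths) on an affected file would be expected to show `same_size` as true with two different digests. The first differing offset may help tell whether the corruption starts at a block or page boundary.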

To Reproduce Not sure. With a large number of files, some files can end up with different content.

Expected behavior All files in Alluxio should be identical to the corresponding files in the UFS.

Urgency This may cause our business users' training jobs to fail and introduce invalid training data.

Are you planning to fix it We hope the community can give a hand.

flyhighzy avatar Aug 05 '22 13:08 flyhighzy

2.8.1 + fuse 2 + direct_io

LuQQiu avatar Aug 09 '22 02:08 LuQQiu