
[BUG] dataset stuck in calculating state

Open miaojianwei opened this issue 3 years ago • 2 comments

What is your environment (Kubernetes version, Fluid version, etc.)
kubernetes: v1.19.8
fluid: v0.6.0-48de610

Describe the bug
When the underlying file system is a distributed file system with poor performance and the directory contains a large number of small files, metadata syncing gets stuck in the SyncLocalDir function because it calls du -h, so the dataset status stays at Calculating and is never updated.
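For context, the pattern described above is a blocking call to du that the controller waits on until it returns. A minimal sketch of that pattern (illustrative only, not Fluid's actual SyncLocalDir code; the function name and output handling are assumptions):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// syncLocalDirSketch illustrates the blocking pattern described above:
// run `du -sh <path>` and wait for it to finish. On a slow distributed
// file system with huge numbers of small files this call can hang for
// hours, leaving the dataset status at "Calculating".
func syncLocalDirSketch(path string) (string, error) {
	out, err := exec.Command("du", "-sh", path).Output()
	if err != nil {
		return "", err
	}
	// `du -sh` prints "<size>\t<path>"; keep only the size column.
	fields := strings.Fields(string(out))
	if len(fields) == 0 {
		return "", fmt.Errorf("unexpected du output: %q", out)
	}
	return fields[0], nil
}

func main() {
	size, err := syncLocalDirSketch("/underFSStorage/autodata")
	if err != nil {
		fmt.Println("du failed:", err)
		return
	}
	fmt.Println("UFS total size:", size)
}
```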

For detailed information, see the Debug Info section below.

What you expect to happen: Is du -h really necessary here, and is there a better implementation?
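One possible direction, sketched here under the assumption that the size calculation can be made best-effort, is to bound the du call with a timeout so a slow underlying file system cannot block the status update indefinitely (the helper name and timeout value are made up for illustration, not Fluid's API):

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// duWithTimeout runs `du -sh` but gives up after the supplied timeout,
// so a slow underlying file system cannot block the caller forever.
func duWithTimeout(path string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	out, err := exec.CommandContext(ctx, "du", "-sh", path).Output()
	if ctx.Err() == context.DeadlineExceeded {
		return "", fmt.Errorf("du timed out after %s", timeout)
	}
	if err != nil {
		return "", err
	}
	fields := strings.Fields(string(out))
	if len(fields) == 0 {
		return "", fmt.Errorf("unexpected du output: %q", out)
	}
	return fields[0], nil
}

func main() {
	size, err := duWithTimeout("/underFSStorage/autodata", 5*time.Minute)
	if err != nil {
		// Report a placeholder instead of leaving the status stuck.
		fmt.Println("size unknown:", err)
		return
	}
	fmt.Println("UFS total size:", size)
}
```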

How to reproduce it: Use a distributed file system and store a large number of small files in it.
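A quick way to generate such a layout for testing (a sketch only; the target path and file count are arbitrary and should point at a mount of the distributed file system under test):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Populate a directory with many tiny files to mimic the workload that
// makes `du -sh` crawl. Adjust root and numFiles for the environment.
func main() {
	root := "/mnt/xxx/smallfiles"
	const numFiles = 1_000_000

	if err := os.MkdirAll(root, 0o755); err != nil {
		panic(err)
	}
	for i := 0; i < numFiles; i++ {
		name := filepath.Join(root, fmt.Sprintf("f-%07d.txt", i))
		if err := os.WriteFile(name, []byte("x"), 0o644); err != nil {
			panic(err)
		}
	}
	fmt.Println("created", numFiles, "small files under", root)
}
```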

Additional Information

Debug Info

root@xxx:~# kubectl -n xxx get dataset
NAME       UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
autodata   [Calculating]    0.00B    7.00TiB                              Bound   2d4h

root@xxx:~# kubectl -n xxx describe dataset autodata
Name:         autodata
...
Spec:
  Mounts:
    Mount Point:  local:///mnt/xxx
    Name:         autodata
  ...
Status:
  Cache States:
    Cache Capacity:           7.00TiB
    Cache Hit Ratio:          0.0%
    Cache Throughput Ratio:   0.0%
    Cached:                   0.00B
    Cached Percentage:
    Local Hit Ratio:          0.0%
    Local Throughput Ratio:   0.0%
    Remote Hit Ratio:         0.0%
    Remote Throughput Ratio:  0.0%
  Conditions:
    Last Transition Time:  2022-02-22T08:06:29Z
    Last Update Time:      2022-02-24T12:33:13Z
    Message:               The ddc runtime is ready.
    Reason:                DatasetReady
    Status:                True
    Type:                  Ready
  File Num:  [Calculating]
  Hcfs:
    Endpoint:                        alluxio://autodata-master-0.xxx:20045
    Underlayer File System Version:  3.3.0
  Mounts:
    Mount Point:  local:///mnt/xxx
    Name:         autodata
  Phase:  Bound
  Runtimes:
    Category:   Accelerate
    Name:       autodata
    Namespace:  xxx
    Type:       alluxio
  Ufs Total:  [Calculating]
Events:

root@xxx:~# ps -aux | grep autodata
root       40042  0.0  0.0  40632 37592 ?      Ds   Feb22   0:14 du -sh /underFSStorage/autodata
root       44158  0.0  0.0  40632 37592 ?      Ds   Feb22   0:26 du -sh /underFSStorage/autodata
root       99580  0.0  0.0   8700  5076 ?      DN   Feb22   1:24 du -x -s -B 1 /var/lib/kubelet/pods/8b69ff2c-b1af-4727-988e-b5d00632c2b9/volumes/kubernetes.io~empty-dir/sharefs
root       99581  0.0  0.0   8624  5188 ?      DN   Feb22   1:24 du -x -s -B 1 /var/lib/kubelet/pods/5e29c6bd-6164-40b2-a6e1-9101a76437c6/volumes/kubernetes.io~empty-dir/sharefs
root      145217  0.0  0.0   4464   620 ?      Ds   Feb22   0:00 du -sh /underFSStorage/autodata
root      215005  0.0  0.0   4464   672 ?      Ds   17:51   0:00 du -sh /underFSStorage/autodata
root      550507  0.0  0.0   4464   700 ?      Ds   19:10   0:00 du -sh /underFSStorage/autodata
root      833101  0.0  0.0   4464   704 ?      D+   20:19   0:00 du -sh /underFSStorage/autodata
root      843017  0.0  0.0   6432  2616 pts/0  S+   20:22   0:00 grep --color=auto du
root     2393559  0.0  0.0   8676  5316 ?      SN   Feb23   1:10 du -x -s -B 1 /var/lib/kubelet/pods/1617dc17-07fa-4cfb-8a04-446cdf46ed05/volumes/kubernetes.io~empty-dir/sharefs
root     2508950  0.1  0.0  10380  6836 ?      SN   Feb23   1:37 du -x -s -B 1 /var/lib/kubelet/pods/6000db76-b30c-4e6a-b0be-9b319991e070/volumes/kubernetes.io~empty-dir/sharefs

root@xxx:~# k -n fluid-system logs -f alluxioruntime-controller-84bf5bb796-sf272 | grep -C 3 "Syncing local dir"
2022-02-24T19:10:17.802+0800  INFO  kubeclient  kubeclient/volume.go:97  The persistentVolume exist  {"name": "dataset-videol-frame", "annotaitons": {"CreatedBy":"fluid"}}
2022-02-24T19:10:17.802+0800  INFO  kubeclient  kubeclient/volume.go:97  The persistentVolume exist  {"name": "autodata", "annotaitons": {"CreatedBy":"fluid"}}
2022-02-24T19:10:17.843+0800  INFO  alluxioctl.AlluxioRuntime  alluxio/metadata.go:255  Metadata Sync starts  {"alluxioruntime": "xxx/autodata", "dataset namespace": "xxx", "dataset name": "autodata"}
2022-02-24T19:10:17.843+0800  INFO  alluxioctl.AlluxioRuntime  alluxio/metadata.go:264  Syncing local dir, path: /underFSStorage/autodata  {"alluxioruntime": "xxx/autodata"}
2022-02-24T19:10:17.843+0800  INFO  kubeclient  kubeclient/exec.go:81  kubeconfig file is placed.  {"config": ""}
W0224 19:10:17.843205  1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2022-02-24T19:10:19.522+0800  INFO  alluxioctl.AlluxioRuntime  operations/base.go:468  execute in time  {"alluxioruntime": "xxx/autodata", "command": ["alluxio", "fsadmin", "report", "summary"]}

miaojianwei — Feb 24 '22

In some engines, like Alluxio and GooseFS, operations such as loadMetadata and fs count load all of the metadata under a mount point; if the number of paths is very large, the underlying storage system comes under great pressure. I think we need a way to enable/disable automatic metadata sync.
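To illustrate the enable/disable idea (purely hypothetical; neither the field nor the helper exists in Fluid), such a switch could be a spec field that the controller consults before starting the sync:

```go
// Hypothetical sketch of an opt-out switch for automatic metadata sync.
// The field and helper below only illustrate the proposal above.
package main

import "fmt"

// RuntimeSpecSketch stands in for the runtime spec; AutoMetadataSync is
// the hypothetical knob (nil means "keep the current default behaviour").
type RuntimeSpecSketch struct {
	AutoMetadataSync *bool
}

// shouldSyncMetadata decides whether the controller should run the
// expensive metadata/size calculation for this runtime.
func shouldSyncMetadata(spec RuntimeSpecSketch) bool {
	if spec.AutoMetadataSync == nil {
		return true // default: sync automatically, as today
	}
	return *spec.AutoMetadataSync
}

func main() {
	disabled := false
	fmt.Println(shouldSyncMetadata(RuntimeSpecSketch{}))                            // true
	fmt.Println(shouldSyncMetadata(RuntimeSpecSketch{AutoMetadataSync: &disabled})) // false
}
```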

@cheyang @TrafalgarZZZ

xieydd — Mar 01 '22

Is there a solution to this problem? I am now experiencing a similar situation.

liutingqin — Apr 06 '22