[BUG] dataset stuck in calculating state
What is your environment (Kubernetes version, Fluid version, etc.)
Kubernetes: v1.19.8
Fluid: v0.6.0-48de610
Describe the bug
When the underlying file system is a distributed file system with poor performance and the directory contains a large number of small files, metadata syncing gets stuck in the SyncLocalDir function because it calls du -sh, so the dataset status stays in Calculating and is never updated.
Detailed information is in the Debug Info section below.
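For context, the hang matches the following pattern (a simplified Go sketch for illustration, not Fluid's actual code): the sync shells out to du -sh and blocks until du has stat'ed every file under the path, which on a slow distributed FS full of small files can take days.

    package main

    import (
        "fmt"
        "os/exec"
        "strings"
    )

    // syncLocalDir mimics the observed behavior: it runs `du -sh` on the
    // mounted under storage and blocks until the command finishes.
    func syncLocalDir(path string) (string, error) {
        out, err := exec.Command("du", "-sh", path).Output()
        if err != nil {
            return "", err
        }
        // `du -sh` prints "<size>\t<path>"; keep only the size field.
        fields := strings.Fields(string(out))
        if len(fields) == 0 {
            return "", fmt.Errorf("unexpected du output: %q", out)
        }
        return fields[0], nil
    }

    func main() {
        // While this call is stuck, UfsTotal/FileNum stay at [Calculating].
        size, err := syncLocalDir("/underFSStorage/autodata")
        if err != nil {
            fmt.Println("sync failed:", err)
            return
        }
        fmt.Println("ufs total:", size)
    }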
What you expect to happen:
Is the du -sh call necessary, and is there a better implementation?
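One possible direction, sketched below under the assumption that a delayed result is acceptable (this is not an existing Fluid API): bound the du call with a context timeout so the controller can give up, keep the rest of the dataset status updatable, and retry on the next sync.

    package main

    import (
        "context"
        "fmt"
        "os/exec"
        "strings"
        "time"
    )

    // duWithTimeout runs `du -sh` but kills it once the deadline expires, so a
    // slow under storage cannot block the caller indefinitely.
    func duWithTimeout(path string, timeout time.Duration) (string, error) {
        ctx, cancel := context.WithTimeout(context.Background(), timeout)
        defer cancel()

        out, err := exec.CommandContext(ctx, "du", "-sh", path).Output()
        if ctx.Err() == context.DeadlineExceeded {
            return "", fmt.Errorf("du on %s exceeded %s, retry on the next sync", path, timeout)
        }
        if err != nil {
            return "", err
        }
        fields := strings.Fields(string(out))
        if len(fields) == 0 {
            return "", fmt.Errorf("unexpected du output: %q", out)
        }
        return fields[0], nil
    }

    func main() {
        if size, err := duWithTimeout("/underFSStorage/autodata", 30*time.Second); err != nil {
            fmt.Println(err)
        } else {
            fmt.Println("ufs total:", size)
        }
    }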
How to reproduce it
Use a distributed file system as the mount point and store a large number of small files in it (see the generator sketch below).
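For example, a throwaway generator like the one below can fill the mount point with enough small files to make du -sh crawl; the path and counts are placeholders.

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    func main() {
        root := "/mnt/xxx/smallfiles" // placeholder: a directory on the distributed FS
        const numDirs, filesPerDir = 1000, 10000

        for d := 0; d < numDirs; d++ {
            dir := filepath.Join(root, fmt.Sprintf("dir-%04d", d))
            if err := os.MkdirAll(dir, 0755); err != nil {
                panic(err)
            }
            for f := 0; f < filesPerDir; f++ {
                // Each file holds a single byte; the cost comes from the sheer
                // number of inodes du has to stat, not from the data size.
                name := filepath.Join(dir, fmt.Sprintf("file-%05d", f))
                if err := os.WriteFile(name, []byte("x"), 0644); err != nil {
                    panic(err)
                }
            }
        }
    }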
Additional Information
Debug Info
root@xxx:~# kubectl -n xxx get dataset
NAME       UFS TOTAL SIZE    CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
autodata   [Calculating]     0.00B    7.00TiB                              Bound   2d4h
root@xxx:~# kubectl -n xxx describe dataset autodata
Name: autodata
...
Spec:
Mounts:
Mount Point: local:///mnt/xxx
Name: autodata
...
Status:
Cache States:
Cache Capacity: 7.00TiB
Cache Hit Ratio: 0.0%
Cache Throughput Ratio: 0.0%
Cached: 0.00B
Cached Percentage:
Local Hit Ratio: 0.0%
Local Throughput Ratio: 0.0%
Remote Hit Ratio: 0.0%
Remote Throughput Ratio: 0.0%
Conditions:
Last Transition Time: 2022-02-22T08:06:29Z
Last Update Time: 2022-02-24T12:33:13Z
Message: The ddc runtime is ready.
Reason: DatasetReady
Status: True
Type: Ready
File Num: [Calculating]
Hcfs:
Endpoint: alluxio://autodata-master-0.xxx:20045
Underlayer File System Version: 3.3.0
Mounts:
Mount Point: local:///mnt/xxx
Name: autodata
Phase: Bound
Runtimes:
Category: Accelerate
Name: autodata
Namespace: xxx
Type: alluxio
Ufs Total: [Calculating]
Events:
root@xxx:~# ps -aux | grep autodata
root       40042  0.0  0.0  40632 37592 ?      Ds   Feb22   0:14 du -sh /underFSStorage/autodata
root       44158  0.0  0.0  40632 37592 ?      Ds   Feb22   0:26 du -sh /underFSStorage/autodata
root       99580  0.0  0.0   8700  5076 ?      DN   Feb22   1:24 du -x -s -B 1 /var/lib/kubelet/pods/8b69ff2c-b1af-4727-988e-b5d00632c2b9/volumes/kubernetes.io~empty-dir/sharefs
root       99581  0.0  0.0   8624  5188 ?      DN   Feb22   1:24 du -x -s -B 1 /var/lib/kubelet/pods/5e29c6bd-6164-40b2-a6e1-9101a76437c6/volumes/kubernetes.io~empty-dir/sharefs
root      145217  0.0  0.0   4464   620 ?      Ds   Feb22   0:00 du -sh /underFSStorage/autodata
root      215005  0.0  0.0   4464   672 ?      Ds   17:51   0:00 du -sh /underFSStorage/autodata
root      550507  0.0  0.0   4464   700 ?      Ds   19:10   0:00 du -sh /underFSStorage/autodata
root      833101  0.0  0.0   4464   704 ?      D+   20:19   0:00 du -sh /underFSStorage/autodata
root      843017  0.0  0.0   6432  2616 pts/0  S+   20:22   0:00 grep --color=auto du
root     2393559  0.0  0.0   8676  5316 ?      SN   Feb23   1:10 du -x -s -B 1 /var/lib/kubelet/pods/1617dc17-07fa-4cfb-8a04-446cdf46ed05/volumes/kubernetes.io~empty-dir/sharefs
root     2508950  0.1  0.0  10380  6836 ?      SN   Feb23   1:37 du -x -s -B 1 /var/lib/kubelet/pods/6000db76-b30c-4e6a-b0be-9b319991e070/volumes/kubernetes.io~empty-dir/sharefs
root@xxx:~# k -n fluid-system logs -f alluxioruntime-controller-84bf5bb796-sf272 | grep -C 3 "Syncing local dir"
2022-02-24T19:10:17.802+0800 INFO kubeclient kubeclient/volume.go:97 The persistentVolume exist {"name": "dataset-videol-frame", "annotaitons": {"CreatedBy":"fluid"}}
2022-02-24T19:10:17.802+0800 INFO kubeclient kubeclient/volume.go:97 The persistentVolume exist {"name": "autodata", "annotaitons": {"CreatedBy":"fluid"}}
2022-02-24T19:10:17.843+0800 INFO alluxioctl.AlluxioRuntime alluxio/metadata.go:255 Metadata Sync starts {"alluxioruntime": "xxx/autodata", "dataset namespace": "xxx", "dataset name": "autodata"}
2022-02-24T19:10:17.843+0800 INFO alluxioctl.AlluxioRuntime alluxio/metadata.go:264 Syncing local dir, path: /underFSStorage/autodata {"alluxioruntime": "xxx/autodata"}
2022-02-24T19:10:17.843+0800 INFO kubeclient kubeclient/exec.go:81 kubeconfig file is placed. {"config": ""}
W0224 19:10:17.843205 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2022-02-24T19:10:19.522+0800 INFO alluxioctl.AlluxioRuntime operations/base.go:468 execute in time {"alluxioruntime": "xxx/autodata", "command": ["alluxio", "fsadmin", "report", "summary"]}
In some engines like Alluxio and GooseFS, operations such as loadMetadata and fs count load all of the metadata under a mount point. If the number of paths is very large, this puts the underlying storage system under great pressure.
I think we need a way to enable/disable automatic metadata sync; see the sketch below for one possible shape.
@cheyang @TrafalgarZZZ
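For illustration only, such a switch could be an opt-out annotation checked before the expensive sync; the annotation key and helper below are hypothetical, not an existing Fluid feature.

    package main

    import "fmt"

    // Hypothetical annotation key; not part of Fluid today.
    const skipMetadataSyncAnnotation = "data.fluid.io/skip-metadata-sync"

    // shouldSyncMetadata decides whether to run the expensive du/loadMetadata
    // path based on the Dataset's annotations.
    func shouldSyncMetadata(annotations map[string]string) bool {
        return annotations[skipMetadataSyncAnnotation] != "true"
    }

    func main() {
        datasetAnnotations := map[string]string{
            skipMetadataSyncAnnotation: "true", // user opts out of automatic sync
        }
        if !shouldSyncMetadata(datasetAnnotations) {
            fmt.Println("automatic metadata sync disabled; leaving UfsTotal/FileNum unset")
            return
        }
        fmt.Println("running metadata sync as usual")
    }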
Is there a solution to this problem? I am now experiencing a similar situation.