[BUG] Fluid native mount points won't be synced in DataLoad
What is your environment (Kubernetes version, Fluid version, etc.)
Describe the bug
Users can set the property loadMetadata: true to ask DataLoad to load metadata before it loads the actual data. For Fluid-native mount points such as pvc:// and local://, a POSIX-compliant metadata sync (e.g. running du -sh) is additionally needed before the DataLoad loads their metadata in the ddc engine.
However, the current implementation of DataLoad does not sync the metadata of these Fluid-native mount points, because the Pod of the DataLoad job mounts the wrong folder. You can see the chart here. The .path used in the chart is actually the absolute path in the ddc engine, so in most cases the real local folder that needs to be synced is not mounted.
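To illustrate what a POSIX-level metadata sync amounts to, here is a minimal Python sketch that walks a folder and stats every entry, which is roughly the traversal `du -sh` performs. The function name `sync_metadata` is hypothetical and not part of Fluid; the sketch only shows the idea and is not how the DataLoad job does it.

```python
import os
import stat
from typing import Tuple


def sync_metadata(root: str) -> Tuple[int, int]:
    """Hypothetical helper: stat every entry under `root`.

    Returns (number of entries visited, total size in bytes of non-directory entries).
    """
    entries = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lstat touches the entry's metadata without following symlinks
            st = os.lstat(os.path.join(dirpath, name))
            entries += 1
            if not stat.S_ISDIR(st.st_mode):
                total_bytes += st.st_size
    return entries, total_bytes
```

The traversal only has an effect if `root` is the real local folder behind the mount point, which is why mounting the right folder into the DataLoad Pod matters.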
What you expect to happen: To ensure the ddc engine can see files and folders under some fluid-native mountpoints, a POSIX-compliant sync operation is needed.
How to reproduce it Here is a quick example:
# dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: test
spec:
  mounts:
    - mountPoint: local:///mnt/test1
      name: test1
    - mountPoint: local:///mnt/test2
      name: test2
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: test
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: SSD
        path: /var/lib/docker/alluxio
        quota: 2Gi
        high: "0.95"
        low: "0.7"
# dataload.yaml
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: ya-dataload
spec:
  dataset:
    name: test
    namespace: default
  loadMetadata: true
  target:
    - path: /test1
      replicas: 1
    - path: /test2
      replicas: 1
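For the example above, the mismatch between the DataLoad target path and the folder that actually needs syncing can be sketched as follows. This is only an illustration: `resolve_native_path` is a hypothetical helper, not a Fluid API, and it assumes each mount is exposed in the ddc engine at `/<mount name>`, as the target paths `/test1` and `/test2` suggest.

```python
import posixpath
from typing import Dict, Optional
from urllib.parse import urlparse

# Mounts taken from the dataset.yaml in this report: mount name -> mountPoint
MOUNTS = {
    "test1": "local:///mnt/test1",
    "test2": "local:///mnt/test2",
}


def resolve_native_path(target_path: str, mounts: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: map an engine path to the local folder behind it.

    Assumes the engine exposes each mount at /<mount name>. Returns None when
    the path does not belong to a local:// mount.
    """
    parts = target_path.strip("/").split("/", 1)
    mount_point = mounts.get(parts[0])
    if mount_point is None:
        return None
    parsed = urlparse(mount_point)
    if parsed.scheme != "local":
        return None  # only local:// is handled in this sketch
    if len(parts) == 1:
        return parsed.path
    return posixpath.join(parsed.path, parts[1])


print(resolve_native_path("/test1", MOUNTS))  # /mnt/test1
print(resolve_native_path("/test2", MOUNTS))  # /mnt/test2
```

Under that assumption, the folders that would need to be mounted and synced are `/mnt/test1` and `/mnt/test2`, not the engine paths `/test1` and `/test2` that the chart's `.path` carries.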
Additional Information
Any progress?