k8s-csi-s3

Caching support

Open · nuwang opened this issue 2 years ago · 1 comment

Hi,

I'm trying to enable caching in the CSI driver. I've passed extra mountOptions as follows:

mountOptions: "--memory-limit 4000 --dir-mode 0777 --file-mode 0666 --cache /tmp --debug --debug_fuse --stat-cache-ttl 9m0s --cache-to-disk-hits 1"

and they are being passed in correctly according to the logs:

I0622 15:05:55.915639 1 mounter.go:65] Mounting fuse with command: geesefs and args: [--endpoint https://s3.ap-southeast-2.amazonaws.com -o allow_other --log-file /dev/stderr --memory-limit 4000 --dir-mode 0777 --file-mode 0666 --cache /tmp --debug --debug_fuse --stat-cache-ttl 9m0s --cache-to-disk-hits 1 biorefdata:galaxy/v1/data.galaxyproject.org /var/lib/kubelet/pods/9d508976-732c-4a3f-8bf6-89bd097e831b/volumes/kubernetes.io~csi/pvc-6a8c3758-8784-4fcc-9311-4305b3cce8e4/mount]
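
For completeness, the options come from the storage class values in our deployment, roughly like this (a sketch: only mountOptions is taken verbatim from above, and the surrounding key names are just how our chart is laid out, so they may differ between chart versions):

# Rough sketch of the relevant chart values.
# Only mountOptions is copied verbatim from above; the other keys are
# assumptions about the chart layout in our deployment.
storageClass:
  mountOptions: >-
    --memory-limit 4000 --dir-mode 0777 --file-mode 0666
    --cache /tmp --debug --debug_fuse
    --stat-cache-ttl 9m0s --cache-to-disk-hits 1
secret:
  endpoint: https://s3.ap-southeast-2.amazonaws.com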

However, the /tmp directory remains empty. Am I doing something wrong?

Also, with multiple pods mounting the same PVC, would the cache work correctly? I can see that there are multiple geesefs processes running, all pointing to the same cache path.

Finally, we want to use this with long-lived, entirely read-only data (reference genomes and associated read-only datasets). This is why I set --cache-to-disk-hits to 1, assuming that would cause a file to be cached on disk on the very first read. Could you please recommend the best settings for very aggressive caching? I've noticed a lot of S3 calls being made for the same path even though that path has already been checked recently.

nuwang · Jun 22 '22 15:06