k8s-csi-s3

fuse.ERROR *fuseops.ReadFileOp error: cannot allocate memory

0xHigos opened this issue on Dec 20, 2023 · 1 comment

Hi, I use this project to mount AIGC models; the model size is about 10Gi. But the csi-s3 pod (csi-pod-**) logs the errors below:

main.ERROR Unable to allocate 20971520 bytes, used 30809278 bytes, limit is 53141504 bytes
2023/12/20 16:42:47.465186 main.ERROR Error reading 161021952 +20971520 of pvc-02de6383-c3d4-46e8-89b4-1add110149c1/model-clvtifjqs63c0ohc5jo0/release-clvtig3qs63c0ohc5jog/atom/1/local_model/base_model/pytorch_model-00001-of-00002.bin: cannot allocate memory
2023/12/20 16:42:48.163435 main.ERROR Unable to allocate 20971520 bytes, used 26221758 bytes, limit is 53141504 bytes
2023/12/20 16:42:48.163533 main.ERROR Error reading 181993472 +20971520 of pvc-02de6383-c3d4-46e8-89b4-1add110149c1/model-clvtifjqs63c0ohc5jo0/release-clvtig3qs63c0ohc5jog/atom/1/local_model/base_model/pytorch_model-00001-of-00002.bin: cannot allocate memory
2023/12/20 16:42:48.864207 main.ERROR Unable to allocate 28573696 bytes, used 18619582 bytes, limit is 53141504 bytes
2023/12/20 16:42:48.864320 main.ERROR Error reading 202964992 +28573696 of pvc-02de6383-c3d4-46e8-89b4-1add110149c1/model-clvtifjqs63c0ohc5jo0/release-clvtig3qs63c0ohc5jog/atom/1/local_model/base_model/pytorch_model-00001-of-00002.bin: cannot allocate memory
2023/12/20 16:42:49.363836 main.ERROR Unable to allocate 20971520 bytes, used 26221758 bytes, limit is 53141504 bytes
2023/12/20 16:42:49.363937 main.ERROR Error reading 161021952 +20971520 of pvc-02de6383-c3d4-46e8-89b4-1add110149c1/model-clvtifjqs63c0ohc5jo0/release-clvtig3qs63c0ohc5jog/atom/1/local_model/base_model/pytorch_model-00001-of-00002.bin: cannot allocate memory

I installed k8s-csi-s3 with Helm; the k8s-csi-s3 version is 0.35.4. This is my values.yaml:

---
images:
  # Source: quay.io/k8scsi/csi-attacher:v3.0.1
  attacher: cr.yandex/crp9ftr22d26age3hulg/yandex-cloud/csi-s3/csi-attacher:v3.0.1
  # Source: quay.io/k8scsi/csi-node-driver-registrar:v1.2.0
  registrar: cr.yandex/crp9ftr22d26age3hulg/yandex-cloud/csi-s3/csi-node-driver-registrar:v1.2.0
  # Source: quay.io/k8scsi/csi-provisioner:v2.1.0
  provisioner: cr.yandex/crp9ftr22d26age3hulg/yandex-cloud/csi-s3/csi-provisioner:v2.1.0
  # Main image
  csi: cr.yandex/crp9ftr22d26age3hulg/yandex-cloud/csi-s3/csi-s3-driver:0.35.4

storageClass:
  # Specifies whether the storage class should be created
  create: true
  # Name
  name: csi-s3
  # Use a single bucket for all dynamically provisioned persistent volumes
  singleBucket: "model-warehouse-test"
  # GeeseFS mount options
  mounter: s3fs
  # Volume reclaim policy
  reclaimPolicy: Retain
  # Annotations for the storage class
  # Example:
  # annotations:
  #   storageclass.kubernetes.io/is-default-class: "true"
  annotations: {}

secret:
  # Specifies whether the secret should be created
  create: true
  # Name of the secret
  name: csi-s3-secret
  # S3 Access Key
  accessKey: "root"
  # S3 Secret Key
  secretKey: "ttest123"
  # Endpoint
  endpoint: http://test123.ttt.svc:39876

tolerations:
  all: false
  node: []
  controller: []

My PVC yaml is:

# Dynamically provisioned PVC:
# A bucket or path inside bucket will be created automatically
# for the PV and removed when the PV is removed
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-s3-pvc
  namespace: ttt
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 150Gi
  storageClassName: csi-s3

Neither s3fs nor geesefs worked when I tested them as the mounter. Could somebody who has run into a similar issue guide me on how to solve it? Many thanks :)

0xHigos · Dec 20 '23 17:12

Hi, check out these instructions: https://github.com/yandex-cloud/geesefs/#memory-limit
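
For reference, the linked instruction is about GeeseFS's in-memory buffer limit, which is what the "Unable to allocate ... limit is 53141504 bytes" errors are hitting. A minimal sketch of how a larger limit could be set through the chart's values.yaml, assuming the storageClass section accepts a mountOptions string that is passed through to GeeseFS and that --memory-limit (in MB) is the flag described in the linked instruction (the field name, flag value, and mounter choice here are illustrative, not taken from this issue):

storageClass:
  create: true
  name: csi-s3
  singleBucket: "model-warehouse-test"
  # Use GeeseFS so the --memory-limit option applies
  mounter: geesefs
  # GeeseFS mount options (illustrative): raise the buffer limit so large
  # sequential reads of multi-GB model files have room for read-ahead
  mountOptions: "--memory-limit 1000"

The exact field names and defaults depend on the chart version, so this should be checked against the chart's own values.yaml before applying.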

vitalif · Dec 27 '23 09:12