
CephFS CSI error: MDS fails with "Start request repeated too quickly. Failed with result 'signal'."

gaocheng001 opened this issue 1 year ago • 1 comment

Describe the bug

After provisioning a CephFS PVC with ceph-csi and mounting it in a pod, the Ceph MDS daemon crashes with SIGABRT while handling the client's mkdir request (see the backtrace below). systemd restarts the daemon until the restart limit is reached, after which the unit fails with "Start request repeated too quickly. Failed with result 'signal'." and will not start again.

Environment details

  • Image/version of Ceph CSI driver : v3.10.0 / latest
  • Helm chart version :
  • Kernel version :
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's krbd or rbd-nbd) :
  • Kubernetes cluster version : 1.27
  • Ceph cluster version : 18.2.2

MDS logs

Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:      0> 2024-07-02T13:23:53.435+0800 79061b6006c0 -1 *** Caught signal (Aborted) **
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  in thread 79061b6006c0 thread_name:mds_rank_progr
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x79062805b050]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7906280a9e2c]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  3: gsignal()
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  4: abort()
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0x9d919) [0x790627e9d919]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e1a) [0x790627ea8e1a]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e85) [0x790627ea8e85]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa90d8) [0x790627ea90d8]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  9: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0xb4) [0x7906288483d4]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  10: (void decode_noshare<mempool::mds_co::pool_allocator>(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::mds_co::pool_allocator<char> >, ceph::buffer::v15_2_0::ptr, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::mds_co::pool_allocator<char> > >, mempool::mds_co::pool_allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, mempool::mds_co::pool_allocator<char> > const, ceph::buffer::v15_2_0::ptr> > >&, ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xd9) [0x56ddc48d64a9]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  11: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t const*)+0x12b3) [0x56ddc48695e3]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  12: (Server::handle_client_mkdir(boost::intrusive_ptr<MDRequestImpl>&)+0x1dc) [0x56ddc48a0dbc]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  13: (Server::handle_client_request(boost::intrusive_ptr<MClientRequest const> const&)+0x61f) [0x56ddc48abebf]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  14: (Server::dispatch(boost::intrusive_ptr<Message const> const&)+0x2d3) [0x56ddc48b0a13]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  15: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x59b) [0x56ddc48051eb]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  16: (MDSRank::retry_dispatch(boost::intrusive_ptr<Message const> const&)+0x12) [0x56ddc4805992]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  17: (MDSContext::complete(int)+0x5b) [0x56ddc4b2c45b]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  18: (MDSRank::_advance_queues()+0x78) [0x56ddc4804328]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  19: (MDSRank::ProgressThread::entry()+0xbe) [0x56ddc480489e]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  20: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7906280a8134]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  21: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7906281287dc]
Jul 02 13:23:53 pve24052603 ceph-mds[3604713]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jul 02 13:23:53 pve24052603 systemd[1]: [email protected]: Main process exited, code=killed, status=6/ABRT
Jul 02 13:23:53 pve24052603 systemd[1]: [email protected]: Failed with result 'signal'.
Jul 02 13:23:53 pve24052603 systemd[1]: [email protected]: Scheduled restart job, restart counter is at 3.
Jul 02 13:23:53 pve24052603 systemd[1]: Stopped [email protected] - Ceph metadata server daemon.
Jul 02 13:23:53 pve24052603 systemd[1]: [email protected]: Start request repeated too quickly.
Jul 02 13:23:53 pve24052603 systemd[1]: [email protected]: Failed with result 'signal'.
Jul 02 13:23:53 pve24052603 systemd[1]: Failed to start [email protected] - Ceph metadata server daemon.
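Once the restart rate limit is hit, systemd stops retrying on its own. A minimal recovery sketch, assuming the unit name from the log above (the daemon will simply crash again if the underlying bug persists):

# clear the failed state and the restart counter for the MDS unit
systemctl reset-failed [email protected]
# start the MDS again
systemctl start [email protected]
# follow the log to capture the next backtrace
journalctl -fu [email protected]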

systemctl status ceph-mds@pve24060802

Jul 02 13:24:24 pve24060802 systemd[1]: [email protected]: Scheduled restart job, restart counter is at 3.
Jul 02 13:24:24 pve24060802 systemd[1]: Stopped [email protected] - Ceph metadata server daemon.
Jul 02 13:24:24 pve24060802 systemd[1]: [email protected]: Start request repeated too quickly.
Jul 02 13:24:24 pve24060802 systemd[1]: [email protected]: Failed with result 'signal'.
Jul 02 13:24:24 pve24060802 systemd[1]: Failed to start [email protected] - Ceph metadata server daemon.

 unhandled message 0x55b3c1551b80 client_metrics [client_metric_type: CAP_INFO cap_hits: 22 cap_misses: 1 num_caps: 3][client_metric_type: READ_LATENCY latency: 0.000000, avg_latency: 0.000000, sq_sum: 0, count=0][client_metric_type: WRITE_LATENCY latency: 0.000000, avg_latency: 0.000000, sq_sum: 0, count=0][client_metric_type: METADATA_LATENCY latency: 0.042123, avg_latency: 0.010530, sq_sum: 1234407162999340, count=4][client_metric_type: DENTRY_LEASE dlease_hits: 0 dlease_misses: 0 num_dentries: 2][client_metric_type: OPENED_FILES opened_files: 0 total_inodes: 3][client_metric_type: PINNED_ICAPS pinned_icaps: 3 total_inodes: 3][client_metric_type: OPENED_INODES opened_inodes: 0 total_inodes: 3][client_metric_type: READ_IO_SIZES total_ops: 0 total_size: 0][client_metric_type: WRITE_IO_SIZES total_ops: 0 total_size: 0] v1 from client.128861978 v1:10.18.10.104:0/2284126948
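For completeness, the filesystem-wide MDS state after the crashes can be checked from any Ceph admin node with standard commands:

# cluster-wide health, including failed or degraded MDS ranks
ceph -s
# per-filesystem MDS ranks and available standby daemons
ceph fs status
ceph mds stat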

nginx.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-test-pvc
  namespace: elk
spec:
  storageClassName: csi-cephfs-sc
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

---
apiVersion: v1
kind: Pod
metadata:
  name: csi-rbd-demo-pod
  # the Pod must be in the same namespace as the PVC it references
  namespace: elk
spec:
  volumes:
    - name: nginxtestpvc
      persistentVolumeClaim:
        claimName: nginx-test-pvc
        readOnly: false
  containers:
    - name: web-server
      image: docker.io/library/nginx:latest
      volumeMounts:
#        - name: es-snap-pvc
#          mountPath: /var/lib/www/html/plugins
        - name: nginxtestpvc
          mountPath: /var/lib/www/html
#        - name: es-plugin-pvc
#          mountPath: /var/lib/www/html/config
#        - name: es-config-pvc
#          mountPath: /var/lib/www/html/data
#  volumes:
#    - name: es-snap-pvc
#      persistentVolumeClaim:
#        claimName: es-snap-pvc
#        readOnly: false
#
#    - name: es-plugin-pvc
#      persistentVolumeClaim:
#        claimName: es-snap-pvc
#        readOnly: false
#
#    - name: es-config-pvc
#      persistentVolumeClaim:
#        claimName: es-snap-pvc
#        readOnly: false
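Whether provisioning actually succeeded before the MDS went down can be checked from the PVC status and from the CSI subvolumes on the filesystem; a short sketch, assuming ceph-csi's default subvolume group csi:

# did the claim bind?
kubectl -n elk get pvc nginx-test-pvc
# list subvolumes created by ceph-csi on the filesystem named in sc.yaml
ceph fs subvolume ls cephfs-hdd-local --group_name csi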

sc.yaml

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cephfs-sc
provisioner: cephfs.csi.ceph.com
parameters:
  # (required) String representing a Ceph cluster to provision storage from.
  # Should be unique across all Ceph clusters in use for provisioning,
  # cannot be greater than 36 bytes in length, and should remain immutable for
  # the lifetime of the StorageClass in use.
  # Ensure to create an entry in the configmap named ceph-csi-config, based on
  # csi-config-map-sample.yaml, to accompany the string chosen to
  # represent the Ceph cluster in clusterID below
  clusterID: "6254728d-2db0-4fe1-bab6-e415d0c697f6"

  # (required) CephFS filesystem name into which the volume shall be created
  # eg: fsName: myfs
  fsName: cephfs-hdd-local

  # (optional) Ceph pool into which volume data shall be stored
  # pool: <cephfs-data-pool>

  # (optional) Comma separated string of Ceph-fuse mount options.
  # For eg:
  # fuseMountOptions: debug

  # (optional) Comma separated string of Cephfs kernel mount options.
  # Check man mount.ceph for mount options. For eg:
  # kernelMountOptions: readdir_max_bytes=1048576,norbytes

  # The secrets have to contain user and/or Ceph admin credentials.
  csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/controller-expand-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default

  # (optional) The driver can use either ceph-fuse (fuse) or
  # ceph kernelclient (kernel).
  # If omitted, default volume mounter will be used - this is
  # determined by probing for ceph-fuse and mount.ceph
  # mounter: kernel

  # (optional) Prefix to use for naming subvolumes.
  # If omitted, defaults to "csi-vol-".
  # volumeNamePrefix: "foo-bar-"

  # (optional) Boolean value. The PVC shall be backed by the CephFS snapshot
  # specified in its data source. `pool` parameter must not be specified.
  # (defaults to `true`)
  # backingSnapshot: "false"

  # (optional) Instruct the plugin it has to encrypt the volume
  # By default it is disabled. Valid values are "true" or "false".
  # A string is expected here, i.e. "true", not true.
  # encrypted: "true"

  # (optional) Use external key management system for encryption passphrases by
  # specifying a unique ID matching KMS ConfigMap. The ID is only used for
  # correlation to configmap entry.
  # encryptionKMSID: <kms-config-id>


reclaimPolicy: Delete
allowVolumeExpansion: true
# mountOptions:
#   - context="system_u:object_r:container_file_t:s0:c0,c1"
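The clusterID above must match an entry in the ceph-csi-config ConfigMap (see csi-config-map-sample.yaml in this repo). A minimal sketch with placeholder monitor addresses:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-csi-config
data:
  config.json: |-
    [
      {
        "clusterID": "6254728d-2db0-4fe1-bab6-e415d0c697f6",
        "monitors": [
          "<mon1-addr>:6789",
          "<mon2-addr>:6789",
          "<mon3-addr>:6789"
        ]
      }
    ]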

gaocheng001 (Jul 02 '24)

@gaocheng001 Could you please explain why you think the error is caused by cephcsi?

Madhu-1 (Jul 02 '24)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] (Aug 02 '24)

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions[bot] (Aug 09 '24)