cass-operator

K8SSAND-1780 ⁃ Unable to open hint directory /opt/cassandra/data/hints

Open shengzhizhou opened this issue 3 years ago • 8 comments

Hi, I got an error message like the one below:

ERROR [BatchlogTasks:1] 2022-09-15 01:42:42,107 CassandraDaemon.java:587 - Exception in thread Thread[BatchlogTasks:1,5,main]
org.apache.cassandra.io.FSWriteError: java.io.IOException: Unable to open hint directory /opt/cassandra/data/hints
...
Caused by: java.io.IOException: Unable to open hint directory /opt/cassandra/data/hints
...
ERROR [BatchlogTasks:1] 2022-09-15 01:42:42,107 DefaultFSErrorHandler.java:64 - Stopping transports as disk_failure_policy is stop

There are also other error messages, such as: Reading cardinality from Statistics.db failed for /var/lib/cassandra/data/system_auth/role_permissions-3afbe79f219431a7add7f5ab90d8ec9c/nb-4001-big-Data.db

I also have another issue, #401. I don't know if they are related, but they happened on two different nodes of the same cluster. #401 happened right after deployment a few days ago; this issue happened today. How should I fix it? Thanks for your help!

┆Issue is synchronized with this Jira Task by Unito ┆friendlyId: K8SSAND-1780 ┆priority: Medium

shengzhizhou avatar Sep 15 '22 01:09 shengzhizhou

I'm beginning to wonder if this has nothing to do with cass-operator and is instead caused by the SMB CSI driver you're using. Could you try with something else? CSI -> SMB sounds like a pretty hacky approach in most cases.

burmanm avatar Sep 15 '22 12:09 burmanm

Okay. I know there's local-path-storage for storing data locally, but it sounds like local storage would directly consume the Kubernetes node's own disk, compared with mounting from another server using csi-smb. Is that correct? Is there a better way to mount and persist data? Thanks. Below is my SC YAML:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fs-cassandra
mountOptions:
- dir_mode=0777
- file_mode=0777
- uid=1001
- gid=1001
parameters:
  csi.storage.k8s.io/node-stage-secret-name: secret
  csi.storage.k8s.io/node-stage-secret-namespace: xxx
  csi.storage.k8s.io/provisioner-secret-name: secret
  csi.storage.k8s.io/provisioner-secret-namespace: xxx
  source: //xxxx
provisioner: smb.csi.k8s.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

shengzhizhou avatar Sep 15 '22 13:09 shengzhizhou

Hey, at least one thing is wrong in that StorageClass: cass-operator requires volumeBindingMode: WaitForFirstConsumer for a StorageClass. Also, we use uid=999, gid=999 for our cassandra user, so that might also cause issues on your end if the SMB server user rights look wrong (I don't know why they need to be enforced in a CSI driver, though).
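For reference, the two changes applied to the StorageClass posted earlier in this thread would look roughly like the sketch below. Every other value, including the `xxx` placeholders, is copied unchanged from that manifest; this has not been tested against an SMB server and is only meant to show where the settings go.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fs-cassandra
mountOptions:
- dir_mode=0777
- file_mode=0777
- uid=999   # uid of the cassandra user mentioned above (was 1001)
- gid=999   # gid of the cassandra user mentioned above (was 1001)
parameters:
  csi.storage.k8s.io/node-stage-secret-name: secret
  csi.storage.k8s.io/node-stage-secret-namespace: xxx
  csi.storage.k8s.io/provisioner-secret-name: secret
  csi.storage.k8s.io/provisioner-secret-namespace: xxx
  source: //xxxx
provisioner: smb.csi.k8s.io
reclaimPolicy: Delete
# Delay binding until a pod using the claim is scheduled (was Immediate)
volumeBindingMode: WaitForFirstConsumer
```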

burmanm avatar Sep 15 '22 13:09 burmanm

As to which CSI driver is correct for your use case, it's difficult to say - that's a bit more complex and depends entirely on your hardware setup, software, network, policy, etc., and is out of scope for this project. We personally test with cloud provider CSIs and local-path (and some other local-path-alikes), but beyond that we sadly can't test every possible one.

burmanm avatar Sep 15 '22 13:09 burmanm

thank you, let me tweak the storage class and see.

shengzhizhou avatar Sep 15 '22 13:09 shengzhizhou

I changed to "volumeBindingMode: WaitForFirstConsumer", uid=999, gid=999, but I still get the #401 issue. Let me try local-path-storage.

shengzhizhou avatar Sep 15 '22 15:09 shengzhizhou

Okay, local-path-storage works fine. Should I switch to local-path-storage? Will it cause more damage if something goes wrong in the future? Thanks for any input.

shengzhizhou avatar Sep 15 '22 15:09 shengzhizhou

That depends entirely on how you want to store the data. local-path-storage stores it on the disk of the Kubernetes node.
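As an illustration only, a node-local StorageClass might look like the sketch below. It assumes the Rancher local-path-provisioner is the implementation installed, which this thread does not confirm; the name `local-path` and the provisioner string `rancher.io/local-path` are assumptions taken from that project's defaults.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path                   # assumed name, not from this thread
provisioner: rancher.io/local-path   # assumes the Rancher local-path-provisioner
reclaimPolicy: Delete
# Bind only once the pod is scheduled, so the volume is created on that pod's node
volumeBindingMode: WaitForFirstConsumer
```

With storage like this, a volume lives and dies with its Kubernetes node, so data on a lost node has to be recovered from the other Cassandra replicas.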

burmanm avatar Sep 16 '22 11:09 burmanm