K8SSAND-1780 ⁃ Unable to open hint directory /opt/cassandra/data/hints
Hi, I got the error messages below:
ERROR [BatchlogTasks:1] 2022-09-15 01:42:42,107 CassandraDaemon.java:587 - Exception in thread Thread[BatchlogTasks:1,5,main]
org.apache.cassandra.io.FSWriteError: java.io.IOException: Unable to open hint directory /opt/cassandra/data/hints
...
Caused by: java.io.IOException: Unable to open hint directory /opt/cassandra/data/hints
...
ERROR [BatchlogTasks:1] 2022-09-15 01:42:42,107 DefaultFSErrorHandler.java:64 - Stopping transports as disk_failure_policy is stop
There are also other error messages, such as: Reading cardinality from Statistics.db failed for /var/lib/cassandra/data/system_auth/role_permissions-3afbe79f219431a7add7f5ab90d8ec9c/nb-4001-big-Data.db
I also have another issue, #401. I don't know whether they are related, but they happened on two different nodes of the same cluster. #401 appeared right after deployment a few days ago; this issue appeared today. How should I fix it? Thanks for the help!
I'm beginning to wonder whether this has nothing to do with cass-operator and is instead caused by the SMB CSI driver you're using. Could you try something else? CSI -> SMB sounds like a pretty hacky approach in most cases.
Okay. I know there's local-path-storage for storing data locally, but it sounds like local storage would directly consume the Kubernetes node's disk, compared to mounting from another server using csi-smb. Is that correct? Is there a better way to mount and persist data? Thanks. Below is my StorageClass YAML:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fs-cassandra
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=1001
  - gid=1001
parameters:
  csi.storage.k8s.io/node-stage-secret-name: secret
  csi.storage.k8s.io/node-stage-secret-namespace: xxx
  csi.storage.k8s.io/provisioner-secret-name: secret
  csi.storage.k8s.io/provisioner-secret-namespace: xxx
  source: //xxxx
provisioner: smb.csi.k8s.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
Hey, at least one thing is wrong in that StorageClass: cass-operator requires volumeBindingMode: WaitForFirstConsumer. Also, we use uid=999 and gid=999 for our cassandra user, so that might also be causing issues on your end if the user permissions look wrong on the SMB server (I don't know why they need to be enforced in the CSI driver, though).
As to which CSI driver is right for your use case, it's difficult to say. That's more complex, depends entirely on your hardware setup, software, network, policy, etc., and is out of scope for this project. We personally test with cloud provider CSIs and local-path (and some other local-path-alikes), but beyond that we sadly can't test every possible one.
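For reference, the two changes suggested above could look roughly like this when applied to the StorageClass posted earlier. This is only a sketch: everything except the uid, gid, and volumeBindingMode values is copied unchanged from the original, including the redacted placeholders, and it does not by itself make SMB a suitable backend for Cassandra.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fs-cassandra
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  # Changed from 1001 to match the uid/gid the maintainers use for the cassandra user
  - uid=999
  - gid=999
parameters:
  csi.storage.k8s.io/node-stage-secret-name: secret
  csi.storage.k8s.io/node-stage-secret-namespace: xxx
  csi.storage.k8s.io/provisioner-secret-name: secret
  csi.storage.k8s.io/provisioner-secret-namespace: xxx
  source: //xxxx
provisioner: smb.csi.k8s.io
reclaimPolicy: Delete
# Changed from Immediate; cass-operator requires WaitForFirstConsumer
volumeBindingMode: WaitForFirstConsumer
```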
thank you, let me tweak the storage class and see.
I changed to volumeBindingMode: WaitForFirstConsumer, uid=999, and gid=999, but I still have the #401 issue. Let me try local-path-storage.
Okay, local-path-storage works fine. Should I switch to local-path-storage? Would it cause more damage if something goes wrong in the future? Thanks for any input.
That depends entirely on how you want to store the data. local-path-storage stores it on the disk of the Kubernetes node.
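As an illustration, a StorageClass backed by node-local disk might look like the sketch below. It assumes the Rancher local-path-provisioner with its default provisioner name (rancher.io/local-path) and a StorageClass named local-path; neither is confirmed in this thread, so check what is actually installed in your cluster. Because the volume lives on one node's disk, the data does not follow the pod to another node and is lost if that node's disk is lost.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  # Assumed name; use whatever your cluster's local-path StorageClass is called
  name: local-path
# Assumed provisioner: the Rancher local-path-provisioner default
provisioner: rancher.io/local-path
reclaimPolicy: Delete
# cass-operator requires WaitForFirstConsumer, as noted earlier in the thread
volumeBindingMode: WaitForFirstConsumer
```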