gluster-kubernetes
Heketi no space error
I'm trying to set up a GlusterFS cluster with Kubernetes. I managed to start the glusterd pods on all the nodes (3 nodes), and I also managed to load the topology successfully. However, when I run
heketi-cli setup-openshift-heketi-storage
I get the following error:
Error: No space
This is the output of
heketi-cli topology load --json=gluster-kubernetes/deploy/topology.json
Found node vps01 on cluster 1a36667e4275773fc353f2caaaaaaa
Adding device /dev/loop0 ... OK
Found node vps02 on cluster 1a36667e4275773fc353faaaaaaaa
Found device /dev/loop0
Found node vps04 on cluster 1a36667e4275773fc353faaaaaaa
Adding device /dev/loop0 ... OK
Output of
heketi-cli topology info
Cluster Id: 1a36667e4275773fc353f2caaaaaa
File: true
Block: true
Volumes:
Nodes:
Node Id: 1752dcf447c8eb6eaad45aaaa
State: online
Cluster Id: 1a36667e4275773fc353f2caaa
Zone: 1
Management Hostnames: vps01
Storage Hostnames: XX.XX.XX.219
Devices:
Id:50396d72293c4723504810108bd75d41 Name:/dev/loop0 State:online Size (GiB):12 Used (GiB):0 Free (GiB):12
Bricks:
Node Id: 56b8c1942b347a863ee73a005758cc27
State: online
Cluster Id: 1a36667e4275773fc353f2c8eb2dd2a3
Zone: 1
Management Hostnames: vps04
Storage Hostnames: XX.XX.XX.227
Devices:
Id:dc75ad8154234ebcf9174b018d0bc30a Name:/dev/loop0 State:online Size (GiB):9 Used (GiB):4 Free (GiB):5
Bricks:
Node Id: f82cb81a026884764d3d953c7c9b6a9f
State: online
Cluster Id: 1a36667e4275773fc353f2c8eb2dd2a3
Zone: 1
Management Hostnames: vps02
Storage Hostnames: XX.XX.XX.157
Devices:
Id:1914102b7ae395f12797981a0e3cf5a4 Name:/dev/loop0 State:online Size (GiB):4 Used (GiB):4 Free (GiB):0
Bricks:
There is no more space on device 1914102b7ae395f12797981a0e3cf5a4, however I haven't stored anything on the device yet.
For info here is the topology.json file:
{
"clusters": [
{
"nodes": [
{
"node": {
"hostnames": {
"manage": [
"vps01"
],
"storage": [
"XX.XX.XX.219"
]
},
"zone": 1
},
"devices": [
"/dev/loop0"
]
},
{
"node": {
"hostnames": {
"manage": [
"vps02"
],
"storage": [
"XX.XX.XX.157"
]
},
"zone": 1
},
"devices": [
"/dev/loop0"
]
},
{
"node": {
"hostnames": {
"manage": [
"vps04"
],
"storage": [
"XX.XX.XX.227"
]
},
"zone": 1
},
"devices": [
"/dev/loop0"
]
}
]
}
]
}
The lack of space on 1914102b7ae395f12797981a0e3cf5a4 is almost certainly the cause of the out-of-space error you are seeing. Because the heketidbstorage volume that the command creates is replica 3, it needs to place a brick on that device, and it cannot due to the lack of free space.
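For reference, one quick way to confirm the free space heketi is tracking per device is to query it directly (a sketch run from inside the heketi or deploy-heketi pod; the device ID below is just the one from the topology output above):
heketi-cli device info 1914102b7ae395f12797981a0e3cf5a4   # prints Size/Used/Free as heketi records them, independent of what LVM reports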
I noticed the sizes of the devices are all different. Is this intentional? Did you run the heketi-cli device resync command at any point?
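If you do end up resyncing a device so heketi re-reads the actual LVM usage, it takes the device ID from the topology output (a sketch with the ID above as a placeholder; note the caution about resync further down in this thread):
heketi-cli device resync 1914102b7ae395f12797981a0e3cf5a4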
Another thing you can try is to run the heketi-cli db dump command and inspect the JSON output. If there are pending operations in the db, it could mean that a volume that was only partially created is using space on the device.
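A minimal way to pull and inspect that dump from outside the cluster (the pod name and namespace here are placeholders, and the exact JSON key naming for pending operations can differ between heketi versions):
kubectl exec -n gluster deploy-heketi-<pod-suffix> -- heketi-cli db dump > heketi-db.json
grep -i pending heketi-db.json   # non-empty pending entries suggest a partially created volume is still holding space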
Also, you could log into the gluster pods and use LVM commands like lvs to check whether any storage for bricks was carved out of the device VG. (Note: each brick maps to two LVs, the primary LV and a thinpool LV; the primary LV is "inside" the thinpool.)
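For example (a sketch; the glusterfs=pod label and the gluster namespace are assumptions based on a default gk-deploy install):
kubectl get pods -n gluster -l glusterfs=pod      # list the gluster pods
kubectl exec -n gluster <glusterfs-pod> -- vgs    # VGs heketi created (named vg_<device-id>)
kubectl exec -n gluster <glusterfs-pod> -- lvs    # brick LVs and their thinpool LVs, if any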
@phlogistonjohn I agree with the reason, but I don't understand why it is full. I've created a loop device which is empty, so how come it is marked as full by heketi?
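(For context, a loop device like this is typically set up roughly as follows; these are not necessarily the exact commands used here, and the backing file path and loop device number are placeholders:)
truncate -s 5G /srv/heketi-backing.img      # sparse backing file
losetup /dev/loop0 /srv/heketi-backing.img  # expose it as /dev/loop0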
So I've deleted all the volumes and all pods, services, etc. I've recreated a new 5 GB loop device, and I've run
./gk-deploy -n gluster -w 900 -g -y
Using Kubernetes CLI.
Using namespace "gluster".
Checking for pre-existing resources...
GlusterFS pods ... not found.
deploy-heketi pod ... not found.
heketi pod ... not found.
gluster-s3 pod ... not found.
Creating initial resources ... serviceaccount/heketi-service-account created
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view created
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view labeled
OK
node/vps01 labeled
node/vps02 labeled
node/vps04 labeled
daemonset.extensions/glusterfs created
Waiting for GlusterFS pods to start ... OK
Error from server (AlreadyExists): secrets "heketi-config-secret" already exists
secret/heketi-config-secret not labeled
service/deploy-heketi created
deployment.extensions/deploy-heketi created
Waiting for deploy-heketi pod to start ... OK
Creating cluster ... ID: cb42bacc3e5c68aaa07d143840a8f64c
Allowing file volumes on cluster.
Allowing block volumes on cluster.
Creating node vps01 ... ID: bf65f800524682813a5b125c319957cd
Adding device /dev/loop0 ... OK
Creating node vps02 ... ID: b6ea2328e2dce54f43e8a9f8ccabbde3
Adding device /dev/loop0 ... OK
Creating node vps04 ... ID: 0b4f3556c2139a98d3383704de072573
Adding device /dev/loop0 ... OK
heketi topology loaded.
Error: No space
command terminated with exit code 255
Failed on setup openshift heketi storage
This may indicate that the storage must be wiped and the GlusterFS nodes must be reset.
And this is the output of
heketi-cli topology info
Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
File: true
Block: true
Volumes:
Nodes:
Node Id: 0b4f3556c2139a98d3383704de072573
State: online
Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
Zone: 1
Management Hostnames: vps04
Storage Hostnames: XXX.XXX.XXX.227
Devices:
Id:a5c7f5ebc4c58c5e84279f195ac1a352 Name:/dev/loop0 State:online Size (GiB):4 Used (GiB):4 Free (GiB):0
Bricks:
Node Id: b6ea2328e2dce54f43e8a9f8ccabbde3
State: online
Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
Zone: 1
Management Hostnames: vps02
Storage Hostnames: XXX.XXX.XXX.157
Devices:
Id:669c53412bc14502ebef9f30dda6c64c Name:/dev/loop0 State:online Size (GiB):4 Used (GiB):4 Free (GiB):0
Bricks:
Node Id: bf65f800524682813a5b125c319957cd
State: online
Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
Zone: 1
Management Hostnames: vps01
Storage Hostnames: XXX.XXX.XXX.219
Devices:
Id:ab5c466e880855b1bc94a5a90e05f6cb Name:/dev/loop0 State:online Size (GiB):4 Used (GiB):0 Free (GiB):4
Bricks:
I've also tried to run heketi-cli device resync for all the devices; after that, heketi topology info shows that all the devices are free. I've then rerun (without deleting anything) ./gk-deploy -n gluster -w 900 -g -y
And I still get the same error... Is there a minimum required size?
Thanks
Yes, the minimum size is 2Gi.
Without up-to-date topology info and heketi logs or a db dump, I'm afraid there's not much more I can do. Please note that device resync can help in some situations, but it also has bugs and I've seen it shrink the volume size (incorrectly). I don't recommend running it except as a last resort.
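If it helps, the logs and a db dump can usually be grabbed from the deploy-heketi pod roughly like this (the pod name and namespace are placeholders for your environment):
kubectl logs -n gluster deploy-heketi-<pod-suffix> > heketi.log                            # heketi logs go to stdout
kubectl exec -n gluster deploy-heketi-<pod-suffix> -- heketi-cli db dump > heketi-db.json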
Here is the topology info:
Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
File: true
Block: true
Volumes:
Nodes:
Node Id: 0b4f3556c2139a98d3383704de072573
State: online
Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
Zone: 1
Management Hostnames: vps04
Storage Hostnames: 51.68.47.227
Devices:
Id:a5c7f5ebc4c58c5e84279f195ac1a352 Name:/dev/loop0 State:online Size (GiB):4 Used (GiB):4 Free (GiB):0
Bricks:
Node Id: b6ea2328e2dce54f43e8a9f8ccabbde3
State: online
Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
Zone: 1
Management Hostnames: vps02
Storage Hostnames: 5.196.23.157
Devices:
Id:669c53412bc14502ebef9f30dda6c64c Name:/dev/loop0 State:online Size (GiB):4 Used (GiB):4 Free (GiB):0
Bricks:
Node Id: bf65f800524682813a5b125c319957cd
State: online
Cluster Id: cb42bacc3e5c68aaa07d143840a8f64c
Zone: 1
Management Hostnames: vps01
Storage Hostnames: 51.68.225.219
Devices:
Id:ab5c466e880855b1bc94a5a90e05f6cb Name:/dev/loop0 State:online Size (GiB):4 Used (GiB):0 Free (GiB):4
Bricks:
I can't find any logs in the container (nothing with kubectl logs or journalctl, and nothing in /var/log). Where can I find the heketi logs for the container?
Getting nothing from the kubectl logs command sounds weird. Typically there will be some logging generated by the server when you make requests to it. Heketi logs to stdio, so either systemd or the container system will be capturing the logging.
This topology output shows free space of 0 for devices 669c53412bc14502ebef9f30dda6c64c and a5c7f5ebc4c58c5e84279f195ac1a352, so I see why you are getting the no space error again.
If you log on to the gluster pods (via kubectl exec, for example), what do the lvs and vgs commands show?
If I log in to the glusterfs pods, here is the output:
[root@vps01 /]# lvs
[root@vps01 /]# vgs
VG #PV #LV #SN Attr VSize VFree
vg_01c148fa8b180ce37e64e42354e93732 1 0 0 wz--n- <4.88g <4.88g
[root@vps02 /]# lvs
[root@vps02 /]# vgs
VG #PV #LV #SN Attr VSize VFree
vg_c7bc3aef090bfe32076c8634020330cf 1 0 0 wz--n- <4.88g <4.88g
[root@vps04 /]# lvs
[root@vps04 /]# vgs
VG #PV #LV #SN Attr VSize VFree
vg_72e918263ddbdb987c5c19943433d823 1 0 0 wz--n- <4.88g <4.88g
The volume groups seem to be free here. Moreover, I've deleted and recreated all the volumes prior to running the command.
Here is the command I run:
./gk-deploy -n gluster -w 900 -g -y
Thanks for your help
Very odd indeed. Would you be willing to put a db dump on a pastebin? If so, run heketi-cli db dump from within the pod to get the JSON dump, and put it on fpaste.org or a pastebin of your choice.
I've provisioned a new node and removed the old one, and now I've passed this step. However, now I get the following error:
./gk-deploy -n gluster -w 900 -g -y topology.json
Using Kubernetes CLI.
Using namespace "gluster".
Checking for pre-existing resources...
GlusterFS pods ... found.
deploy-heketi pod ... found.
heketi pod ... not found.
gluster-s3 pod ... not found.
Creating initial resources ... Error from server (AlreadyExists): error when creating "/home/ben/k8s/gluster-kubernetes/deploy/kube-templates/heketi-service-account.yaml": serviceaccounts "heketi-service-account" already exists
Error from server (AlreadyExists): clusterrolebindings.rbac.authorization.k8s.io "heketi-sa-view" already exists
clusterrolebinding.rbac.authorization.k8s.io/heketi-sa-view not labeled
OK
Found node vps01 on cluster 596df7e07ab71717092785ce0f4c0c72
Found device /dev/loop0
Found node vps02 on cluster 596df7e07ab71717092785ce0f4c0c72
Found device /dev/loop0
Found node vps04 on cluster 596df7e07ab71717092785ce0f4c0c72
Found device /dev/loop0
heketi topology loaded.
Error: Volume heketidbstorage alreay exists
command terminated with exit code 255
Failed on setup openshift heketi storage
This may indicate that the storage must be wiped and the GlusterFS nodes must be reset.
[root@deploy-heketi-559446b649-6z9w9 /]# heketi-cli topology info
Cluster Id: 07d0a6d37eb03d98081776ecba94ee27
File: true
Block: true
Volumes:
Nodes:
Node Id: 5502b48c704c3cd3ca0bd44b45793ad1
State: online
Cluster Id: 07d0a6d37eb03d98081776ecba94ee27
Zone: 1
Management Hostnames: vps04
Storage Hostnames: 51.68.XX.XX1
Devices:
Id:c419affdc56e8cc65cc89109aafe08bf Name:/dev/loop0 State:online Size (GiB):12 Used (GiB):2 Free (GiB):10
Bricks:
Id:2a72d082a3d4b92b513b92fa99d269ab Size (GiB):2 Path: /var/lib/heketi/mounts/vg_c419affdc56e8cc65cc89109aafe08bf/brick_2a72d082a3d4b92b513b92fa99d269ab/brick
Node Id: 91f513210187420b8746d6f4bc05d855
State: online
Cluster Id: 07d0a6d37eb03d98081776ecba94ee27
Zone: 1
Management Hostnames: vps01
Storage Hostnames: 51.68.XXX.XXX
Devices:
Id:88e0c894ad70e8e199ad91c7a8925faf Name:/dev/loop0 State:online Size (GiB):12 Used (GiB):2 Free (GiB):10
Bricks:
Id:f08a791be3155ee5791dfbee31aa6b0e Size (GiB):2 Path: /var/lib/heketi/mounts/vg_88e0c894ad70e8e199ad91c7a8925faf/brick_f08a791be3155ee5791dfbee31aa6b0e/brick
Node Id: ca245feedc741e2b1706aecc628e0661
State: online
Cluster Id: 07d0a6d37eb03d98081776ecba94ee27
Zone: 1
Management Hostnames: vps02
Storage Hostnames: 51.68.X1.XXX
Devices:
Id:758454435cc0e6cb7fd1a0daafb877ce Name:/dev/loop0 State:online Size (GiB):12 Used (GiB):2 Free (GiB):10
Bricks:
Id:c3997e9d9ae08802b293e4def686ecbc Size (GiB):2 Path: /var/lib/heketi/mounts/vg_758454435cc0e6cb7fd1a0daafb877ce/brick_c3997e9d9ae08802b293e4def686ecbc/brick
And the heketi logs can be found here: https://gist.github.com/bend/4d355203c3edab80831c343f9a9210d9
The error is weird:
[heketi] WARNING 2018/08/31 07:33:39 failed to delete volume f0a524cf1265ff8fb27405ac42ef93af via vps01: Unable to delete volume heketidbstorage: Unable to execute command on glusterfs-vhkdb: volume delete: heketidbstorage: failed: Some of the peers are down
Any idea?
I'm seeing this issue as well, did you ever find a solution?
In my case it was because I had only 2 nodes in Kubernetes. The setup-openshift-heketi-storage command doesn't use a --replica parameter reflecting the topology, so it always tries to create a replica-3 volume.
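(For what it's worth, depending on the heketi version, setup-openshift-heketi-storage may accept a --durability option that relaxes the implicit replica-3 requirement; whether it is available depends on your heketi-cli build, so check the help output first:)
heketi-cli setup-openshift-heketi-storage --help               # check whether --durability is listed
heketi-cli setup-openshift-heketi-storage --durability=none    # if supported, avoids needing three usable devices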