
[vm_deploy] does not check for available disk capacity

Open TGM opened this issue 1 year ago • 2 comments

Description You can deploy any number of VMs on the same datastore as long as the disk size is under the TOTAL_MB size. In other words, vm_deploy does not check the ALLOCATED capacity, and it cannot, since the ALLOCATED capacity is not a computed parameter.

This is a HUGE issue: once the allocated space starts to fill up, the VMs will start crashing due to overprovisioning.

To Reproduce Create a new datastore with 30GB. Create 2 new VMs under this datastore with 20GB each.

Expected behavior The 1st VM should provision. The 2nd VM should fail due to unavailable space.

Details The 2nd VM does not fail, as there is indeed free space on the DISK due to thin provisioning. Other types of provisioning are available, mostly for vCenter. However, the ALLOCATED space has been overprovisioned, so the deployment should fail and return an error stating that there is not enough available space.
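A minimal sketch of the check being requested, written in Python for illustration only. This is not OpenNebula code: the `Datastore` class, `allocated_mb`, `can_deploy`, `deploy`, and the `overcommit` factor are hypothetical names, and only the `total_mb` field mirrors the TOTAL_MB attribute mentioned above. It assumes ALLOCATED is computed as the sum of provisioned disk sizes, independent of how much data thin-provisioned disks actually hold.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Datastore:
    """Hypothetical datastore model; only total_mb mirrors TOTAL_MB."""
    total_mb: int
    disk_sizes_mb: List[int] = field(default_factory=list)

    @property
    def allocated_mb(self) -> int:
        # Sum of provisioned (virtual) disk sizes, regardless of real usage.
        return sum(self.disk_sizes_mb)


def can_deploy(ds: Datastore, disk_mb: int, overcommit: float = 1.0) -> bool:
    """Return True if the new disk keeps allocation within total_mb * overcommit."""
    return ds.allocated_mb + disk_mb <= ds.total_mb * overcommit


def deploy(ds: Datastore, disk_mb: int, overcommit: float = 1.0) -> None:
    """Register a new disk, or raise if the allocation limit would be exceeded."""
    if not can_deploy(ds, disk_mb, overcommit):
        raise RuntimeError(
            f"not enough allocatable space: {ds.allocated_mb} MB allocated, "
            f"{disk_mb} MB requested, {ds.total_mb} MB total"
        )
    ds.disk_sizes_mb.append(disk_mb)


# Reproduction scenario from this issue: 30GB datastore, two 20GB VMs.
ds = Datastore(total_mb=30 * 1024)
deploy(ds, 20 * 1024)          # 1st VM fits
try:
    deploy(ds, 20 * 1024)      # 2nd VM would bring allocation to 40GB
    second_failed = False
except RuntimeError:
    second_failed = True
```

With `overcommit=1.0` the second deployment is rejected, which matches the expected behavior above. An `overcommit` value greater than 1.0 would allow a controlled amount of thin-provisioning overcommit instead of a hard limit.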

Affected versions:

  • 6.4.0.1-CE
  • 6.8.0-CE

The same behavior can be observed in vm_disk_resize.

Details available in:

https://github.com/OpenNebula/one/blob/master/src/rm/RequestManager.cc

https://github.com/OpenNebula/one/blob/master/src/rm/RequestManagerVirtualMachine.cc

Please raise the priority to HIGH as all production environments are affected.

Related:

https://github.com/OpenNebula/one/issues/1982

TGM avatar Jan 18 '24 09:01 TGM

I agree with you @TGM, but this is more like a feature request than a bug. I think the idea of the cloud is to give you the chance to create as many VMs as you want, and the cloud admin should have a monitoring system to check the FS usage. If you are running out of space, you know you need to add more disks to your cloud so that users aren't affected.

Franco-Sparrow avatar Jan 19 '24 15:01 Franco-Sparrow

While that is the idea, without a metric for allocation it's impossible to know about over-allocation of resources before it's too late.

It's not impossible to have a case where lots of VMs get allocated and suddenly all the guests start generating data that fills a lot of storage, causing large-scale failures all over the place if the datastore is over-allocated beyond a certain amount.

So one should be able to over-allocate, but one should be very much aware of how much over-allocation is happening. This information is paramount for proper management of a large platform, both in metric form and as something that's easy to see from dashboards.
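As a rough illustration of the kind of metric meant here, the sketch below computes an over-allocation ratio and maps it to a dashboard-style status. The function names and the `warn_ratio` and `crit_ratio` thresholds are made up for this example and are not existing OpenNebula attributes; it assumes the allocated size is available as the sum of provisioned disk sizes.

```python
def overallocation_ratio(total_mb: int, allocated_mb: int) -> float:
    """Ratio of provisioned disk space to physical datastore capacity."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return allocated_mb / total_mb


def allocation_status(total_mb: int, allocated_mb: int,
                      warn_ratio: float = 1.0,
                      crit_ratio: float = 1.5) -> str:
    """Map the ratio to a status label; thresholds are illustrative defaults."""
    ratio = overallocation_ratio(total_mb, allocated_mb)
    if ratio >= crit_ratio:
        return "critical"
    if ratio >= warn_ratio:
        return "warning"
    return "ok"


# 30GB datastore with two 20GB thin-provisioned disks (40GB allocated).
ratio = overallocation_ratio(30 * 1024, 40 * 1024)
status = allocation_status(30 * 1024, 40 * 1024)
```

In the scenario from the original report the ratio is 40/30, so the datastore would be flagged as over-allocated even though deployments are still allowed.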

TL;DR: while this is a missing feature, and therefore a feature request, it is also a massive availability risk.

kurojishi avatar Jan 22 '24 11:01 kurojishi