cloudbridge
Add method to get GPU count for VmType
There's currently no way to get the number of GPUs per instance type, or any related information. See: https://github.com/CloudVE/cloudbridge/issues/153
There are at least two problems here. The first is determining the number of GPUs supported by an instance type. The second is doing something useful with that information, so that it's possible to boot a machine with a GPU. The specifics for each cloud are documented below.
AWS
GPUs are tied to instance types. There is a notion of elastic GPUs that can be dynamically attached, but that appears to be a Windows-only feature. To use an instance with a GPU, you must select an instance type with GPU support and launch an AMI that has the appropriate drivers preinstalled. The instance types that support GPUs, and the number of GPUs each one has, are fairly easy to obtain from instance type data.
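As a rough illustration of the AWS case, here is a minimal sketch that assumes the instance type data is shaped like an entry of the EC2 DescribeInstanceTypes response, where GPU instance types carry a `GpuInfo` field. The function name and the sample entries are made up for this example, and cloudbridge's own instance type data may be structured differently.

```python
from typing import Any, Dict


def gpu_count_from_instance_type(info: Dict[str, Any]) -> int:
    """Sum the GPU counts in one DescribeInstanceTypes-style entry.

    Entries without a "GpuInfo" field are treated as having no GPUs.
    """
    gpus = info.get("GpuInfo", {}).get("Gpus", [])
    return sum(gpu.get("Count", 0) for gpu in gpus)


# Hand-written sample entries, trimmed to the fields used above.
sample = [
    {
        "InstanceType": "p3.8xlarge",
        "GpuInfo": {"Gpus": [{"Name": "V100", "Manufacturer": "NVIDIA", "Count": 4}]},
    },
    {"InstanceType": "t3.micro"},
]

gpu_counts = {e["InstanceType"]: gpu_count_from_instance_type(e) for e in sample}
```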
GCE
On GCE, any "standard" or "custom" instance type can potentially support GPUs, and the number appears to be elastic. The GPUs are dynamically attached, and the required number of GPUs is a parameter that must be supplied at instance creation time. Therefore, we can probably return the maximum number of GPUs that a standard instance type can potentially support as the GPU count, but there doesn't seem to be an endpoint for doing this, so we'll have to figure out which instance types are "standard".
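To show what "supplied at instance creation time" could look like, here is a hedged sketch that builds the GPU-related fragment of a Compute Engine instance creation request body, using the `guestAccelerators` and `scheduling` fields of the instances resource. The helper name and its arguments are hypothetical, and this is not how cloudbridge currently builds its requests.

```python
from typing import Any, Dict


def gpu_request_fragment(project: str, zone: str, model: str, count: int) -> Dict[str, Any]:
    """Build the GPU-related part of an instance creation request body."""
    if count < 1:
        raise ValueError("count must be at least 1")
    accelerator_type = f"projects/{project}/zones/{zone}/acceleratorTypes/{model}"
    return {
        "guestAccelerators": [
            {"acceleratorType": accelerator_type, "acceleratorCount": count}
        ],
        # Instances with GPUs cannot be live-migrated, so they must be
        # terminated on host maintenance.
        "scheduling": {"onHostMaintenance": "TERMINATE"},
    }


# Placeholder project and zone; the fragment would be merged into the full body.
fragment = gpu_request_fragment("my-project", "us-east1-c", "nvidia-tesla-k80", 2)
```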
OpenStack
The GPU count is generally available as "extra" flavor data, which is relatively non-standard. For example, the nova docs here suggest a VGPU property which could be used. Other docs suggest slightly different properties. We can probably look for some common keys in extra data and have a reasonably workable solution.
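A minimal sketch of the "common keys" idea follows. It assumes two flavor extra spec conventions: `resources:VGPU` for virtual GPUs and `pci_passthrough:alias` in `alias:count` form for passthrough devices. Treating any alias whose name contains "gpu" as a GPU is purely a guess at a naming convention, since alias names are chosen by each deployment, and the function name is hypothetical.

```python
from typing import Dict


def gpu_count_from_extra_specs(extra_specs: Dict[str, str]) -> int:
    """Best-effort GPU count from a flavor's extra specs; 0 if none is found."""
    # Virtual GPUs, e.g. {"resources:VGPU": "1"}.
    vgpu = extra_specs.get("resources:VGPU", "").strip()
    if vgpu.isdigit():
        return int(vgpu)

    # PCI passthrough, e.g. {"pci_passthrough:alias": "gpu-k80:2"}.
    # Assumption: GPU aliases have "gpu" somewhere in their name.
    total = 0
    for item in extra_specs.get("pci_passthrough:alias", "").split(","):
        name, _, count = item.strip().partition(":")
        if "gpu" in name.lower() and count.isdigit():
            total += int(count)
    return total
```

A deployment that names its aliases differently would need its own key list, which is probably why the result can only ever be "reasonably workable".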
Azure
Azure does not seem to have an endpoint for determining the GPU count. As suggested here: https://github.com/CloudVE/cloudbridge/issues/153 we will probably have to use a semi-hardcoded, regex-based list to determine which instance types support GPUs.
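The regex approach might look something like the sketch below. It assumes that GPU-enabled sizes are the N-series ones whose names start with `Standard_NC`, `Standard_ND` or `Standard_NV`; the pattern and function name are illustrative and would need to be kept in sync with Azure's size list by hand. It only answers whether a size has GPUs, not how many.

```python
import re

# Assumption: GPU sizes follow the N-series naming scheme (NC, ND, NV).
_AZURE_GPU_SIZE = re.compile(r"^Standard_N[CDV]\d+", re.IGNORECASE)


def azure_size_has_gpu(size_name: str) -> bool:
    """Return True if the VM size name looks like a GPU-enabled N-series size."""
    return bool(_AZURE_GPU_SIZE.match(size_name))


gpu_sizes = [s for s in ("Standard_NC6", "Standard_D2s_v3", "Standard_NV12")
             if azure_size_has_gpu(s)]
```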
Overall
Overall, it looks like we could potentially add a gpu_count property that's not all that accurate, but can give a reasonably useful result. We would also need to add a num_gpus parameter to the instance_service.create() method, primarily to support GCE, which requires a count. More specifically, the GPUs could be defined through the launch_config, by something like launch_config.add_gpu_device(model, gpus). On GCE, this will work fine, but on the other providers, if the requested num_gpus is not equal to the flavour's GPU count, we'd probably just have to throw an exception. If num_gpus=1, then it'll probably just work in all cases, but again, it must be requested at instance.create() time for the benefit of GCE.
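To make the proposal a bit more concrete, here is a rough sketch of the validation rule described above. Everything in it is hypothetical: the `LaunchConfig` stand-in, the `GpuRequestError` exception, and `validate_gpu_request` are illustrative names, not existing cloudbridge API.

```python
from typing import Optional


class GpuRequestError(Exception):
    """Raised when the requested GPUs cannot be satisfied by the instance type."""


class LaunchConfig:
    """Stand-in for a launch config that records a GPU request."""

    def __init__(self) -> None:
        self.gpu_model: Optional[str] = None
        self.num_gpus: int = 0

    def add_gpu_device(self, model: str, gpus: int = 1) -> None:
        if gpus < 1:
            raise ValueError("gpus must be at least 1")
        self.gpu_model = model
        self.num_gpus = gpus


def validate_gpu_request(config: LaunchConfig, flavour_gpu_count: int,
                         elastic_gpus: bool) -> None:
    """Accept any count on elastic providers (GCE); otherwise require a match."""
    if config.num_gpus == 0 or elastic_gpus:
        return
    if config.num_gpus != flavour_gpu_count:
        raise GpuRequestError(
            f"requested {config.num_gpus} GPU(s), but the instance type "
            f"provides {flavour_gpu_count}"
        )


config = LaunchConfig()
config.add_gpu_device("nvidia-tesla-k80", gpus=2)
validate_gpu_request(config, flavour_gpu_count=0, elastic_gpus=True)   # GCE-style: fine
validate_gpu_request(config, flavour_gpu_count=2, elastic_gpus=False)  # fixed count matches
```

Whether a mismatch should raise or fall back silently is a design choice; raising keeps the behaviour the same across providers from the caller's point of view.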
@nuwang I would like to work on the above issue and contribute my solution to the problem. Please let me know the necessary requirements so that I can start working on it. Since this is mandatory for my course, I assure you that I will complete the issue.
@mnss8991 We'd be happy to accept a PR on this issue. Hopefully, the above description is enough for you to get started - feel free to ping us with any further questions.
Hi, I would like to work on this issue. Could you please give me approval and point me to the documents that would help me understand the project and contribute to this issue?
Hi, my name is Aarushi Soni. I want to contribute to this issue. Is this issue still open? I am a first-time contributor. Please guide me through the process.