aws-virtual-gpu-device-plugin

The AWS virtual GPU device plugin provides the capability to use smaller virtual GPUs for your machine learning inference workloads.

21 aws-virtual-gpu-device-plugin issues

I am trying to run NVIDIA Triton containers for model inference; however, when more than one container is allocated to the same node, one of the containers 1) either fails to...

*Issue #, if available:* No issue *Description of changes:* The old link no longer works. Point the link to the repository's new address.

I have an AWS EKS cluster with GPU nodes and installed the AWS virtual GPU device plugin to share GPUs between different pods. It seems that this exporter is dependent on Nvidia device...

Hello, I got an error like this when I start my pod: 0/8 nodes are available: 8 Insufficient k8s.amazonaws.com/vgpu. I do have 8 nodes, of which two are g4dn.xlarge nodes,...

Bumps [github.com/gogo/protobuf](https://github.com/gogo/protobuf) from 1.3.0 to 1.3.2. Release notes Sourced from github.com/gogo/protobuf's releases. Release v.1.3.2 Tested versions: go 1.15.6 protoc 3.14.0 Bug fixes: skippy peanut butter Release v1.3.1 Tested versions: go...

dependencies

Present status: I understand the current system configuration as follows: - Currently, the share of GPU threads used by a Pod seems to be controlled by CUDA_MPS_ACTIVE_THREAD_PERCENTAGE. - And the...
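To illustrate the mechanism this issue describes, here is a minimal sketch of how a per-pod value for `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` could be derived. It assumes, hypothetically, that each physical GPU is split into a fixed number of equal vGPUs and that a pod's thread share is proportional to the vGPUs it requests on a single GPU. The helper `mpsThreadPercentage` and the numbers used are illustrative only, not the plugin's actual code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// mpsThreadPercentage is a hypothetical helper, not part of the plugin.
// Assumption: one physical GPU is divided into vgpusPerGPU equal vGPUs, and a
// pod's share of GPU threads is proportional to the vGPUs it requests.
func mpsThreadPercentage(requested, vgpusPerGPU int) (int, error) {
	if vgpusPerGPU <= 0 {
		return 0, fmt.Errorf("vgpusPerGPU must be positive, got %d", vgpusPerGPU)
	}
	// Assumption: a request must fit on a single physical GPU.
	if requested <= 0 || requested > vgpusPerGPU {
		return 0, fmt.Errorf("requested vGPUs must be in [1, %d], got %d", vgpusPerGPU, requested)
	}
	// Integer division rounds down, so the shares of co-located pods never sum above 100.
	return 100 * requested / vgpusPerGPU, nil
}

func main() {
	// Example: a pod requesting 2 of the 10 vGPUs carved out of one GPU.
	pct, err := mpsThreadPercentage(2, 10)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// An MPS client process reads this variable when it creates its CUDA context.
	fmt.Println("CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=" + strconv.Itoa(pct))
}
```

Under these assumptions the program prints `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=20`. Rounding down is a deliberate choice so that several pods sharing one GPU cannot be promised more than the whole device.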

GPU sharing works perfectly fine, but when trying to scale pods based on GPU share, cluster-autoscaler is unable to scale instances as required and fails with the following errors. ``` clusterautoscaler-aws-cluster-autoscaler-6dbcb4d4f7-fv5w7 aws-cluster-autoscaler...

Hello, when using this plugin, I was able to run `pytorch` models on a shared GPU and everything works smoothly but in some cases, when one pod starts using a...

Is there a way to log vGPU utilization metrics and monitor them with aws-virtual-gpu-device-plugin? I currently use the NVML library with Datadog, but it is not aware of the virtual GPUs, so...

*Issue #, if available:* The manifest fails because the latest tag is deprecated; moving to nvidia/cuda:11.3.0-runtime-ubuntu18.04. *Description of changes:* Move from nvidia/cuda:latest -> nvidia/cuda:11.3.0-runtime-ubuntu18.04 By submitting this pull request, I confirm that...