霓漠Nimbus

11 comments by 霓漠Nimbus

@jiangsanyin I have followed the installation instructions as described in the documentation, but encountered a minor issue, which I also mentioned previously in Issue #498. By default, the `dcgm-exporter` only...

@jiangsanyin Both are correct. What I meant is that you might have forgotten to include the explanation for the relabel configuration of `dcgm-exporter`. By default, `dcgm-exporter` only includes the `Hostname`...

After attempting to deploy this dashboard myself, I encountered similar issues. By comparing the original metrics, I noticed the following: 1. **Missing node_name Label:** The issue with the missing node_name...

@jiangsanyin Regarding the [Add Prometheus Custom Metric Configuration](https://grafana.com/grafana/dashboards/21833-hami-vgpu-dashboard/) section in the dashboard, I think there may be some problems with the configuration provided. I am using `ServiceMonitor` directly, so there's...

@jiangsanyin The reason you don't see **Device_memory_desc_of_container** in your Prometheus metrics is that this metric is exposed by the **hami-device-plugin**. However, Prometheus does not have a scrape rule configured to...
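One way to confirm which endpoint actually exposes a metric, before touching the scrape configuration, is to fetch that endpoint's `/metrics` output and search it for the metric name. The following is a minimal Go sketch of that check, not code from HAMi or Prometheus: `hasMetric` is a hypothetical helper, and the sample text (including the `podname` label) is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasMetric reports whether Prometheus text-format output contains at least
// one sample line for the given metric name. Hypothetical helper for
// illustration only.
func hasMetric(exposition, name string) bool {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and "# HELP" / "# TYPE" comment lines.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// A sample line looks like `name{labels} value` or `name value`,
		// so the metric name ends at the first '{' or whitespace.
		end := strings.IndexAny(line, "{ \t")
		if end < 0 {
			end = len(line)
		}
		if line[:end] == name {
			return true
		}
	}
	return false
}

func main() {
	// Made-up sample output; the label name and value are assumptions.
	sample := "# TYPE Device_memory_desc_of_container gauge\n" +
		"Device_memory_desc_of_container{podname=\"demo\"} 1\n"
	fmt.Println(hasMetric(sample, "Device_memory_desc_of_container"))
}
```

If the name shows up in the endpoint's raw output but not in Prometheus, that suggests the gap is on the scrape side rather than in the exporter.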

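> Can a single job train or run inference on GPUs from the same vendor but with different chips (for example, one V100 and one H100 together)?

If both NVIDIA GPUs are on the same node, then it's supported.

> Can a single job train or run inference on cards from different vendors (for example, one V100 plus one 910B)?

Not supported. Usually, frameworks like TensorFlow and PyTorch, along with their related Python libraries,...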

Could you please provide the exact hami image version to help trace the specific code line? It currently appears that certain `map`-type fields in the scheduler might be accessed concurrently...

@jeonghyunkeem Got it, I checked, and I know where the problem is. This issue has already been fixed in #418, so it should no longer occur if you use the...

@badaldavda8 @IndhumithaR Privileged Pods have direct access to the host's devices: they share the host's device namespace and can directly access everything under the `/dev` directory. This basically bypasses the container's...