Suchismita Sur

Results: 3 issues by Suchismita Sur

I came across the metrics exporter, but I am not able to set it up. The errors are: ``` {"level":"info","ts":1679291005.7844253,"msg":"reading metrics file","metricsFile":""} {"level":"error","ts":1679291005.7844558,"msg":"failed to read metrics file","error":"open : no such file or...

question

With reference to https://github.com/AliyunContainerService/gpushare-scheduler-extender/issues/145, a solution was given for using the extender in EKS; however, that solution only works for Kubernetes v1.23 and below. Since Kubernetes v1.24,...

I am trying to obtain per-process GPU metrics using DCGM-exporter. Logs from nvhostengine: ``` root@dcgm-exporter-tlb4f:/# 2021-11-23 00:15:28.951 ERROR [82:82] Cannot initialize the hostengine: Error: Failed to initialize NVML [/workspaces/dcgm-rel_dcgm_2_3-postmerge/dcgmlib/src/DcgmHostEngineHandler.cpp:3647] [DcgmHostEngineHandler::Init] bash:...