Paweł (roy) Rościszewski
@Dubrzr could you please provide the output of the following command on your gpu2 server:

```
awk '{u=$2+$4; t=$2+$4+$5; if (NR==1){u1=u; t1=t;} else print ($2+$4-u1) * 100 / (t-t1); }'
```
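For context: that awk one-liner presumably compares two samples of the aggregate `cpu` line from `/proc/stat` and prints the CPU utilization between them. A rough Python equivalent of the same arithmetic (just a sketch for illustration, not TensorHive's actual code):

```python
import time

def cpu_sample():
    """Read the aggregate 'cpu' line from /proc/stat and return (busy, total)."""
    with open('/proc/stat') as f:
        fields = f.readline().split()
    # fields: ['cpu', user, nice, system, idle, ...]
    user, system, idle = int(fields[1]), int(fields[3]), int(fields[4])
    busy = user + system          # same as u = $2 + $4 in the awk script
    total = user + system + idle  # same as t = $2 + $4 + $5
    return busy, total

busy1, total1 = cpu_sample()
time.sleep(1)
busy2, total2 = cpu_sample()
print((busy2 - busy1) * 100 / (total2 - total1))  # CPU utilization in %
```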
Thanks... wrong intuition then... This is indeed the right endpoint. My sample output:

```json
{
  "ai": {
    "CPU": {
      "CPU_ai": {
        "index": 0,
        "metrics": {
          "mem_free": {
            "unit": "MiB",
            "value": ...
```
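Judging from the shape of that sample, the payload nests hostname → resource type → device → metrics. A quick way to flatten it for inspection (a sketch based only on the structure visible above; the metric value is made up):

```python
# Shape assumed from the sample above: hostname -> resource type -> device -> metrics
sample = {
    "ai": {
        "CPU": {
            "CPU_ai": {
                "index": 0,
                "metrics": {
                    "mem_free": {"unit": "MiB", "value": 12345}  # value invented for illustration
                }
            }
        }
    }
}

for hostname, resources in sample.items():
    for resource_type, devices in resources.items():
        for device_name, device in devices.items():
            for metric_name, metric in device["metrics"].items():
                print(f"{hostname}/{resource_type}/{device_name} "
                      f"{metric_name} = {metric['value']} {metric['unit']}")
```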
@Dubrzr: and how about this command:

```
nvidia-smi --query-gpu=name,fan.speed,utilization.gpu --format=csv,nounits
```

I see that you have a newer version of the NVIDIA driver (the newest version that we've tested is 418.116), ...
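In case it helps to see how that CSV output gets consumed, here is a minimal parsing sketch (not the actual TensorHive monitor; the `[N/A]` handling in particular is an assumption about GPUs that do not report fan speed):

```python
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=name,fan.speed,utilization.gpu",
         "--format=csv,nounits,noheader"]

def parse_value(raw):
    """nvidia-smi prints '[N/A]' for fields a GPU does not expose (e.g. fan speed)."""
    raw = raw.strip()
    return None if raw == "[N/A]" else raw

output = subprocess.check_output(QUERY, encoding="utf-8")
for line in output.strip().splitlines():
    name, fan_speed, gpu_util = (parse_value(v) for v in line.split(", "))
    print(f"{name}: fan={fan_speed}, util={gpu_util}%")
```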
Everything looks fine here... Could you try modifying line 73 in tensorhive/core/managers/TensorHiveManager.py and set:

```python
monitors = []
```

and see if it helps?
@Dubrzr do you have any new observations or hints? If the data were missing for gpu3, we would at least have an idea that the differing Fan speed "[N/A]" notation...
I suppose there should be a cache in the backend and an API parameter defining how many recent states should be returned...
If so, I'll leave it open with nice-to-have label. Maybe one day someone picks it up
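Roughly what I have in mind, as a sketch only (the class name, parameter name, and endpoint below are placeholders, not the actual TensorHive API):

```python
from collections import deque

class RecentStatesCache:
    """Keep the N most recent infrastructure snapshots in memory."""

    def __init__(self, max_states=100):
        self._states = deque(maxlen=max_states)  # oldest entries drop off automatically

    def push(self, state):
        """Called by the monitoring loop after each polling cycle."""
        self._states.append(state)

    def recent(self, count=1):
        """What a hypothetical GET /infrastructure?recent=<count> could return."""
        return list(self._states)[-count:]
```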
Thanks for the report! TensorHive reads GPU utilization from nvidia-smi, so the inconsistency may be related to the polling frequency. TensorHive itself does not differentiate between processes when monitoring GPU utilization...
> tensorhive:
>
> > Average GPU utilization: 87%
> > Average GPU memory utilization: 18%
> > Start: Thursday, March 19th, 12:00
> > End: Friday, March 20th, 15:00

...
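For reference, an average like the one quoted above would presumably be derived from the periodic utilization samples collected between the start and end timestamps. A simple mean over the samples in that window (an illustration of the idea with made-up numbers, not TensorHive's reporting code):

```python
from datetime import datetime

def average_utilization(samples, start, end):
    """samples: list of (timestamp, gpu_util_percent) collected by the polling loop."""
    in_window = [util for ts, util in samples if start <= ts <= end]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)

# Example with invented samples:
samples = [(datetime(2020, 3, 19, 12, 0), 90.0),
           (datetime(2020, 3, 19, 18, 0), 85.0),
           (datetime(2020, 3, 20, 15, 0), 86.0)]
print(average_utilization(samples,
                          datetime(2020, 3, 19, 12, 0),
                          datetime(2020, 3, 20, 15, 0)))  # -> 87.0
```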