Matthew Yeung
This is somewhat related to #91. It is often useful to evaluate scheduler algorithms by measuring the resource allocation rate. My line of thinking is that we should do...
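As a rough sketch of that metric (assumptions: "allocation rate" means the average fraction of cluster GPUs allocated across sampled snapshots; the function and data shapes here are illustrative, not from the project):

```python
# Hypothetical sketch: compute an average GPU allocation rate from
# (timestamp, allocated_gpus) snapshots. Names and data are illustrative.
def allocation_rate(snapshots, total_gpus):
    """Average fraction of cluster GPUs allocated across snapshots."""
    if not snapshots:
        return 0.0
    return sum(alloc for _, alloc in snapshots) / (len(snapshots) * total_gpus)

# Three one-minute samples on a 64-GPU cluster.
snapshots = [(0, 40), (60, 48), (120, 56)]
print(allocation_rate(snapshots, 64))  # (40 + 48 + 56) / (3 * 64) = 0.75
```

A real evaluation would likely weight snapshots by interval length rather than averaging uniformly.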
A pod currently can specify: 1. GPU count, 2. GPU Milli, 3. GPU card model. Does GPU Milli refer to the % of memory used for that specific card model? If...
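A minimal sketch of one plausible reading (assumption only, pending the maintainers' answer): GPU Milli expresses a per-card share in thousandths, so 1000 milli = one whole card, regardless of the card model's memory size.

```python
# Assumed interpretation, not confirmed by the project: gpu_milli is a
# per-card share in thousandths, multiplied across the requested card count.
def requested_gpus(gpu_count, gpu_milli):
    """Total demand in whole-GPU units under the per-card-share reading."""
    return gpu_count * gpu_milli / 1000

# Two cards at 500 milli each would total one whole GPU's worth of share.
print(requested_gpus(2, 500))  # 1.0
```

Whether that share maps to compute time, memory, or both is exactly what the question above is asking.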
For example, in the GPU 2020 trace the job 'e5d6d5b546bff61f93b47ebf' has **max_gpu_wrk_mem** of '44.289062', but its GPU type is V100, whose memory capacity should be 16 GB or at most 32 GB!?...
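A small sanity check along those lines (field names are taken from the question; the capacity table and helper are illustrative assumptions, and the trace may well record memory summed across several GPUs):

```python
# Hypothetical check: flag trace jobs whose recorded max GPU working-set
# memory (GiB) exceeds the card's nominal capacity. V100 ships with 16 or
# 32 GiB of HBM2; we use the larger figure to be conservative.
CAPACITY_GIB = {"V100": 32, "T4": 16, "P100": 16}

def over_capacity(jobs):
    return [j["job_name"] for j in jobs
            if j["max_gpu_wrk_mem"] > CAPACITY_GIB.get(j["gpu_type"], float("inf"))]

jobs = [{"job_name": "e5d6d5b546bff61f93b47ebf",
         "gpu_type": "V100",
         "max_gpu_wrk_mem": 44.289062}]
print(over_capacity(jobs))  # ['e5d6d5b546bff61f93b47ebf']
```

If the anomaly turns out to be memory aggregated over multiple cards, dividing by the job's GPU count before comparing would resolve it.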
Use cases: - I have multiple trainings (2) running on the same GPUs and I want to profile each process, but want to see them at the same time on...