jupyterlab-nvdashboard
[DISCUSSION] What GPU dashboards should be included?
We currently include the following dashboards:
- "GPU Utilization": Bar plot showing %GPU compute utilization. One bar per device
- "GPU Memory": Bar plot showing GPU-memory consumption. One bar per device
- "GPU Resources": Stacked timeline plot. Includes %GPU compute utilization (line per device), memory utilization (line per device), total GPU utilization (lines for compute and memory), and total PCIe throughput (lines for RX and TX)
- "PCIe Throughput": Bar plots (RX and TX) showing PCIe Throughput. One bar per device
- "NVLink Throughput": (Planned in PR#18) Bar plots (RX and TX) showing NVLink Throughput. One bar per device
- "NVLink Timeline": (Planned in PR#18) Stacked timeline plots (RX and TX) showing NVLink Throughput over time. One line per device
- "Machine Resources": Stacked timeline plot. Summary of non-GPU specific metrics over time.
At this point, I think we should have a discussion about both the quantity and type of dashboards to provide.
How many dashboards make sense?
Should individual bar and timeline-based dashboards be provided for every metric?
There are two different ways of using the dashboards: as a JupyterLab extension and as a standalone Bokeh server. Individual plots are more useful in JupyterLab because you can arrange them as you like, whereas having lots of diagnostics in one place is more useful for the standalone dashboards because it avoids opening lots of browser windows.
I also think that there are two usage modes of GPU analytics that these will help with. The first is interactive work, where GPU tasks take seconds to tens of seconds; for this I think bar dashboards are more useful. The other is long-running tasks, which may take minutes to hours. In those cases the timelines will be more useful, as the user will likely divert their attention and not be watching them in real time.
I guess this means my answer is that all of them are useful. The result of that, however, will probably be an overwhelming list of options for the user, which is not ideal.
Perhaps something to consider would be to exclude options like GPU Resources, which has many plots in one place, from the JupyterLab extension and instead include the individual plots. Then we could exclude the individual plots from the standalone list, or at least put multi-plot dashboards at the top of the list.
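To make that proposal concrete, here is a rough sketch of how the dashboard list could be filtered per context. None of this is existing nvdashboard code: `Dashboard`, `DASHBOARDS`, `dashboards_for` and the `multi_plot` flag are hypothetical names, and flagging "Machine Resources" as multi-plot is my assumption based on it being described as a stacked timeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dashboard:
    title: str
    multi_plot: bool  # True when several plots are combined on one page


# Titles are taken from the list at the top of this issue.
# The multi_plot flags are assumptions made for illustration only.
DASHBOARDS = [
    Dashboard("GPU Utilization", False),
    Dashboard("GPU Memory", False),
    Dashboard("GPU Resources", True),
    Dashboard("PCIe Throughput", False),
    Dashboard("Machine Resources", True),  # assumed: stacked timeline
]


def dashboards_for(context):
    """Return dashboard titles in the order they would be offered."""
    if context == "jupyterlab":
        # Only individual plots, so the user can arrange them freely.
        return [d.title for d in DASHBOARDS if not d.multi_plot]
    if context == "standalone":
        # Keep everything, but list multi-plot dashboards first.
        # sorted() is stable, so the original order is kept within each group.
        ordered = sorted(DASHBOARDS, key=lambda d: not d.multi_plot)
        return [d.title for d in ordered]
    raise ValueError(f"unknown context: {context!r}")
```

This implements the softer variant (multi-plot dashboards first in the standalone list); dropping the individual plots from the standalone list entirely would be a one-line change to the second branch.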
From @mrocklin in gpuopenanalytics/pynvml#10
Some consumers of this information may not want to use JupyterLab (for example, all of the CUDA programmers out there) and so we may also want to have a more dashboardy page that includes several plots laid out nicely. I imagine that this might increase the use of these dashboards.
This requires someone to take a look at the plots that we currently have and then arrange them nicely onto the page using either Bokeh layout, or just standard HTML/CSS.
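As a starting point for the Bokeh-layout option, a minimal sketch might look like the following. It assumes Bokeh is installed and uses empty placeholder figures rather than the real nvdashboard plots; `page_rows` and `build_page` are hypothetical helper names, and the particular arrangement (two bar plots side by side above a timeline) is just one possibility.

```python
def page_rows():
    """Describe the page as rows of dashboard titles (illustrative arrangement)."""
    return [
        ["GPU Utilization", "GPU Memory"],  # bar plots side by side
        ["GPU Resources"],                  # timeline underneath
    ]


def build_page(rows=None):
    """Build a Bokeh layout with one placeholder figure per title."""
    # Imported here so the row description above is usable without Bokeh.
    from bokeh.layouts import column, row
    from bokeh.plotting import figure

    rows = page_rows() if rows is None else rows
    # Each inner list becomes a horizontal row; the rows are stacked in a column.
    return column(*[row(*[figure(title=t) for t in titles]) for titles in rows])
```

In a Bokeh server app the result could be attached with `curdoc().add_root(build_page())`, with the placeholder figures swapped for the existing plots.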