Add UI to explore build instrumentation data
Feature Request
How can we make CDash better?
https://github.com/Kitware/CDash/pull/2612 added build instrumentation data to the GraphQL API, which achieves the basic goal of being able to download this information from CDash as described in https://github.com/Kitware/CDash/issues/2395. Most users, though, want to view the instrumentation data in CDash itself rather than only querying an API programmatically. As a basic starting place, I propose the following pages. All of these pages can be implemented using existing data, and performance is unlikely to be a concern.
- [ ] /builds/<id>/targets -- A table which displays the target name, type, and label(s) at a minimum. A further improvement to the page would be to add aggregate information about associated commands, including the sum of the command durations.
- [ ] /targets/<id> -- Every bit of information we have about a target, including a table of associated commands, labels, build, previous builds containing the target, etc.
- [ ] /builds/<id>/commands -- A table of commands associated with a given build, including start time, duration, and type at a minimum. Similar to the test measurement pages, we could consider displaying columns of user-selectable measurements as well. It would be great to add a visualization of the relative start and stop times for each command.
- [ ] /targets/<id>/commands -- Similar to /builds/<id>/commands, except only the commands for a given target. All measurement information could be displayed by default.
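To make the "relative start and stop times" idea a bit more concrete, here is a minimal Python sketch of the data preparation such a visualization might need. The record keys (`name`, `start_ms`, `duration_ms`), the millisecond units, and both function names are assumptions made for illustration; they are not the actual CDash schema or API.

```python
def relative_timeline(commands):
    """Return (name, start_offset, end_offset) tuples, sorted by start offset.

    Offsets are measured from the earliest command start in the build.
    The dict keys used here are hypothetical, not the CDash schema.
    """
    if not commands:
        return []
    origin = min(c["start_ms"] for c in commands)
    rows = [
        (c["name"], c["start_ms"] - origin, c["start_ms"] - origin + c["duration_ms"])
        for c in commands
    ]
    return sorted(rows, key=lambda row: row[1])


def render(rows, ms_per_char=100):
    """Draw a crude text timeline: one line per command, one '#' per time slice."""
    lines = []
    for name, start, end in rows:
        pad = " " * (start // ms_per_char)
        bar = "#" * max(1, (end - start) // ms_per_char)
        lines.append(f"{name:<10}{pad}{bar}")
    return "\n".join(lines)


# Made-up sample data for a three-command build.
sample = [
    {"name": "b.o", "start_ms": 5100, "duration_ms": 500},
    {"name": "a.o", "start_ms": 5000, "duration_ms": 300},
    {"name": "libx.a", "start_ms": 5600, "duration_ms": 200},
]

timeline = relative_timeline(sample)
print(render(timeline))
```

A real page would presumably draw this with a charting component rather than text; the point is only that each row needs an offset from the earliest start plus a duration.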
After these basic pages are implemented, we could consider building pages which show the history of targets and individual commands over time. Such pages would be extremely powerful, but a naive implementation is unlikely to be efficient. These "aggregate" pages would likely require new ways to gather aggregate information via GraphQL.
Throughout all of the proposed pages, there is a substantial opportunity for us to add visualizations which would otherwise require dedicated tools, greatly enhancing the usefulness of CDash for analyzing build results.
@williamjallen
> /builds/<build-id>/commands -- A table of commands associated with a given build, including start time, duration, and type at a minimum. Similar to the test measurement pages, we could consider displaying columns of user-selectable measurements as well. It would be great to add a visualization of the relative start and stop times for each command.
Yes, that page with more fields. This should be a flat table of commands for a given buildID (parent or subproject) with the columns:
- Target Name
- Target Type
- Target Labels
- Language
- Role
- Start Time
- Duration
- Source
- Outputs
- Outputs Sizes
- Host Memory Used Before
- Host Memory Used After
- CPU Load Average Before
- CPU Load Average After
And it would be good to be able to:
- Filter based on any (or all) of those fields
- Sort by columns (ascending or descending)
- Hide any set of columns
- Provide sums of the "Duration" and "Outputs Sizes" columns (for the filtered commands).
- Provide maxes for "Host Memory Used" and "CPU Load Average" columns (for the filtered commands).
That would allow you to filter and get stats for all of the object files vs links, all of the commands for a given target, etc. You can get a lot from that. (NOTE: You could also filter by the directory path of the "Output File".)
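As a rough illustration of the filter-then-summarize behavior described above, here is a minimal Python sketch. The `Command` record, its field names, and the units are assumptions for the sake of the example, not the actual CDash schema; `summarize` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Command:
    # Hypothetical record shape; field names and units are assumptions,
    # not the actual CDash schema.
    target_name: str
    role: str
    language: str
    duration_ms: int
    output_size_bytes: int
    host_memory_used_kb: int
    cpu_load_average: float


def summarize(commands: Iterable[Command], keep: Callable[[Command], bool]) -> dict:
    """Apply a filter, then compute sums and maxes over the remaining rows."""
    rows = [c for c in commands if keep(c)]
    return {
        "count": len(rows),
        "total_duration_ms": sum(c.duration_ms for c in rows),
        "total_output_size_bytes": sum(c.output_size_bytes for c in rows),
        "max_host_memory_used_kb": max((c.host_memory_used_kb for c in rows), default=0),
        "max_cpu_load_average": max((c.cpu_load_average for c in rows), default=0.0),
    }


# Made-up sample rows.
commands = [
    Command("teuchos", "compile", "CXX", 1200, 40_000, 900_000, 3.5),
    Command("teuchos", "compile", "CXX", 800, 25_000, 950_000, 4.0),
    Command("teuchos", "link", "CXX", 300, 500_000, 700_000, 1.5),
    Command("panzer", "compile", "Fortran", 500, 10_000, 600_000, 2.0),
]

# Object files vs. links, or everything for one target, use the same helper.
compile_stats = summarize(commands, lambda c: c.role == "compile")
teuchos_stats = summarize(commands, lambda c: c.target_name == "teuchos")
print(compile_stats)
print(teuchos_stats)
```

The same predicate approach would cover a filter on the output directory path, assuming an output path field were available on each row.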
For more data about a command you could have a page /builds/<build-id>/commands/<cmnd-id> that provides all of that data in addition to the full command (or the truncated command if that is all that was uploaded). But I don't personally care about that page.
And you might provide hyperlinks to a given target (with <target-id>) from the page /builds/<build-id>/commands to link to /targets/<target-id> to provide info on a given target like:
- Target Name
- Target Type
- Target Labels
- Outputs
- Total number of commands
- Sum of duration for all commands
- Sum of Output Sizes for all commands
- Max Host Memory across all commands
- Max CPU Load Average across all commands
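The per-target fields above amount to a group-by over the command rows. A minimal Python sketch, assuming each command is a plain dict with hypothetical keys (`target`, `duration_ms`, `output_size_bytes`, `host_memory_kb`, `cpu_load`) that do not necessarily match the real data model:

```python
from collections import defaultdict


def rollup_by_target(commands):
    """Group command rows by target name and aggregate each group."""
    groups = defaultdict(list)
    for cmd in commands:
        groups[cmd["target"]].append(cmd)

    summary = {}
    for target, rows in groups.items():
        summary[target] = {
            "command_count": len(rows),
            "duration_ms": sum(r["duration_ms"] for r in rows),
            "output_size_bytes": sum(r["output_size_bytes"] for r in rows),
            "max_host_memory_kb": max(r["host_memory_kb"] for r in rows),
            "max_cpu_load": max(r["cpu_load"] for r in rows),
        }
    return summary


def slowest_targets(summary, limit=10):
    """Target names ordered by total command duration, longest first."""
    return sorted(summary, key=lambda t: summary[t]["duration_ms"], reverse=True)[:limit]


# Made-up sample rows; keys are illustrative only.
rows = [
    {"target": "libA", "duration_ms": 400, "output_size_bytes": 1000, "host_memory_kb": 500, "cpu_load": 1.0},
    {"target": "libA", "duration_ms": 600, "output_size_bytes": 3000, "host_memory_kb": 800, "cpu_load": 2.5},
    {"target": "app", "duration_ms": 1500, "output_size_bytes": 9000, "host_memory_kb": 700, "cpu_load": 2.0},
]

per_target = rollup_by_target(rows)
print(slowest_targets(per_target))
```

In practice this aggregation would more likely happen in the database or the GraphQL layer than in application code; the sketch only pins down what each column means.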
And I guess you could also have the page /builds/<build-id>/targets that provides a table of targets with those same fields. And you would need filters and sorting columns for that page as well to be useful. (Otherwise, I would not bother.)
There is no need for a page /targets/<id>/commands because if you want to see all of the commands for a given target, you can just go to a page /builds/<id>/commands and filter based on the target name.
But I think less is more to start with. So, IMHO, you would just have one page /builds/<id>/commands as described above and more pages could be added as desired by a paying customer.
Otherwise, let's discuss.
@bartlettroscoe
> Yes, that page with more fields. This should be a flat table of commands for a given buildID (parent or subproject) with the columns:
> - Target Name
> - Target Type
> - Target Labels
> - Language
> - Role
> - Start Time
> - Duration
> - Source
> - Outputs
> - Outputs Sizes
> - Host Memory Used Before
> - Host Memory Used After
> - CPU Load Average Before
> - CPU Load Average After
I think that's far too many columns to fit across a single page. Realistically, I see two main use cases here:
- Users who want to look at data on a per-target basis and don't really care about what the actual commands were. (e.g., "Which targets took the longest to build?")
- Users who care about the actual type of commands being run, and don't really care about which targets they're associated with. (e.g., "Did commands in a specific working directory take longer?" or "How efficient were my compile commands?")
Is there a use case where it might be useful to filter commands by target information? Sure, but I don't think it necessarily needs to take up a large portion of the table when the page is specifically about showing commands. That's why I'm in favor of having both a page which displays commands for a build and targets for a build. Another motivation for having two pages here is that each page is dedicated to a single type of record. Pages which join multiple records together are often less performant, in addition to being harder to implement filters for since there isn't a one-to-one mapping between columns in the table and fields in a specific type being filtered.
> And you might provide hyperlinks to a given target (with <target-id>) from the page /builds/<build-id>/commands to link to /targets/<target-id>
Definitely. I imagine all of the pages I proposed having lots of links between targets, commands, measurements, builds, etc... Eventually, we can also link build warnings/errors to targets and add links to target/command pages from the build error page as well.
> And I guess you could also have the page /builds/<build-id>/targets that provides a table of targets with those same fields. And you would need filters and sorting columns for that page as well to be useful. (Otherwise, I would not bother.)
We now have generalized templates for filtering and sorting, so every page I proposed would have both of those, with no page-specific logic required.
> I think that's far too many columns to fit across a single page.
That is why we need the ability to select which columns to display. (But with a very wide screen and wrapping inside a given cell, you could show all of these fields.)
> Realistically, I see two main use cases here:
There are way more than just these two use cases. You also want to filter based on:
- All build commands under a given subdirectory (as determined by filters for the path of the outputs files) (e.g. all of the targets under packages/teuchos/ or packages/panzer/, for Trilinos, for example).
- All of the C++ object builds vs. C or Fortran (as determined by filters on "Language").
- All the build commands that take longer than x seconds (as determined by filters on "Duration")
- All of the libraries larger than x GB (as determined by filters on "Output Sizes" and "Role") (e.g. the dreaded 4GB lib limit with Intel compilers)
- All build commands that are not for tests or are only for tests (as determined by filters on "Labels" where the CMake project has labeled all test targets with TEST).
- What commands are involved when the "Host Memory Usage" was larger than x GB (as determined by filters on "Host Memory Used Before" and "Host Memory Used After")
I could keep going ...
> Pages which join multiple records together are often less performant, in addition to being harder to implement filters for since there isn't a one-to-one mapping between columns in the table and fields in a specific type being filtered.
These pages have to be useful enough to justify paying to implement and maintain them. I would be happier with a slower page that was useful compared to multiple faster pages that were not very useful.
> We now have generalized templates for filtering and sorting, so every page I proposed would have both of those, with no page-specific logic required.
Yes! Every page that has a table with columns should provide the ability to filter and sort based on those columns (and an API to download that data as JSON using those filters).
And I would love it if every page with a table would provide a way to select what columns to show in the rendered HTML page.
If we do work on this, let's make sure that we focus on the highest value page first (which is the page /builds/<build-id>/commands described above) and see how that goes. If that did not take too long, we can look at other pages. If that first page takes a long time, we need to move on to other features and priorities and call this good.
Actually, it just occurred to me that to be really useful, you would like the ability to query commands and targets across multiple builds, like you can for tests across multiple builds with the queryTests.php page. So, really, the highest value page would be called queryBuildTargets.php and it would work very similarly to queryTests.php, except it would need the features:
- Filter based on any (or all) of those fields (including site, buildname, buildstamp, etc.)
- Sort by columns (ascending or descending)
- Allow the user to select any subset of columns to display and query based on (and therefore limit the number of columns shown so that they fit on the page).
- Word wrap to fit in the available width of the browser.
- Provide sums of the "Duration" and "Outputs Sizes" columns (for the filtered commands).
- Provide maxes for "Host Memory Used" and "CPU Load Average" columns (for the filtered commands).
That would allow you to view how a single object build or library, for example, changes across different builds with different compilers or even the same build across multiple days and weeks (so we can see when there is a spike in the build time for a given *.o file, for example).
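One simple way to flag that kind of spike, sketched in Python: compare each build's duration for a given output against the median of the earlier builds. The function name, the threshold factor, and the `(build label, duration)` history shape are all assumptions for illustration; nothing here reflects an existing CDash feature.

```python
from statistics import median


def find_spikes(history, factor=2.0, min_history=3):
    """Flag builds whose duration exceeds `factor` times the median of all earlier builds.

    `history` is a list of (build_label, duration_ms) pairs, oldest first,
    for a single output file. Both the shape and the heuristic are assumptions.
    """
    spikes = []
    for i, (label, duration) in enumerate(history):
        earlier = [d for _, d in history[:i]]
        # Require a few earlier data points before judging anything a spike.
        if len(earlier) >= min_history and duration > factor * median(earlier):
            spikes.append(label)
    return spikes


# Made-up nightly durations for one object file.
history = [
    ("2025-04-01", 1000),
    ("2025-04-02", 1100),
    ("2025-04-03", 950),
    ("2025-04-04", 1050),
    ("2025-04-05", 2600),
    ("2025-04-06", 1000),
]

spikes = find_spikes(history)
print(spikes)
```

Using the median rather than the mean keeps a single earlier spike from masking the next one.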
@bartlettroscoe
> Actually, it just occurred to me that to be really useful, you would like the ability to query commands and targets across multiple builds, like you can for tests across multiple builds with the queryTests.php page. So, really, the highest value page would be called queryBuildTargets.php and it would work very similarly to queryTests.php, except it would need the features:
I agree. As I mentioned in the issue description, page(s) which allow users to see command and target performance over time are probably the most impactful, but also much trickier to get right. Querying nth-child relations directly requires the database to iterate through a large number of intermediate child relationships (e.g., searching for targets by project requires the database to look at many build rows to find targets which are associated with a given project before it can begin filtering on target fields). These pages will likely require new database work, requiring more time to implement.
> There are way more than just these two use cases. You also want to filter based on:
All of these examples ultimately fall into one of the two categories I listed though...
> - All build commands under a given subdirectory (as determined by filters for the path of the outputs files) (e.g. all of the targets under packages/teuchos/ or packages/panzer/, for Trilinos, for example).
Output files will be attached to build commands, so it falls under (2): users who want to look at commands.
> - All of the C++ object builds vs. C or Fortran (as determined by filters on "Language").
Language is a target-specific property, so it falls under (1): users who want to look at targets.
> - All the build commands that take longer than x seconds (as determined by filters on "Duration")
Duration is a command-specific property, so it falls under (2): users who want to look at commands.
> - All of the libraries larger than x GB (as determined by filters on "Output Sizes" and "Role") (e.g. the dreaded 4GB lib limit with Intel compilers)
In other words, you're looking for targets (of which libraries are a type), filtered by commands. That falls under (1): users who want to look at targets. The fact that you're looking at targets doesn't necessarily mean that you can't filter by commands or measurements; the pages I proposed are just a matter of what type of record you want to look at primarily.
> - All build commands that are not for tests or are only for tests (as determined by filters on "Labels" where the CMake project has labeled all test targets with TEST).
Similar to the last one, you want to look at commands based on their relationships with targets (via which we get their relationships with labels). This falls under (2): users who want to query by command.
> - What commands are involved when the "Host Memory Usage" was larger than x GB (as determined by filters on "Host Memory Used Before" and "Host Memory Used After")
Measurements are associated with commands, so this falls under (2): users who want to query by command.
> And I would love it if every page with a table would provide a way to select what columns to show in the rendered HTML page.
I agree. That's something which can be added to the common table template pretty easily. I think that should be a separate task though.
> If we do work on this, let's make sure that we focus on the highest value page first (which is the page /builds/<build-id>/commands described in https://github.com/Kitware/CDash/issues/2827#issuecomment-2825766726) and see how that goes. If that did not take too long, we can look at other pages. If that first page takes a long time, we need to move on to other features and priorities and call this good.
I think there's significant overlap between our visions for that page. If you approve, I can start working on the basic parts of the page we both agree on, and can probably get it done in time to be released in CDash 4.0. We can then iterate on it further in future releases. The basic columns I think we both agree on are:
- command (truncated to fit if needed)
- type
- duration
- result
- language
- config
- additional columns for each type of measurement
Expanding each table row would reveal the full command, source, and working directory, plus other more detailed command information.
> These pages will likely require new database work, requiring more time to implement.
I agree. Given that we have the basic functionality almost in place (we are missing outputs and outputSizes) and there are tests to protect it, I think we can come back to this later (hopefully in FY26).
The conversations in this GitHub Issue provide a good summary of the requirements, implementation options, and issues.
@williamjallen After reading https://www.kitware.com/cdash-now-supports-cmake-build-instrumentation/, I was wondering whether any of Kitware's main open-source projects have begun submitting build instrumentation data to their own CDash dashboards, or have plans to.
I briefly reviewed the following projects and didn't see instrumentation submissions. It would be nice to see real-world use cases and how developers have used the feature to improve their build systems.
- https://open.cdash.org/index.php?project=CDash
- https://open.cdash.org/index.php?project=CMake
- https://open.cdash.org/index.php?project=VTK
- https://open.cdash.org/index.php?project=ParaView
Hi @jamesobutler, there's currently work underway to add a nightly instrumentation job to CMake's CI. Other projects will likely have instrumentation jobs in the near future as well.