qaboard
Non-scalar metrics: categories, sequences, scatterplots etc
I think it would be really cool if we had scatterplot metrics (where the variables consist of the numerical metrics and maybe also some numerical inputs).
It touches a really important point: scalar metrics are not good enough.
There are a few use cases that people have asked for a solution to:
- metrics per categories (e.g. metrics per image channel, all/red/green/blue)
- sequential metrics (e.g. SNR over multiple frames)
- ...and yes, scatterplots can be useful!
What makes it hard to do right is the huge number of use cases and the many ways users expect the aggregation to be done. For instance, some want sequential metrics per region of interest, aggregated using the last frame only... I am unsure about what a generic API should look like, and the status quo is decent.
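To make the aggregation question concrete, here is a minimal sketch of one possible data layout for a sequential, per-region metric, together with a few reduction strategies. The names `REDUCERS` and `reduce_metric`, and the layout itself, are assumptions made for illustration; none of this is an existing QA-Board API.

```python
from statistics import mean

# Hypothetical reducers: each turns a sequence of per-frame values into one scalar.
REDUCERS = {
    "last": lambda values: values[-1],
    "mean": mean,
    "max": max,
    "min": min,
}

def reduce_metric(metric, how="last"):
    """Reduce {region: [value per frame]} to {region: scalar}."""
    reducer = REDUCERS[how]
    return {region: reducer(values) for region, values in metric.items()}

# Assumed layout: SNR per region of interest, one value per frame.
snr = {"center": [30.0, 32.0, 34.0], "corner": [20.0, 22.0, 21.0]}
last_frame = reduce_metric(snr, how="last")
average = reduce_metric(snr, how="mean")
```

Even this small case shows the difficulty: "last" and "mean" give different scalars for the same metric, so the aggregation would probably have to be chosen per metric rather than fixed globally.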
Right now, users generate plots to show this data, e.g. plotly graphs, as visualizations. But better support would mean showing the metrics "as they should be" in the table views or in the output cards.
Related: I also want to enable more "dynamic" metrics definitions in metrics.yaml.
In short, let's continue the discussion. I am looking forward to seeing what you'd like the API to look like.
Thanks. IMO we need two abstraction levels of metrics here: first, the one we have now, which provides numeric metrics for a single run, and second, metrics that analyze many first-level metrics across many runs. The second type may include numeric metrics (avg, max, etc.) or visual graphs such as scatterplots, histograms, etc. I think the first step is to create such an abstraction level and a simple API to the relevant data (which consists of the results of many single runs).
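The two levels described above could be sketched roughly as follows. This is only an illustration under assumed names: the run dictionaries, `collect`, `summarize` and `scatter_points` are hypothetical and do not reflect how QA-Board stores or exposes runs.

```python
from statistics import mean

# Level 1 (what exists today): scalar metrics per single run.
# The run ids and metric names below are made up for illustration.
runs = [
    {"id": "run-1", "metrics": {"rmse": 0.5, "snr": 30.0}},
    {"id": "run-2", "metrics": {"rmse": 1.5, "snr": 34.0}},
    {"id": "run-3", "metrics": {"rmse": 1.0}},  # "snr" is missing for this run
]

def collect(runs, name):
    """Return the values of one metric across runs, skipping runs that lack it."""
    return [r["metrics"][name] for r in runs if name in r["metrics"]]

def summarize(runs, name):
    """Level 2: numeric metrics computed over many level-1 results."""
    values = collect(runs, name)
    return {"count": len(values), "avg": mean(values), "max": max(values)}

def scatter_points(runs, x, y):
    """Level 2: (x, y) pairs for a scatterplot, keeping only runs that have both metrics."""
    return [
        (r["metrics"][x], r["metrics"][y])
        for r in runs
        if x in r["metrics"] and y in r["metrics"]
    ]

rmse_summary = summarize(runs, "rmse")
points = scatter_points(runs, "rmse", "snr")
```

The level-2 functions only read level-1 results, so they could run on whatever subset of runs is currently selected.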
I really like the idea of aggregating metrics from different runs (not exclusive to having more complex metrics!) into custom visualizations.
I once thought about offering users some config like outputs.aggregations: [viz1, viz2], that would work seamlessly with the app's filtering. I am not sure how users would specify it though... Maybe we could use some magic on top of plotly specs, where x: $rmse.$key, y: $rmse.$value would work with a metric like rmse: {blue: 1, red: 2..}... For simple cases I figure it would work well. But it would get complex very fast. I'm open to suggestions and to help writing the code to implement it!
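For the simple case, the placeholder idea ($rmse.$key / $rmse.$value) could be resolved along these lines. This is a sketch under assumptions: the placeholder grammar, `PLACEHOLDER` and `resolve` are invented here, and the output is a plain dict shaped like a plotly bar trace rather than anything built with the plotly library.

```python
import re

# Matches placeholders such as "$rmse.$key" or "$rmse.$value" (assumed syntax).
PLACEHOLDER = re.compile(r"^\$(\w+)\.\$(key|value)$")

def resolve(spec, metrics):
    """Return a copy of `spec` with placeholders replaced by lists taken from `metrics`."""
    resolved = {}
    for field, value in spec.items():
        match = PLACEHOLDER.match(value) if isinstance(value, str) else None
        if match is None:
            resolved[field] = value  # literal fields pass through unchanged
            continue
        name, part = match.groups()
        metric = metrics[name]
        resolved[field] = list(metric.keys() if part == "key" else metric.values())
    return resolved

metrics = {"rmse": {"blue": 1, "red": 2}}
spec = {"type": "bar", "x": "$rmse.$key", "y": "$rmse.$value"}
trace = resolve(spec, metrics)
```

Nested specs, several metrics per trace, or data from several runs would each need more grammar, which is where this approach would likely get complex.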
Another direction could be giving users a hook to create the aggregated visualizations themselves for a whole batch after it finishes, like batch_postprocess(runs, output_dir). If they use e.g. plotly, filtering would require users to somehow annotate the data to map data points back to the runs they relate to...
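A hook of that kind might look like the sketch below. Only the name and signature batch_postprocess(runs, output_dir) come from the discussion; the structure of `runs`, the output file name and the `run_id` annotation are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

def batch_postprocess(runs, output_dir):
    """Hypothetical hook, called once after a whole batch finishes.

    Writes scatterplot data where every point carries the id of the run it
    came from, so a viewer could drop points when runs are filtered out.
    """
    points = [
        {"run_id": run["id"], "x": run["metrics"]["rmse"], "y": run["metrics"]["snr"]}
        for run in runs
    ]
    output_path = Path(output_dir) / "rmse_vs_snr.json"
    output_path.write_text(json.dumps({"points": points}, indent=2))
    return output_path

# Made-up runs; the real structure passed to such a hook is undecided.
runs = [
    {"id": "run-1", "metrics": {"rmse": 0.5, "snr": 30.0}},
    {"id": "run-2", "metrics": {"rmse": 1.5, "snr": 34.0}},
]
with tempfile.TemporaryDirectory() as output_dir:
    written = json.loads(batch_postprocess(runs, output_dir).read_text())
```

Keeping the run id next to each point is one way to map data points back to runs; the cost is that every user-written hook has to follow that convention.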
Using different output types (single/aggregated) could work too. The ergonomics would be more complex, but we could do something with the WIP pipeline feature...
Again I'm very open to ideas, especially concrete API ideas. My main preoccupation is finding something that works and is easy to use and document.
By the way, some aspects overlap a bit with "pipelines". Here is a WIP spec, feel free to discuss it here. I may open a separate issue when we start implementing it: https://samsung.github.io/qaboard/docs/dag-pipelines