
Metrics View in CodeChecker

Open · dkrupp opened this issue · 3 comments

CodeChecker will have a metrics view where different scalar metrics can be stored per run and per run history event (storage event) for each file and directory.

Metrics stored per run and run history event

In this view a single run and storage event must be selected; the default is the latest storage event. The metrics of the selected stored version are shown for each file and directory stored in that run. The metrics should be stored for each run history event.

Initially the feature should support at least the following metrics:

  • Lines of Code
  • Effective Lines of Code
  • Number of outstanding reports per severity level (counted at the time of storing the results, since the report count may change when reports are set to resolved; in such a case a new storage must be performed)
  • Report density: number of outstanding reports / effective lines of code (see the sketch after the optional metrics list below)

Optionally, the following metrics may be stored:

  • Cyclomatic complexity (per file)
  • Number of duplicated lines of code
  • Number of unresolved SEI cert violations
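
As a concrete illustration of the report-density metric above, here is a minimal sketch in plain Python. It is not CodeChecker code; the `FileMetrics` type and function names are made up for the example:

```python
# Minimal sketch of the report-density computation, assuming per-file
# effective-LOC and outstanding-report counts are already available.
# FileMetrics and report_density are illustrative names, not CodeChecker API.
from dataclasses import dataclass

@dataclass
class FileMetrics:
    effective_loc: int          # effective lines of code in the file
    outstanding_reports: int    # outstanding reports at store time

def report_density(m: FileMetrics) -> float:
    """Outstanding reports per effective line of code."""
    if m.effective_loc == 0:
        return 0.0              # avoid division by zero for empty files
    return m.outstanding_reports / m.effective_loc

# Example: 3 outstanding reports in a file with 150 effective lines
print(report_density(FileMetrics(effective_loc=150, outstanding_reports=3)))  # 0.02
```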

Navigation

A directory-level navigation should be provided where the metrics are displayed for each file and directory. When the user clicks a file, the source code view is shown for that specific file and version. When the user clicks a directory, the navigation descends into that folder.

Calculation of Metrics

There can be two types of metrics: a) client-calculated metrics (such as test coverage), b) server-calculated metrics (such as the number of outstanding reports).

The metrics are only calculated at the store event and are read-only (they cannot change after the store). New metric values can be stored with a new store.

Aggregation

The metrics should be stored per file, but it must be possible to aggregate them at the directory level.

A directory-level aggregation summary must be supported.
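
One way the roll-up could work is sketched below (illustrative Python, not CodeChecker code). Summing is an assumption that fits count-like metrics such as LOC; ratio metrics like report density should instead be recomputed from the aggregated numerator and denominator:

```python
# Illustrative directory roll-up of a file-level metric; not CodeChecker code.
# Summing is assumed for count-like metrics; ratio metrics should be
# recomputed from aggregated components rather than summed.
from collections import defaultdict
from pathlib import PurePosixPath

def aggregate_by_directory(file_metrics: dict[str, int]) -> dict[str, int]:
    """Sum a per-file metric into every ancestor directory."""
    totals: dict[str, int] = defaultdict(int)
    for path, value in file_metrics.items():
        for parent in PurePosixPath(path).parents:
            totals[str(parent)] += value
    return dict(totals)

loc = {"src/a.c": 120, "src/util/b.c": 80, "include/b.h": 20}
print(aggregate_by_directory(loc))
# {'src': 200, '.': 220, 'src/util': 80, 'include': 20}
```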

Visualization & Gating (Future work) The implementation must allow the visualization of trend lines (by time) for each metric and allow the future implementation of gating on the last change of the metric, or the current status of metric.

Open questions

  1. For which files do we need to calculate the LOC metrics? - All text files?
  2. The content of which files needs to be stored? - All text files, or only the files where we have reports?
  3. Which versions of the source code files need to be stored? - Only the latest version in the run?

Implementation

Client-calculated metrics should be uploaded to the server in YAML files as part of the store procedure; the exact YAML format is to be defined. Server-calculated metrics should be stored at the end of the store procedure.
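
Since the exact format is still to be defined, the following is only one hypothetical shape such a YAML file could take, written as the Python structure a converter might serialize; every key name here is an assumption, not a proposal for the standard:

```python
# Hypothetical per-file metrics document a client-side converter might emit;
# every key name is an assumption, since the exact YAML format is undefined.
import yaml  # PyYAML

metrics_doc = {
    "format_version": 1,
    "tool": "scc",                      # external tool that produced the raw data
    "metrics": [
        {"path": "src/a.c", "type": "lines_of_code", "value": 120},
        {"path": "src/a.c", "type": "effective_lines_of_code", "value": 95},
        {"path": "src/util/b.c", "type": "lines_of_code", "value": 80},
    ],
}

print(yaml.safe_dump(metrics_doc, sort_keys=False))
```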

Database Schema

Metrics table:

Metrics ID | File ID | Run History ID | Metrics Value | Metrics Type

Metrics_Type table:

Metrics Type ID | Metrics Name | Metrics Measure
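
One possible reading of this schema as SQLAlchemy models is sketched below; column names, column types, and the referenced table names are assumptions based only on the column lists above, not the final schema:

```python
# Sketch of possible SQLAlchemy models for the two tables above; names and
# types are assumptions, and the ForeignKey targets are placeholders.
from sqlalchemy import Column, Integer, Float, String, ForeignKey
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class MetricsType(Base):
    __tablename__ = "metrics_types"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)     # e.g. "lines_of_code"
    measure = Column(String)                  # unit of the metric, e.g. "lines"

class Metrics(Base):
    __tablename__ = "metrics"
    id = Column(Integer, primary_key=True)
    file_id = Column(Integer, ForeignKey("files.id"), nullable=False)
    run_history_id = Column(Integer, ForeignKey("run_histories.id"), nullable=False)
    value = Column(Float, nullable=False)
    type_id = Column(Integer, ForeignKey("metrics_types.id"), nullable=False)
```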

dkrupp · Jul 22 '25

The ticket can be divided into the following tasks:

  1. Implementing a new converter type for metrics
     • Generate file-level aggregated metric results on the client side with an external tool or with the parse command
     • For each new metric, a converter needs to be implemented under codechecker/tools/metric_converters/
     • These converters need to parse the output of external tools like scc, or use the parse command to calculate report counts
     • Parse the results into YAML format within the results/metrics folder
     • Design and handle possible input parameters (e.g. for filtering files)
     • Write tests in the same style as for the report converters
     • (?) Modify the parse command if we want to display the metric analyzer results
  2. Database schema changes
     • Create new DB models for the additional tables
     • Design and implement the necessary relations
     • Write migration upgrade scripts so the new tables are created automatically
  3. Store process modifications
     • Package the metric YAML results from the metrics dir as plist files
     • On the server side, read and parse the metric YAMLs into the proper tables for every run tag
     • Optimize if the store process becomes significantly slower
     • (?) Save the file content for every file where metrics were computed
  4. Server-side metric recalculation
     • Handle report modifications that affect metric statistics and trigger recalculation of the affected metrics
  5. Create Thrift API endpoints
     • Introduce new endpoints for the directory-based metric view
       • increase the API version
       • modify the Thrift files
       • regenerate the stubs
       • packaging
     • Implement the endpoint functions within report_server.py and write optimized SQLAlchemy queries (a query sketch follows this list)
       • Input parameters: run_history_name (or id), metric_type, path (optional)
       • Wanted output: a list of metric values for all elements of the given path
     • Write tests for the new endpoints
  6. New metric view in the GUI
     • Create a new main page for metrics
     • Develop the necessary components
     • Implement the directory-based visualization
     • Optionally extend it with informative charts/graphs
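
The endpoint query from task 5 could look roughly like the sketch below. It reuses the hypothetical Metrics model from the schema sketch earlier in the thread, and the path filtering is deliberately schematic since the real File model is not defined here:

```python
# Rough sketch of the directory-view query from task 5, reusing the
# hypothetical Metrics model above; not CodeChecker's actual API.
from sqlalchemy import select

def metrics_under_path(session, run_history_id: int, type_id: int, path: str = ""):
    """Return (file_id, value) pairs for one metric type under a path prefix."""
    stmt = (
        select(Metrics.file_id, Metrics.value)
        .where(Metrics.run_history_id == run_history_id)
        .where(Metrics.type_id == type_id)
    )
    # Real path filtering would join the files table and match a path prefix;
    # omitted here because the File model is only a placeholder.
    return session.execute(stmt).all()
```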

Since this is a complex and large-scale project, in the first phase we assume that our metric converter is already in place and that the YAML files are located in the reports/metric directory.
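
For context, a first-phase converter could look roughly like this. It assumes scc's JSON output via `scc --by-file --format json`; the exact field names ("Files", "Location", "Lines", "Code") should be verified against the scc version in use:

```python
# Illustrative scc -> metrics-YAML converter; the scc JSON field names used
# here are assumptions to verify against the scc documentation.
import json
import subprocess
import yaml

def scc_to_metrics_yaml(source_dir: str, out_file: str) -> None:
    raw = subprocess.run(
        ["scc", "--by-file", "--format", "json", source_dir],
        check=True, capture_output=True, text=True,
    ).stdout
    metrics = []
    for language in json.loads(raw):          # scc groups files per language
        for f in language.get("Files", []):
            metrics.append({"path": f["Location"],
                            "type": "lines_of_code", "value": f["Lines"]})
            metrics.append({"path": f["Location"],
                            "type": "effective_lines_of_code", "value": f["Code"]})
    with open(out_file, "w", encoding="utf-8") as fp:
        yaml.safe_dump({"format_version": 1, "metrics": metrics}, fp,
                       sort_keys=False)
```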

cservakt · Sep 03 '25

Here is the new API endpoint structure:

[Images: the new API endpoint structure]

cservakt · Sep 03 '25

The metric-related table structure will look like this:

[Image: the metric-related table structure]

cservakt · Sep 04 '25