[Spike] [MVP] Package maintenance predictive model

Open mayaCostantini opened this issue 2 years ago • 2 comments

Problem statement

While most approaches focus on guaranteeing the provenance of software components, provenance is only one side of sustainable software development. Another side is the focus on the software components that are critical to the success of the whole software system, including its development, delivery, and operation.

cc @goern

As a Python developer, I would like to be able to predict whether some of my dependencies will become unmaintained over time.

The idea would be to develop a learning model able to predict when a given package will fall below an acceptable level of maintenance. That level could be defined by the user or set directly in the model, in an arbitrary way. A PoC for this model could use project maintenance data as provided by the OpenSSF Security Scorecards, provided that the upstream project implements Scorecard checks per package version instead of updating the Scorecard checks only for the last commit SHA of the project repository.
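To make the prediction target concrete, here is a minimal sketch of how "falling below an acceptable level of maintenance" could be labeled once per-version scores exist. The function name `first_version_below`, the version strings, and the scores are all hypothetical and made up for illustration; the only assumption taken from Scorecards is that check scores lie on a 0 to 10 scale.

```python
from typing import Optional, Sequence, Tuple


def first_version_below(
    history: Sequence[Tuple[str, float]], threshold: float
) -> Optional[str]:
    """Return the first version whose maintenance score is below `threshold`.

    `history` is a release-ordered sequence of (version, score) pairs.
    Returns None if no version falls below the threshold.
    """
    for version, score in history:
        if score < threshold:
            return version
    return None


# Hypothetical per-version maintenance scores (made-up data, 0 to 10 scale).
history = [("1.0.0", 9.0), ("1.1.0", 8.0), ("1.2.0", 6.0), ("2.0.0", 3.0)]

# The threshold is user-defined, as suggested in the problem statement.
cutoff_version = first_version_below(history, threshold=5.0)
print(cutoff_version)
```

A label like this would only describe versions that already exist; the model discussed in the proposal would aim to estimate the cutoff before it happens.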

Proposal description

  1. Provide a PoC of a model trained on the Scorecards dataset (with Scorecard checks per package version) capable of predicting from which version a package is likely to fall below a predefined level of maintenance. A good candidate for this task could be Multiple Linear Regression (MLR), provided that the MLR assumptions (linear relationship between predictor and response variables, predictors not too strongly correlated, etc.) hold. Other supervised learning models could also be considered.
  • [ ] Select features for prediction according to the model chosen
  • [ ] Aggregate and process data for training
  • [ ] Train and validate the model, and examine coherence of the results
  • [ ] Experiment with different models and document a benchmark
  2. Find relevant integrations for the model

Think about ways to provide this model as a service, and where in a Python project lifecycle it would be most relevant for developers to predict the maintenance duration of their dependencies.
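As a rough illustration of the PoC in the first step, the sketch below fits a multiple linear regression by solving the normal equations in plain Python and then extrapolates a maintenance score for a future release. Everything here is an assumption for illustration: the two features (release index and recent commit count), the helper names `fit_ols` and `predict`, and the data, which is made up and deliberately exactly linear. A real experiment would more likely use an established library and actual per-version Scorecard data.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by solving (X^T X) b = X^T y."""
    rows = [[1.0, *features] for features in X]  # prepend intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * target for r, target in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit the model")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict(coef: Sequence[float], features: Sequence[float]) -> float:
    """Evaluate the fitted linear model on one feature vector."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], features))


# Made-up training data: (release index, commits in the period before release).
X = [(0, 40), (1, 35), (2, 20), (3, 22), (4, 10)]
# Made-up maintenance scores, generated as 4 - 0.5 * index + 0.1 * commits.
y = [8.0, 7.0, 5.0, 4.7, 3.0]

coef = fit_ols(X, y)
# Hypothetical next release: index 5 with only 5 recent commits.
predicted = predict(coef, (5, 5))
below_threshold = predicted < 5.0  # 5.0 is an arbitrary, user-defined level
print(coef, predicted, below_threshold)
```

The singular-matrix check is a crude stand-in for the "predictors not too correlated" assumption: it only catches perfect collinearity, so a proper validation step would still need to inspect correlations between features.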

Acceptance Criteria

To be defined.

mayaCostantini avatar Aug 04 '22 18:08 mayaCostantini

@mayaCostantini: This issue is currently awaiting triage. If a refinement session determines this is a relevant issue, it will accept the issue by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sesheta avatar Aug 04 '22 18:08 sesheta

/priority important-longterm /sig stack-guidance

mayaCostantini avatar Aug 04 '22 18:08 mayaCostantini