Create a training set of labeled repos for DDP

Open ccerv1 opened this issue 2 months ago • 1 comment

We'll iterate on these with DDP, but some initial ideas:

  • "abandoned" - project was started and quickly abandoned
  • "duplicate" - project is double-counted or is a fork that does not deviate from the main project
  • "false positive" - project does not seem like it belongs in the dataset
  • "spammy" - project has a lot of bot-like activity or other signs of manufactured activity
  • "high quality missing from OSO" - high-quality projects that are missing from OSSD
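
As a starting point for discussion, the labels above could be captured in a small schema so that a labeled file is validated on load. This is only a sketch: the names `RepoLabel`, `LabeledRepo`, and `parse_training_set`, the CSV column names (`repo_url`, `label`, `note`), and the example URLs are all hypothetical and not part of any existing OSO or DDP code.

```python
import csv
import io
from dataclasses import dataclass
from enum import Enum


class RepoLabel(str, Enum):
    """Initial label ideas from this issue (hypothetical enum, subject to change)."""
    ABANDONED = "abandoned"
    DUPLICATE = "duplicate"
    FALSE_POSITIVE = "false positive"
    SPAMMY = "spammy"
    HIGH_QUALITY_MISSING = "high quality missing from OSO"


@dataclass(frozen=True)
class LabeledRepo:
    """One row of the training set (assumed shape, not an agreed format)."""
    repo_url: str
    label: RepoLabel
    note: str = ""


def parse_training_set(csv_text: str) -> list[LabeledRepo]:
    """Parse CSV text with repo_url,label,note columns.

    Raises ValueError if a row uses a label outside RepoLabel.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append(
            LabeledRepo(
                repo_url=row["repo_url"].strip(),
                label=RepoLabel(row["label"].strip()),
                note=(row.get("note") or "").strip(),
            )
        )
    return rows


# Placeholder data only; these are not real labeled repos.
SAMPLE_CSV = """repo_url,label,note
https://example.com/org-a/repo-1,abandoned,no commits after first week
https://example.com/org-b/repo-2,spammy,
"""

if __name__ == "__main__":
    for item in parse_training_set(SAMPLE_CSV):
        print(item.repo_url, item.label.value)
```

Keeping the labels in an enum means a misspelled label fails at load time instead of silently creating a new class in the training data.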

ccerv1, Oct 28 '25 15:10

OSO-1218

linear[bot], Oct 28 '25 15:10