
Repository clustering analysis for EF

Open • ccerv1 opened this issue 8 months ago • 3 comments

What is it?

Context

We built an initial Hex dashboard that identifies repos in the Ethereum ecosystem.

We received the following feedback:

I was wondering if you already had thoughts on how to cluster repos in some unsupervised manner and then label them, or if you think we need to start with categories and find the repos that belong in them.

For the former, especially if you are using dependency graphs from some set of 'root repos' (like you did with deep funding?), I've seen these 'dependency structure matrices' which may be of some help: https://docs.lattix.com/lattix/modelingComplexSystems/ModelingComplexSystems.html

If we are starting with a set of repos I thought it would be good to discuss what a good set of categories would be, and how we would manage the list going into the future.

Next Steps

We should do the following:

  • Experiment with pyoso in Google Colab, using Gemini AI
  • Implement a variety of clustering approaches and share the results with EF
  • Create a tutorial showing others how to use pyoso in Colab for ML exploration
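One possible starting point for the clustering experiments listed above is sketched below. It uses an unsupervised approach: single-linkage clustering of repos by the Jaccard similarity of their dependency sets. This is not the method used in the notebook or the PR. The repo names, dependency sets, and the 0.5 threshold are hypothetical, and in practice the dependency data would come from OSO (for example via pyoso) rather than a hard-coded dict.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two dependency sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_repos(deps: dict[str, set[str]], threshold: float = 0.5) -> list[set[str]]:
    """Single-linkage clustering: link every pair of repos whose dependency
    sets have Jaccard similarity >= threshold, then return the connected
    components, found with a small union-find."""
    parent = {repo: repo for repo in deps}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(deps, 2):
        if jaccard(deps[a], deps[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for repo in deps:
        clusters.setdefault(find(repo), set()).add(repo)
    # Largest clusters first; ties broken by the alphabetically smallest member
    return sorted(clusters.values(), key=lambda c: (-len(c), min(c)))


# Hypothetical repos and dependency sets, for illustration only
deps = {
    "wallet-a": {"ethers", "viem", "wagmi"},
    "wallet-b": {"ethers", "viem", "wagmi", "rainbowkit"},
    "client-x": {"libp2p", "rlp", "secp256k1"},
    "client-y": {"libp2p", "rlp", "snappy"},
    "tool-z": {"hardhat"},
}

for cluster in cluster_repos(deps, threshold=0.5):
    print(sorted(cluster))
```

Single linkage keeps the sketch dependency-free, but it can chain loosely related repos into one large cluster. Average linkage (e.g. scikit-learn's `AgglomerativeClustering` with `linkage="average"`) is a common alternative to compare against, and the resulting clusters would still need to be labeled by hand or with an LLM.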

ccerv1 • May 06 '25 17:05

Notebook: https://colab.research.google.com/drive/1GbesuJLalkTUCHQlHuRQFyGjvvfWql5h?authuser=2

evanameyer1 • May 16 '25 01:05

I still have to:

  1. Clean up my local code and push it to the insights repo
  2. Formalize a conclusion

evanameyer1 • May 16 '25 01:05

Per my conversation with Carl, I'm going to create an initial version of the dependency structure matrix (DSM) described here, using the Devtooling categories: https://docs.lattix.com/lattix/modelingComplexSystems/ModelingComplexSystems.html
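A category-level DSM of this kind can be sketched as follows. The sketch assumes each repo has already been assigned a single category and that dependency edges are available as (dependent, dependency) pairs. The category and repo names below are placeholders, not the actual Devtooling categories. It also uses a row-depends-on-column convention; DSM tools differ on this, so check which convention the Lattix docs use before comparing outputs.

```python
def category_dsm(
    edges: list[tuple[str, str]],
    category_of: dict[str, str],
    categories: list[str],
) -> list[list[int]]:
    """Build a category-level DSM.

    Cell [i][j] counts dependency edges from a repo in categories[i] to a
    repo in categories[j] (row depends on column). Edges that touch an
    uncategorized repo are skipped.
    """
    index = {cat: i for i, cat in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for src, dst in edges:
        src_cat = category_of.get(src)
        dst_cat = category_of.get(dst)
        if src_cat in index and dst_cat in index:
            matrix[index[src_cat]][index[dst_cat]] += 1
    return matrix


# Placeholder categories and repos, for illustration only
categories = ["Apps", "Dev tools", "Libraries"]
category_of = {
    "app-1": "Apps", "app-2": "Apps",
    "tool-1": "Dev tools",
    "lib-1": "Libraries", "lib-2": "Libraries",
}
edges = [
    ("app-1", "lib-1"), ("app-1", "tool-1"), ("app-2", "lib-1"),
    ("tool-1", "lib-1"), ("lib-1", "lib-2"),
    ("app-2", "unknown-repo"),  # skipped: target has no category
]

dsm = category_dsm(edges, category_of, categories)
for cat, row in zip(categories, dsm):
    print(f"{cat:<10}", row)
```

Ordering the categories from most dependent to most foundational puts every mark on or above the diagonal here. A nonzero cell below the diagonal would flag a dependency that runs against that layering, which is the kind of pattern a DSM is meant to surface.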

evanameyer1 • May 27 '25 01:05

PR (clustering repos by DSM): https://github.com/opensource-observer/insights/pull/174

evanameyer1 • May 28 '25 17:05

Per yesterday's call:

  • The goal is something more like the AI categorizations we did for Optimism
  • The main pain point is understanding trends for the top X builders in each category
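A minimal sketch of the "top X builders per category" question follows. It assumes activity has already been aggregated into hypothetical (category, builder, month, value) records, where value stands in for whatever activity metric is chosen (for example, commits). The category and builder names are made up. The sketch ranks builders within each category by total value and then pulls a monthly series for each leader, which is what a trend chart would plot.

```python
from collections import defaultdict


def top_builders_by_category(records, top_n):
    """Return {category: [(builder, total), ...]} holding the top_n builders
    in each category by summed value; ties are broken alphabetically."""
    totals = defaultdict(lambda: defaultdict(float))
    for category, builder, _month, value in records:
        totals[category][builder] += value
    return {
        category: sorted(per_builder.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        for category, per_builder in totals.items()
    }


def monthly_series(records, category, builder):
    """Return {month: value} for one builder in one category.

    Months are 'YYYY-MM' strings, so sorting them as strings is chronological.
    """
    series = defaultdict(float)
    for cat, name, month, value in records:
        if cat == category and name == builder:
            series[month] += value
    return dict(sorted(series.items()))


# Hypothetical (category, builder, month, value) records
records = [
    ("Infra", "alice", "2025-04", 10), ("Infra", "alice", "2025-05", 14),
    ("Infra", "bob", "2025-04", 8), ("Infra", "bob", "2025-05", 3),
    ("Infra", "carol", "2025-05", 12),
    ("DeFi", "dave", "2025-04", 5), ("DeFi", "dave", "2025-05", 9),
    ("DeFi", "erin", "2025-04", 7),
]

top = top_builders_by_category(records, top_n=2)
for category, leaders in top.items():
    for builder, total in leaders:
        print(category, builder, total, monthly_series(records, category, builder))
```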

ccerv1 • Jun 05 '25 13:06