oso What are the most important tutorials that should exist on our docs? Where should the rest of the tutorials go?

What is it?

Research and propose a list of data science tutorials that use pyoso and cover the most common and sought-after data techniques. Aim to be concise, interesting, and exhaustive, and target users who are new to data science and looking to get started.

May 22 '25 17:05 evanameyer1

OSO-762 What are the most important tutorials that should exist on our docs? Where should the rest of the tutorials go?

May 22 '25 17:05 linear[bot]

Per today's conversation, we should think about refactoring the tutorials page. The main idea involves defining the "core" or "most important" topics that will remain the tutorials shown on the oso docs page, and then directing everything else to the Colab community/insights repo.

I put together a PR where I refactored the tutorial page a little: https://github.com/opensource-observer/oso/pull/3888

When brainstorming what the "core" topics should be, I realized that it really depends on the target user, so I created four lightweight personas and gave each three "core" topics. Let me know what you think @ccerv1 @ryscheng

May 22 '25 22:05 evanameyer1

These changes should work nicely with the GitHub Actions workflows I've pushed, as we move people towards adding to the Colab community and insights repo.

May 22 '25 22:05 evanameyer1

Some fun ideas for tutorials to make down the line (these will exist in the Colab community) - I'll just save them here for now:

Clustering (I can use the work I did for EF)
Repo categorization with an LLM
Survival Analysis of OSS Projects
Agent built on your repo (as a knowledge base) that you can chat with
RL-Based Grant Allocation Simulator
Sentiment Analysis (Issue Discussions, Farcaster, X, etc.)

May 22 '25 22:05 evanameyer1

This is a good list!

Intro:

Seeing how much funding a project has received from different sources
Looking up projects that meet some set of heuristics (e.g., all projects with 1-10 devs, deployments on X chains, etc.)
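
For illustration, the heuristic lookup could be sketched roughly like this. The records, field names (`active_devs`, `chains`), and the `matches` helper are all hypothetical; in an actual tutorial the rows would come from a pyoso query rather than a hard-coded list.

```python
# Hypothetical project records standing in for the result of a pyoso query.
projects = [
    {"name": "alpha", "active_devs": 3, "chains": ["optimism", "base"]},
    {"name": "beta", "active_devs": 25, "chains": ["optimism"]},
    {"name": "gamma", "active_devs": 8, "chains": ["arbitrum"]},
    {"name": "delta", "active_devs": 0, "chains": ["base"]},
]


def matches(project, min_devs=1, max_devs=10, required_chain=None):
    """Return True if the project satisfies every heuristic."""
    in_range = min_devs <= project["active_devs"] <= max_devs
    on_chain = required_chain is None or required_chain in project["chains"]
    return in_range and on_chain


# All projects with 1-10 devs, on any chain.
small_teams = [p["name"] for p in projects if matches(p)]

# The same size filter, restricted to projects deployed on one chain.
small_on_base = [p["name"] for p in projects if matches(p, required_chain="base")]

print(small_teams)
print(small_on_base)
```

Keeping each heuristic as a keyword argument makes it easy for a newcomer to add or relax one condition at a time.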

Medium:

Doing a basic synthetic control or statistical test for time-series metrics
Market share calculations
Extracting data from some of our staging models (e.g., more information about specific commits or dependency versions)
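
As a rough sketch of the market share idea: each project's share is its metric divided by the total for the same period. The `(month, project, transactions)` rows and the `market_share` helper below are made up for illustration and do not reflect any specific OSO model.

```python
from collections import defaultdict

# Hypothetical (month, project, transactions) rows; real values would come from OSO metrics.
rows = [
    ("2025-01", "alpha", 600),
    ("2025-01", "beta", 300),
    ("2025-01", "gamma", 100),
    ("2025-02", "alpha", 500),
    ("2025-02", "beta", 500),
]


def market_share(rows):
    """Map (month, project) to that project's fraction of the month's total."""
    totals = defaultdict(float)
    for month, _project, amount in rows:
        totals[month] += amount
    return {
        (month, project): amount / totals[month]
        for month, project, amount in rows
        if totals[month] > 0  # skip months with no activity to avoid dividing by zero
    }


shares = market_share(rows)
for (month, project), share in sorted(shares.items()):
    print(f"{month} {project}: {share:.0%}")
```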

Advanced:

Combining OSO data with AI jobs
Creating a retro funding distribution algorithm
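
One possible starting point for a distribution algorithm is a capped proportional split: divide the budget in proportion to each project's score, cap any single project, and redistribute the excess among the rest. This is only a sketch under assumed inputs; the scores, budget, cap, and the `allocate` function are hypothetical and not an existing retro funding method.

```python
def allocate(scores, budget, cap):
    """Split `budget` in proportion to `scores`, with at most `cap` per project.

    Funds a capped project cannot absorb are redistributed among the
    remaining projects, again in proportion to their scores.
    """
    allocation = {name: 0.0 for name in scores}
    remaining = float(budget)
    open_names = {name for name, score in scores.items() if score > 0}

    while open_names and remaining > 1e-9:
        total = sum(scores[name] for name in open_names)
        shares = {name: remaining * scores[name] / total for name in open_names}
        over = {name for name in open_names if shares[name] >= cap}
        if not over:
            # Nobody hits the cap, so the proportional split is final.
            allocation.update(shares)
            remaining = 0.0
            break
        # Fix capped projects at the cap and redistribute what is left.
        for name in over:
            allocation[name] = float(cap)
            remaining -= cap
        open_names -= over

    return allocation


# Hypothetical impact scores for three projects.
result = allocate({"alpha": 60, "beta": 30, "gamma": 10}, budget=100, cap=40)
print(result)
```

If every project reaches the cap, the leftover budget simply stays unallocated, which is a design choice a real tutorial would want to discuss.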

May 24 '25 13:05 ccerv1