oso icon indicating copy to clipboard operation
oso copied to clipboard

OP: High level search on dependencies of past RF projects

Open ccerv1 opened this issue 1 year ago • 3 comments

What would you like to see? Please provide a clear and concise description of what you want

From Jonas:

What would be the lift to put together a v2 of the "downstream impact" model for packages that you put together a while back? Link here. Just having up to date data with all the new Superchain projects that are in OSO would be great. This could give a good basis to reason about parts of the RF Dev tooling round

Is there any additional context you can share? Feel free to include notes or screenshots that can help us research and track the issue.

See also #31

Is there a write-up or doc associated with this issue? Please link to it.

No response

ccerv1 avatar Sep 30 '24 16:09 ccerv1

OK, this is super cool! And it's working decently for NPM packages at least.

See results here: https://docs.google.com/spreadsheets/d/1iLETROqdcGtZMBzGB2CwDnvJvZo_RP85GNDF6kFVeDo/edit?usp=sharing

First, I run a basic query to get the current snapshot of dependencies:

SELECT
  `SnapshotAt` AS `time`,
  `System` as event_source,
  `Name` AS from_artifact_namespace,
  CONCAT(`Name`, "_", `Version`) as from_artifact_name,
  `Dependency`.`Name` AS to_artifact_namespace,
  CONCAT(`Dependency`.`Name`, "_", `Dependency`.`Version`) as to_artifact_name,    
FROM `bigquery-public-data.deps_dev_v1.Dependencies`
WHERE
  `MinimumDepth` = 1  -- direct deps only
  AND TIMESTAMP_TRUNC(`SnapshotAt`, DAY) > TIMESTAMP("2024-09-18") -- arbitrary cutoff date
  AND UPPER(`System`) = 'NPM'

(i save this into a temp table, since it's a huge dataset)

Then I run the following:


with repos as (
  select distinct
    pbc.project_name,
    concat(rm.artifact_namespace, '/', rm.artifact_name) as repo_name
  from `oso.projects_by_collection_v1` pbc
  join `oso.int_repo_metrics_by_project` rm
    on pbc.project_id = rm.project_id
  where pbc.collection_name = 'optimism'
), 
packages as (
  select
    replace(to_artifact_namespace, '@', '') AS dependency,
    replace(from_artifact_namespace, '@', '') AS dependent_package
  from `static_data_sources.deps_dev_playground`
)

select distinct
  repos.project_name,
  packages.dependent_package,
  packages.dependency
from repos
join packages on repos.repo_name = packages.dependent_package
order by project_name

ccerv1 avatar Oct 03 '24 22:10 ccerv1

@ccerv1 on the v0 results, how many onchain projects were included? Anything about the results that suprised you?

JSeiferth avatar Oct 08 '24 09:10 JSeiferth

@Jabolol have a look at my project matching logic above. What would you do differently?

ccerv1 avatar Oct 09 '24 01:10 ccerv1

Closing out again in favor of #2338

ccerv1 avatar Oct 14 '24 20:10 ccerv1