riskassessment icon indicating copy to clipboard operation
riskassessment copied to clipboard

Align `riskmetric` pkg repo source(s)

Open AARON-CLARK opened this issue 2 years ago • 7 comments

I noticed for packages recently updated on CRAN, all the riskmetric metrics were borked:

image

After a little bit of digging, I found that riskmetric::pkg_ref() is smart enough to look up our environment's repo source, which so happens to be pointing to a CRAN snapshot for renv.lock purposes:

image

As a result, the pkg info we scrape by hand from CRAN clashes with the info that riskmetric provides, and displays faulty metric & pkg scores. However, there is a way to point riskmetric::pkg_ref() at a different source using the repos arg:

image

And that fixes the problem for this scenario:

image

However, I'm not sure we can specify all the repo source contexts that exist... For example, Bioconductor, GitHub, etc

image

Any ideas?

AARON-CLARK avatar Feb 13 '23 17:02 AARON-CLARK

I think this raises some larger questions about the application workflow and how riskmetric is being used within the application.

In general, riskassessment seems very CRAN-focused. riskmetric does not have this same CRAN-first lens. This can lead to confusion like the bug recently opened in #412.

A short-term solution that aligns with riskassessments CRAN-first lens would be to update the two "bare" calls to riskassessment::pkg_ref to riskmetric::pkg_ref(..., source = "pkg_cran_remote"). This will return pkg_ref objects that point to the latest CRAN versions (like riskassessment currently wants to do).

This does not directly solve the issue you raise above. That is due to riskmetric respecting the repos option and riskassessment ignoring it and referencing two different CRAN mirrors directly (here and here).

I would propose that the remote CRAN repository used in all of these places (calls to riskmetric and direct uses within riskassessment) be updated to hit the same source. This could be configured through a golem option with a default to getOption("repos").

I also think it is worth discussing how non-CRAN or older CRAN sources could be supported within the application as end users will likely be wanting to assess packages/versions other than latest CRAN versions.

borgmaan avatar Feb 13 '23 20:02 borgmaan

A short-term solution that aligns with riskassessments CRAN-first lens would be to update the two "bare" calls to riskassessment::pkg_ref to riskmetric::pkg_ref(..., source = "pkg_cran_remote"). This will return pkg_ref objects that point to the latest CRAN versions (like riskassessment currently wants to do).

I appreciate the short term solution, as it's "crunch time" this week, prepping for the minor release for shiny conf 2023. I vaguely remember the decision (years ago) for app to leverage a CRAN-first lens, but it was just because we wanted to "get stuff working" at the time, and we planned to tackle other pkg sources at a later date... aka, now. 😄 I'll test this quick fix out and if successful, push a new PR.

I would propose that the remote CRAN repository used in all of these places (calls to riskmetric and direct uses within riskassessment) be updated to hit the same source. This could be configured through a golem option with a default to getOption("repos").

Agreed 👍🏼. Would you be willing to get this PR started?

I also think it is worth discussing how non-CRAN or older CRAN sources could be supported within the application as end users will likely be wanting to assess packages/versions other than latest CRAN versions.

Agreed 👍🏼. It's time! Perhaps we can focus having a strategy by our next dev meetup (2/28)?

AARON-CLARK avatar Feb 14 '23 12:02 AARON-CLARK

[temporary fix discussion...]

This does not directly solve the issue you raise above. That is due to riskmetric respecting the repos option and riskassessment ignoring it and referencing two different CRAN mirrors directly (here and here).

I would propose that the remote CRAN repository used in all of these places (calls to riskmetric and direct uses within riskassessment) be updated to hit the same source. This could be configured through a golem option with a default to getOption("repos").

@Jeff-Thompson12, do you agree that this issue still exists? I think Andrew's proposal to sync repos using a golem option is a good one, and potentially "low hanging fruit", no?

At the same time, I'd love to include the bioconductor repo... but that can be pushed into another PR to consider how we collect information in the rest of the app... since we don't do a good job of supporting Bioconductor right now. Though I know the new tarball effor does help. I raised a new issuefor that enhancement: #594.

AARON-CLARK avatar Jul 24 '23 12:07 AARON-CLARK

Yes, this is definitely still an issue. Maybe we should schedule a meeting to discuss this issue more in depth. We have several different pieces affecting/being affected by this issue. Our community usage metrics are filled completely ignoring the supplied repo. And if we want to move away from storing tarballs to downloading them when required, the structure of the supplied repo's archive is going to become very important as well.

Jeff-Thompson12 avatar Jul 24 '23 12:07 Jeff-Thompson12

@AARON-CLARK what issue are we trying to solve by allowing users to specify their desired repository?

Jeff-Thompson12 avatar Jul 28 '23 13:07 Jeff-Thompson12

Hmmm, I think it could help users identify internal repos that we are not privvy to in the open source world.

AARON-CLARK avatar Jul 28 '23 15:07 AARON-CLARK

Well riskmetric::pkg_cran() is meant to point at a CRAN repository. The package {cranlogs} points specifically to the RStudio CRAN mirror. It makes since to add support for Bioconductor. We could use a package like {bioC.logs} for the community usage metrics. I think maybe a better approach is to more prominently tell the users the mirrors/repositories we are actually pointing at and make sure we are being consistent in the app. I do think there is one place where we are scraping information from a different mirror.

Hmmm, I think it could help users identify internal repos that we are not privy to in the open source world.

I think my main question would be is {riskmetric} meant to be used in this way? For instance, we are pointing to a snapshot on PPM. Look at the comparison on if I point to that repo compared to the RStudio mirror.

dplyr_ref1 <- riskmetric:::pkg_cran("dplyr")
dplyr_ref2 <- riskmetric:::pkg_cran("dplyr", repos = "https://cran.rstudio.com")

dplyr_ref1$version # version number from the snapshot
# "1.1.0"
dplyr_ref2$version # current version number
# "1.1.2"

# Number of downloads are the same. There seems to be an inherent assumption that you are looking at the most recent version of a package when creating a risk object.
all.equal(dplyr_ref1$downloads, dplyr_ref2$downloads)
# TRUE

Jeff-Thompson12 avatar Jul 28 '23 16:07 Jeff-Thompson12