riskassessment
riskassessment copied to clipboard
Align `riskmetric` pkg repo source(s)
I noticed for packages recently updated on CRAN, all the riskmetric
metrics were borked:
After a little bit of digging, I found that riskmetric::pkg_ref()
is smart enough to look up our environment's repo source, which so happens to be pointing to a CRAN snapshot for renv.lock
purposes:
As a result, the pkg info we scrape by hand from CRAN clashes with the info that riskmetric
provides, and displays faulty metric & pkg scores. However, there is a way to point riskmetric::pkg_ref()
at a different source using the repos
arg:
And that fixes the problem for this scenario:
However, I'm not sure we can specify all the repo source contexts that exist... For example, Bioconductor, GitHub, etc
Any ideas?
I think this raises some larger questions about the application workflow and how riskmetric
is being used within the application.
In general, riskassessment
seems very CRAN-focused. riskmetric
does not have this same CRAN-first lens. This can lead to confusion like the bug recently opened in #412.
A short-term solution that aligns with riskassessment
s CRAN-first lens would be to update the two "bare" calls to riskassessment::pkg_ref
to riskmetric::pkg_ref(..., source = "pkg_cran_remote")
. This will return pkg_ref
objects that point to the latest CRAN versions (like riskassessment
currently wants to do).
This does not directly solve the issue you raise above. That is due to riskmetric
respecting the repos
option and riskassessment
ignoring it and referencing two different CRAN mirrors directly (here and here).
I would propose that the remote CRAN repository used in all of these places (calls to riskmetric
and direct uses within riskassessment
) be updated to hit the same source. This could be configured through a golem option with a default to getOption("repos")
.
I also think it is worth discussing how non-CRAN or older CRAN sources could be supported within the application as end users will likely be wanting to assess packages/versions other than latest CRAN versions.
A short-term solution that aligns with
riskassessment
s CRAN-first lens would be to update the two "bare" calls toriskassessment::pkg_ref
toriskmetric::pkg_ref(..., source = "pkg_cran_remote")
. This will returnpkg_ref
objects that point to the latest CRAN versions (likeriskassessment
currently wants to do).
I appreciate the short term solution, as it's "crunch time" this week, prepping for the minor release for shiny conf 2023. I vaguely remember the decision (years ago) for app to leverage a CRAN-first lens, but it was just because we wanted to "get stuff working" at the time, and we planned to tackle other pkg sources at a later date... aka, now. 😄 I'll test this quick fix out and if successful, push a new PR.
I would propose that the remote CRAN repository used in all of these places (calls to riskmetric and direct uses within riskassessment) be updated to hit the same source. This could be configured through a golem option with a default to getOption("repos").
Agreed 👍🏼. Would you be willing to get this PR started?
I also think it is worth discussing how non-CRAN or older CRAN sources could be supported within the application as end users will likely be wanting to assess packages/versions other than latest CRAN versions.
Agreed 👍🏼. It's time! Perhaps we can focus having a strategy by our next dev meetup (2/28)?
[temporary fix discussion...]
This does not directly solve the issue you raise above. That is due to
riskmetric
respecting therepos
option andriskassessment
ignoring it and referencing two different CRAN mirrors directly (here and here).I would propose that the remote CRAN repository used in all of these places (calls to
riskmetric
and direct uses withinriskassessment
) be updated to hit the same source. This could be configured through a golem option with a default togetOption("repos")
.
@Jeff-Thompson12, do you agree that this issue still exists? I think Andrew's proposal to sync repos using a golem option is a good one, and potentially "low hanging fruit", no?
At the same time, I'd love to include the bioconductor repo... but that can be pushed into another PR to consider how we collect information in the rest of the app... since we don't do a good job of supporting Bioconductor right now. Though I know the new tarball effor does help. I raised a new issuefor that enhancement: #594.
Yes, this is definitely still an issue. Maybe we should schedule a meeting to discuss this issue more in depth. We have several different pieces affecting/being affected by this issue. Our community usage metrics are filled completely ignoring the supplied repo. And if we want to move away from storing tarballs to downloading them when required, the structure of the supplied repo's archive is going to become very important as well.
@AARON-CLARK what issue are we trying to solve by allowing users to specify their desired repository?
Hmmm, I think it could help users identify internal repos that we are not privvy to in the open source world.
Well riskmetric::pkg_cran()
is meant to point at a CRAN repository. The package {cranlogs}
points specifically to the RStudio CRAN mirror. It makes since to add support for Bioconductor. We could use a package like {bioC.logs}
for the community usage metrics. I think maybe a better approach is to more prominently tell the users the mirrors/repositories we are actually pointing at and make sure we are being consistent in the app. I do think there is one place where we are scraping information from a different mirror.
Hmmm, I think it could help users identify internal repos that we are not privy to in the open source world.
I think my main question would be is {riskmetric}
meant to be used in this way? For instance, we are pointing to a snapshot on PPM. Look at the comparison on if I point to that repo compared to the RStudio mirror.
dplyr_ref1 <- riskmetric:::pkg_cran("dplyr")
dplyr_ref2 <- riskmetric:::pkg_cran("dplyr", repos = "https://cran.rstudio.com")
dplyr_ref1$version # version number from the snapshot
# "1.1.0"
dplyr_ref2$version # current version number
# "1.1.2"
# Number of downloads are the same. There seems to be an inherent assumption that you are looking at the most recent version of a package when creating a risk object.
all.equal(dplyr_ref1$downloads, dplyr_ref2$downloads)
# TRUE