precommit Add parsable-roxygen hook

trafficstars

As discussed in #562.

Things to note:

Since roxygen2::parse_file() does actually parse the file, I think we have to allow this hook to fail in two ways:
- If it catches a warning, it fails and explains that it's due to a roxygen error
- If it catches an error, it fails in the same way as parsable-R
I don't think we need to actually evaluate the code though, hence roxygen2::parse_file(..., env = NULL) - unless you can think of some reason why we would need to evaluate?
Do we need to do anything to ensure that {roxygen2} is available?

Closes #562.

May 09 '24 15:05 owenjonesuob

Sorry @owenjonesuob for the delay. Here's my review:

If it catches a warning, it fails and explains that it's due to a roxygen error.

Can you tell me when a warning would be issued? Depending on that, maybe a flag --allow-warnings could govern the behaviour of whether or not a warning is elevated to an error. If unset, warnings are errors, as you suggest.

I don't think we need to actually evaluate the code though,

What speaks against it? Is it slower? I guess it requires the package to be loadable with pkgload::load_all()? This would imply that development dependencies need to be available. Also, parsing errors in code to be evaluated does not seem to get caught with roxygen2::parse_file('file.R') if I write this to file.R:

#' Initiate a pre-commit config file
#'
#' @param verbose way. `r 1+++++`
my_fun <- function(...) NULL

But roxygen2::roxygenize() afterwards catches the problem and a subsequent roxygen2::parse_file('file.R') as well. Not sure how we can catch this in the first place...

May 18 '24 20:05 lorenzwalthert

A big "sorry" from me too, for leaving this unanswered for so long - it's been a busy few weeks!

Can you tell me when a warning would be issued?

Apologies, I should have explained this better.

When we call roxygen2::parse_file(), it immediately uses roxygen2:::tokenize_file(), which:

Parses the file with base::parse()
Constructs roxygen blocks

If there is a problem with the parsing, we will get an error in Step 1, before we even look at the roxygen comments! Hence the need to capture such an error, in the same way as parsable-R.

Then assuming we make it to Step 2: if there are any problems with roxygen comments, these are reported via roxygen2:::warn_roxy(). That function uses cli::cli_inform() to communicate to the user.

In other words, messages - not warnings, that was a mistake from me! - indicate some problem with the roxygen comments. I've fixed that logic in 4043ada.

I don't think we need to actually evaluate the code though,

What speaks against it? Is it slower? I guess it requires the package to be loadable with pkgload::load_all()? This would imply that development dependencies need to be available.

Yep I think those were the drawbacks I had identified (but not communicated, sorry!) - especially the requirement for dependencies to be available, in certain cases. I'll explain one of those cases in a moment...

The default in roxygen2::parse_file() is env = env_file(file), which uses sys.source() to evaluate each expression in the file in a specially-created environment.

If the file contains only "object declarations" - e.g. of functions or data objects, like in a R/*.R file in an package - then this isn't likely to cause any problems. However, we might run into problems if that's not the case!

For a concrete example, consider the Appsilon/rhino-showcase Shiny app. It's built using the Rhino framework, which makes heavy use of box modules (see their docs). There is plenty of functionality which might benefit from roxygen documentation (not that it has any right now!) - see for example https://github.com/Appsilon/rhino-showcase/blob/ae366f522d3adf0571fb7c45aaa8e8d29d0937e0/app/logic/update_map.R

But box modules often begin with one or more box::use() expressions, to import functionality from other modules or packages. So if we use env = env_file(file) here, we need all of those modules/packages to be available - plus any modules/packages used by those imported module, plus ..., plus ........ so we can end up with an enormous dependency chain, for a file which other than those dependency declarations is just declaring functions!

I wonder whether we could have the best of both worlds, if we expose an eval argument for the parsable-roxygen hook?

eval: true (default?) - parse the file with env = env_file(file), inline R code in roxygen commentary is evaluated, this is fine in most cases e.g. for typical object-declaration-only scripts within an R package
eval: false - parse the file with env = NULL, avoid dependency requirements, with the caveat that inline code won't be evaluated (as explained in the roxygen2 docs)

I can implement something like that if you think it's a suitable idea!

Jun 21 '24 12:06 owenjonesuob

I've pushed a proof-of-concept for a --no-eval flag, which would suppress evaluation as suggested in my previous comment.

Re this comment in available-hooks.Rmd:

Inline R code within roxygen comments (i.e. within backticks) is not evaluated by this hook, whether or not --no-eval is specified.

As far as I can tell, this is how those inline chunks get evaluated: https://github.com/r-lib/roxygen2/blob/c03a6711eea3f7dc16e55f0ab0babac9ff40d9ea/R/markdown.R#L134-L140

That's called at the point where chunks are actually being processed; so I'd suggest checking the content of inline code is out of scope for this hook?

Jun 21 '24 14:06 owenjonesuob

Thanks for all your comments. I added some suggestions to the PR, please have a look. I am preparing a new CRAN release but I suggest to not include this hook there, as I am not clear on the --no-eval flag and the language: r just yet. I think it makes sense when you experiment with it a little first? You can already use the hook, just put the commit hash of this PR tip in the rev field of your .pre-commit-hook.yaml. My thoughts are:

if this is primarily aimed to be a local hook, I am not sure you want language: r since per default (i.e. in absence of --no-eval), you would have to add all your packages dependencies as additional_dependencies: of the hook. This is already the case for the roxygenise hook and made things difficult for people I think, because 1) the person who adds support for that hook in the repo has to understand the semantics of the .pre-commit-config and 2), bootstrapping a whole renv might be an overkill for the purpose of parsing files (dependency versions probably don't matter as much here as for example for the styler hook). On the other hand, language: r works better on CI (e.g. pre-commit.ci or also GitHub Actions) because the user does not have to manage the R dependencies.
If it's' not primarily a local hook, language: r seems better, but then we could still default to --no-eval to avoid specifying dependencies in .pre-commit-config.yaml by default.

Bottom line, probably we want language: script (i.e. not a separate renv for the hooks, just use the global R library) since the execution environment is locally (and not as a CI check) and the requiring the dependencies is fine, since they are already installed globally on the local machine.

Jul 20 '24 15:07 lorenzwalthert

Many thanks for your detailed thoughts and suggestions!

After some consideration, for now I have opted for a variation of your second proposal above:

Keep language: r - partly because this hook feels very "conceptually close" to parsable-R, which uses language: r, and it felt strange for them to work differently! And partly because I can see this being useful on CI for my own projects (described previously).
Avoid evaluating code, by default - so now we have an optional --eval flag, which can be specified if evaluation is desired (i.e. the complement of what we had before!). I think the only time a user would need to use this would be to evaluate @eval tags in roxygen comments, so I've made that clearer in the hook documentation!

Jul 22 '24 15:07 owenjonesuob

Ok, works for me too. To be honest, I don't think @eval is used that often anyways. We could also make pre-commit catch more problems related to roxygen code comments without access to the dependencies, e.g. by extending existing hooks:

deps-in-desc to check roxygen example code for dependencies.
other existing hooks?

Jul 24 '24 11:07 lorenzwalthert

FWIW you can always override language: (and other fields) in your .pre-commit-config.yaml.

Jul 24 '24 11:07 lorenzwalthert

Can you please fix the one test in R cmd check that fails, then I can merge this...

Jul 24 '24 16:07 lorenzwalthert

Oops sorry :facepalm: tests have now been adjusted to match the updated functionality!

I happen to be working on a different computer today, with an older version of roxygen2, which helped me to catch one other problem... quoting myself from a few comments ago:

In other words, messages - not warnings, that was a mistake from me! - indicate some problem with the roxygen comments.

In fact, problems were reported via warnings (not messages!) prior to roxygen2 v7.3.0 (changed by r-lib/roxygen2#1548, I think). So I've added a minimum version requirement for roxygen2 >= 7.3.0, similar to the ones used in a couple of other hooks!

Jul 25 '24 08:07 owenjonesuob

Thanks, since we use language: r, honouring 0.7.3 can be guaranteed.

Jul 25 '24 12:07 lorenzwalthert

Thanks again for the collaboration. I added an issue for extending deps-in-desc to check roxygen code examples as well: https://github.com/lorenzwalthert/precommit/issues/585. Feel free to contribute.

Jul 25 '24 12:07 lorenzwalthert

precommit precommit copied to clipboard

Add parsable-roxygen hook

precommit
precommit copied to clipboard