CRAN task view proposal: PackageDevelopment

Open SimonGoring opened this issue 10 months ago • 31 comments

Scope

Given the popularity of widely known packages such as usethis, devtools and testthat, a broader task view for R package development would be helpful to introduce developers to a set of tools to make packages cleaner, more efficient and easier to deploy to CRAN. The Task View could link to packages that provide information on test coverage, documentation, API mocking, and benchmarking that would all assist developers in producing higher quality packages more quickly.

Packages

As noted above, packages such as usethis, devtools and testthat are widely known. In addition we suggest packages such as covr, mockery, webfakes and others that would assist developers. Tentatively, groupings might look like this:

  • Package Creation: devtools (omnibus package), usethis (adding specific package infrastructure), available (is the package name available?)
  • Testing: testthat, mockery, covr
  • Code Styling: lintr
  • Documentation: roxygen2

Overlap

At present I see no potential overlap.

Maintainers

Simon Goring -- But this proposal is open at this time and I am happy to provide support without being a maintainer. I am opening the issue to a broader public at this time to see if there is further support.

SimonGoring avatar Aug 24 '23 17:08 SimonGoring

I can suggest a few more:

  • fusen: Build a Package from Rmarkdown Files
  • BiocManager: for checking packages that go into the Bioconductor repository
  • cyclocomp: to check function complexity
  • styler: to enforce an automatic style

llrs avatar Aug 24 '23 17:08 llrs

Some more suggestions:

  • knitr/rmarkdown: for building vignettes
  • vdiffr: for visual testthat tests
  • pkgdown: for building package websites
  • lifecycle: for documenting deprecation changes

willgearty avatar Aug 24 '23 17:08 willgearty

A few more to consider, with their possible classification

wibeasley avatar Aug 24 '23 20:08 wibeasley

There used to be a Package Development CTV (via @maelle) from 2018 which has been archived. This might still be helpful re structure and some packages/resources not yet mentioned (and not out-dated).

Further suggestions:

dpprdan avatar Aug 25 '23 10:08 dpprdan

I've just now remembered about https://github.com/IndrajeetPatil/awesome-r-pkgtools maintained by @IndrajeetPatil

maelle avatar Aug 25 '23 13:08 maelle

@SimonGoring : thanks a lot for the proposal! @llrs @wibeasley @willgearty @dpprdan @maelle : thanks for the suggestions. I think that the idea is good so you can go ahead with the writing of a formal task view following the instructions.

  • The proposal should include the packages suggested by others that you see fit to the topic.
  • It should also clarify the overlap with ReproducibleResearch (on formatting tools).
  • I wonder if some of the packages listed in the Section "Easier interfaces for Compiled code" of HighPerformanceComputing would not be relevant for this TV and thus if the overlap should not be mentioned as well. @zeileis @eddelbuettel : what do you think?
  • We also encourage maintainers to have co-maintainers for their TV. Maybe some of the people who have contributed to this discussion would like to help you?

tuxette avatar Aug 25 '23 15:08 tuxette

I would argue strongly that this should not be structured as a standard ctv. It must, definitely must start from WRE, and must track updates to WRE.

One data point: many packages still failing reverse dependency checks in preparation for archiving R-spatial infrastructure packages simply have stale roxygen import markup. That is, making things simpler has a lower threshold that is higher than many appreciate, and promoting packages will not automate that away.

rsbivand avatar Aug 25 '23 15:08 rsbivand

One more:

@rsbivand How should this be structured if not as a normal ctv? Grouping them here would make it easier to use them but it is not an endorsement from CRAN, R or even the maintainers of the ctv. From the README/main page:

CRAN task views aim to provide guidance which packages on CRAN are relevant for tasks related to a certain topic... and they are not meant to endorse the "best" packages for a given task.

I agree that the rule of Keep it simple is golden: I would love a talk or a blog post about how to create a package with vignettes, tests and testing reverse dependencies without using any package. But I think most problems with the packages come from a lack of maintenance, maybe because they don't keep up with tools used to create/maintain it or maybe because R, the other dependencies, and checks evolve. Which of these two reasons is more frequent, would be an interesting question :thinking: .

llrs avatar Aug 25 '23 16:08 llrs

The distinction is that package development is a core component of the R ecosystem, including much R core functionality. CTVs are typically about clusters of statistical or numerical methods, or application areas specific to chosen domains.

Effectively all the information needed for package development is in WRE, and can be buttressed by using example packages of varying degrees of complexity (compiled code, S3, S4, etc.). The key point that is understated is that the current package population is very actively used to develop R itself. R has chosen to track emerging processor architectures, emerging compiler technologies (for C, Fortran and by extension C++), and this dynamic relationship in the whole ecosystem is vital. If packages fail to adapt dynamically to subtly changing CRAN requirements, the mutual relationship (R as a service to packages and well-maintained packages as a service to the future well-being of R) gets disturbed.

So a not-a-ctv explaining that if you submit a package to CRAN, you are committing to maintaining it for as long as it takes (in my case well past my retirement). Calling it "sustainable package maintenance" might work, because writing a package is just the visible part of the iceberg.

Beyond that, I may be wrong, but I thought that a golden rule was keep it as simple as possible, but never more than that. WRE may be seen as being hard to read, but is precise, and is current best practice.

Most older actively maintained packages were written without any package support, but with extensive use of R CMD build and R CMD check at the shell prompt. utils::prompt has long been used for initial creation of Rd files. I would argue that learning to use these simple tools leads to more independence and confidence than having to learn a slew of packages, which then get updated, but I understand that their promise of guiding the "novice" is alluring. Perhaps we should recognise that anyone considering writing a package to be contributed to CRAN is no longer a "novice".
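
As a rough illustration of the base-R route described above, the sketch below uses utils::prompt() to produce an Rd skeleton with no add-on packages. The function add_one and the package name mypkg are made-up placeholders; filename = NA asks prompt() to return the skeleton as a list rather than write a file.

```r
# Hypothetical function to document; not taken from any real package.
add_one <- function(x) {
  x + 1
}

# filename = NA returns the Rd shell as a list instead of writing add_one.Rd.
rd_shell <- utils::prompt(add_one, filename = NA, name = "add_one")

# Inspect the skeleton; its placeholder text still has to be filled in by hand.
cat(unlist(rd_shell), sep = "\n")

# From the shell prompt, the package would then be built and checked with:
#   R CMD build mypkg
#   R CMD check mypkg_0.1.0.tar.gz
# (mypkg and the version number are placeholders.)
```

The generated shell is only a starting point: the title, description, value and examples sections still need real content before R CMD check will be satisfied.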

By the way, does Bioconductor have package maintainer induction experience?

rsbivand avatar Aug 25 '23 22:08 rsbivand

I think your point is well taken @rsbivand. The WRE is a great resource. I was originally thinking of the CTV as a tool for fine tuning and improving packages that have already been developed. An example would be having someone use code coverage to improve the testing to make their package more robust. Similarly, using some form of code styler to make code in an existing package more readable.

So, maybe Package Development isn’t the best term for the CTV, I like the phrasing “sustainable package maintenance”. If we frame it that way, with clear pointers to the WRE in the introduction, are we then on the same page?

SimonGoring avatar Aug 26 '23 00:08 SimonGoring

Also a huge thanks to @llrs @wibeasley @dpprdan @maelle and @tuxette.

SimonGoring avatar Aug 26 '23 00:08 SimonGoring

@SimonGoring we're getting there, but not there yet. Your example shows that we still have some iterations. In default mode, neither standard unit tests nor code coverage look upstream. The most useful resource for existing CRAN/(Bioconductor?) packages are the per-maintainer CRAN check pages, because changes/deltas and any not OK outcomes need to be examined for causes. For example, a NOTE for big directories is usually innocuous, but the appearance of another stat/change signals an upstream problem, like a new version of a package imported by the package itself. This might be caught by github CI, but github CI might only run on commit/push, not at time intervals, so would only fail when an update was made.

So for a CTV to be useful, we'd need a package or other mechanism to create push notifications for maintainers based on a baseline of their CRAN check results and then deltas on those. I will admit that I haven't looked for anything like that, I simply use my browser bookmarked CRAN maintainer check report page and visit it frequently. With that in place, some of the other things follow, but without those considerations, unit tests and code coverage only provide a false sense of security, I'm afraid.
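
A minimal sketch of the baseline-and-delta idea, under stated assumptions: check_deltas() is a hypothetical helper, and both snapshots are assumed to be data frames with columns Package, Flavor and Status. In practice the snapshots might come from something like tools::CRAN_check_results(), but that column layout is an assumption here and the toy data below is invented.

```r
# Compare two snapshots of check results and return the rows whose status
# changed, appeared, or disappeared between the baseline and the current run.
check_deltas <- function(baseline, current) {
  keep <- c("Package", "Flavor", "Status")
  merged <- merge(
    baseline[, keep],
    current[, keep],
    by = c("Package", "Flavor"),
    suffixes = c("_baseline", "_current"),
    all = TRUE
  )
  old <- as.character(merged$Status_baseline)
  new <- as.character(merged$Status_current)
  # A missing status on either side counts as a change.
  changed <- is.na(old) | is.na(new) | old != new
  merged[changed, , drop = FALSE]
}

# Invented example data: pkgB goes from OK to ERROR on one flavor.
baseline <- data.frame(
  Package = c("pkgA", "pkgA", "pkgB"),
  Flavor  = c("r-devel-linux", "r-release-windows", "r-devel-linux"),
  Status  = c("OK", "NOTE", "OK"),
  stringsAsFactors = FALSE
)
current <- data.frame(
  Package = c("pkgA", "pkgA", "pkgB"),
  Flavor  = c("r-devel-linux", "r-release-windows", "r-devel-linux"),
  Status  = c("OK", "NOTE", "ERROR"),
  stringsAsFactors = FALSE
)

deltas <- check_deltas(baseline, current)
print(deltas)
```

An unchanged NOTE stays out of the result, so only new or altered outcomes would need to trigger a notification.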

rsbivand avatar Aug 26 '23 06:08 rsbivand

Most of the packages mentioned so far aim to make it easier to develop or to hold to a higher standard than even CRAN requires: tests, vignettes, webpages, code style, complexity, metadata. I agree this places a heavier load on maintainers but this is better than not having tests or vignettes for example.

However, as you noted, this often results in many packages being archived, so that "the mutual relationship (R as a service to packages and well-maintained packages as a service to the future well-being of R) gets disturbed". But this has been accepted as normal and even desirable by some. [I would say that, for some people, the definition of well-maintained packages is those that remain on CRAN for long, so it is a self-fulfilling definition.]

I don't understand why a CTV only collecting these packages wouldn't be useful to developers of packages (whatever its name would be [Package Development Helpers?]). I too wish we had some other notifications in repositories and a way to collect the data from the checks done by them, but so far we do not (and this might be a good long-term goal for the R Repository working group or similar).
There is Diffify, built on top of CRAN (and not Bioconductor), and I created some delta checks about RNG causing ERRORS or WARNINGS on CRAN (but as I cannot check whether all the checks were run on the same R version, it is less useful than it could be); creating a feed or emailing developers could be done. But in general this kind of thing can be built outside the repository/CRAN team (although a good relationship with them is better :smiley: ).

Bioconductor provides a guide for maintainers which is more specific than CRAN's and has a long manual review on the first submission. It also requires maintainers to follow the WRE (but checks on Bioconductor are not the same as those on CRAN). There is also a mentor program for developers, a mailing list (as you know) and a Slack to make maintainer induction easier.

llrs avatar Aug 26 '23 09:08 llrs

@rsbivand, your packages and books are incredible, so I feel silly sharing my opinion about software development in this thread. However I believe that mortal developers can gain value from something like a CTV, even after carefully rereading WRE, books, and blogs. Here are a few packages that come to mind that supplement the existing support provided by the conventional CRAN workflow, for some classes of developers.

Utilities like BiocCheck, lintr, and goodpractice are a great start for self-teaching consistent coding standards and catching mistakes. I'm embarrassed to admit that they still catch things in my code like using & vs && in if blocks.
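
For readers unfamiliar with the & vs && distinction, a small sketch with made-up vectors x and y:

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# `&` is vectorised: it compares element by element and returns a vector.
elementwise <- x & y

# `&&` works on single values and short-circuits: the right-hand side is
# not evaluated when the left-hand side is already FALSE.
short_circuit <- FALSE && stop("never evaluated")

# An if() condition wants a single TRUE/FALSE, so `&&` combined with an
# explicit all()/any() around vectorised comparisons is usually safer.
if (length(x) == 3 && all(x | y)) {
  message("every position has at least one TRUE")
}
```

Passing a vector longer than one to if() or && is what linters tend to flag, and how strictly R itself treats it depends on the R version.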

I like how CTVs don't try to be the authority on a topic, but instead present packages (and context with pros & cons) that are possibly relevant to the user.

Utilities like revdepcheck and rhub catch complicated problems more easily than local checks, so they likely reduce the human load during the CRAN review.

When I'm showing a package to a student for their first time, I feel they learn it faster and more thoroughly from a pkgdown site than from a CRAN page. Even though the underlying content is mostly identical.

I think that roxygen2's markdown-to-rd translator dramatically lowers the barrier for generating thorough documentation. I'm convinced that if you compared packages that do & don't use it, the conventional rd packages would have noticeably lighter/worse documentation (even after controlling for a lot of factors). When I come across little typos in other packages, it's easier to correct and submit a PR if they're using roxygen+markdown. When I switched ~12 years ago, my package documentation improved. I hear what you're saying about the downsides of an additional dependency that can go stale, but this may be an acceptable tradeoff to some: a package can't go years without maintenance, but the quality of its documentation is better during its useful life. A CTV-like page could present roxygen's pros/cons that you want people to be aware of. Few people have your experience & perspective.

Finally, I understand testing & code coverage may provide a false sense of security from threats like upstream changes. A CTV-like page might be a good place to warn people about this (and suggest using GitHub Action's cron feature). I'd love to blame other people, but I have to admit that I'm my own package's worst enemy. Upstream changes may break things once or twice a year, but my changes break things much more frequently, and the testing's quick feedback loops catch almost all of them.

I bet there are weaknesses in some of my arguments and you have good insights into things I'm overlooking. That may be the strongest reason why a CTV-like document could be useful.

wibeasley avatar Aug 26 '23 18:08 wibeasley

Please accept that there is only a user-developer continuum, and that trying to say that moving along the continuum is not chiefly up to the user seems misleading. The responsibility then is shared between the user and those offering advice. The advice gets between the user, who learns to rely on the advice (about how to make things easier) rather than keeping as close as possible to the systems themselves in order to control for changes in intervening elements. That is, as I was told after apologising for the learning gradient in a course, a steep learning curve means you learn a lot in a short time.

A discussion on R-devel a year or so ago indicated that those who have contact with many packages doubt the claim that roxygen2 improves documentation. My current experience weaning hundreds of packages off my retiring packages supports this: many help pages really communicate nothing. Classic help pages, as you know, almost always used examples from statistics textbooks, on their data sets. Vignettes can be very useful, where they avoid the trap of "selling" or "boosting", which is almost always misleading unless supported by comparison with other software doing something similar.

I think that there is a landing ground somewhere, and that we're inching towards it. If a CTV explained and pointed to salient "things to do", WRE, running check locally and elsewhere, and more broadly for maintainers of existing packages what kind of responsibility they have undertaken, getting to a draft may be feasible. But just listing packages claiming to make things "easier" isn't going to work.

rsbivand avatar Aug 27 '23 12:08 rsbivand

I found another one (mentioned today in the mailing list): pkgKitten; and saw another one mentioned in an issue: sinew.

I volunteer to start the task view. Should I open a pull request for a draft of the text with:

salient "things to do", WRE, running check locally and elsewhere, and more broadly for maintainers of existing packages what kind of responsibility they have undertaken, getting to a draft may be feasible

and some sections ? Or how should I proceed?

I'll be happy to have co-maintainers :wink:.

llrs avatar Sep 21 '23 15:09 llrs

I'll try to respond next week, but there is as yet no agreement in the task view team on whether any structure is feasible, given the proliferation of packages offering to make easier processes that are intrinsically complex and must be so. And basic structure should not promote any packages as such, given that everything is already in R and its base packages. So they need to be covered with WRE first, and any add-on packages described in relation to that foundation.

rsbivand avatar Sep 21 '23 15:09 rsbivand

@rsbivand Did the task view team reach a decision? If not yet, would it help if I provide a draft with some sections and a minimal text (without mentioning any package yet)?

llrs avatar Oct 06 '23 10:10 llrs

@llrs No decision or discussion. I cannot contribute until after packages I maintain are archived - this affects hundreds of CRAN packages, and much of the action is taking place now. I will have more time late-November/early-December.

rsbivand avatar Oct 06 '23 10:10 rsbivand

Understandable. Thank you for all the maintenance you have done with the packages and all the help to the maintainers that depended on these packages.

I'll wait till early December to produce the draft.

llrs avatar Oct 06 '23 11:10 llrs

@rsbivand I hope all the packages were archived without too much last minute pain. It is wonderful how you managed this process. I hope you, and all the team, have time to read a draft I wrote for this task view:

I wrote a first draft with different sections mimicking Writing R Extensions. The text starts with a long introduction about the problems of using the packages and some related information based on the concerns raised here. Although I don't think it damages the task view, I also do not think it is the best place to share that advice.

You can suggest changes in the document (it is a google doc).

llrs avatar Dec 08 '23 10:12 llrs

@llrs Thanks, I will attempt to edit the draft during the week. I think that you can accept/modify/reject any edits I make in the google doc. May I use md markup?

rsbivand avatar Dec 11 '23 09:12 rsbivand

@rsbivand Yes, I can accept/modify/reject edits. And yes: I didn't want to add a lot of style, but it will hopefully end up in a .md file like the other CRAN Task Views, so please add markup as you wish.

llrs avatar Dec 11 '23 10:12 llrs

@llrs I'm still in process, and will get back when I'm closer to landing - this is very hard to write both for readers who are considering developing a first package and readers who already have written several packages including compiled code linking to external software.

rsbivand avatar Dec 16 '23 09:12 rsbivand

Thanks for the update @rsbivand. Does it mean that you envision this task view as a tutorial/guide for developing packages? You have more experience with task views but I would only make this distinction when mentioning specific packages, or adding a specific section for beginners.

I hope this means that according to the documentation this proposal has been endorsed by three editors and we are at the revisions step of the review process. Thanks to the editors for endorsing the task view.

llrs avatar Dec 16 '23 09:12 llrs

@llrs sorry, no, nothing like so far advanced. I'm not yet happy with what I have. It is much longer than the draft, and hasn't touched packages at all yet. I'm pretty uncertain whether it is possible to use the task view format at all; trying to create a text was and continues to be an attempt to test feasibility, without prejudicing the outcome.

rsbivand avatar Dec 16 '23 09:12 rsbivand

I hope you had some holidays these days and enjoyed time with your families. From the last answer, it was not clear to me whether at least three CRAN Task View Editors endorse this proposal. I am asking this before investing more time in improving either my draft or @rsbivand's text (sorry, I don't have a link). I would like to know if you would endorse this task view:

  • [ ] @eddelbuettel
  • [ ] @zeileis
  • [ ] @tuxette (I think you endorse it, as per your initial comment)

Thank you all for your work maintaining the CRAN task view.

llrs avatar Jan 13 '24 21:01 llrs

Dear @llrs . Considering the current discussions, which came after my initial comment, I don't want to endorse the TV yet. We have not advanced enough to a consensual position and the issues pointed by @rsbivand in his last comments are still not solved.

tuxette avatar Jan 14 '24 10:01 tuxette

@llrs Thanks for your initial work on this. @rsbivand shared his work-in-progress with me and I iterated on it to reach a form that three CTV Editors have endorsed to take forward. The redraft of the proposal is here: https://github.com/hturner/pkg-dev-ctv/blob/main/proposal.md and I have invited you as a collaborator. I am happy to join as a co-maintainer and also to add others from this thread.

You'll see that for contributed packages I have mostly added "CONSIDER" (some of my own ideas) or "SEE ALSO" links (to the old package development CTV or the awesome list). Perhaps you could make a start on reviewing and adding packages in bulleted lists to replace these placeholders? Likely some packages in the old lists are no longer maintained or have been superseded. We should also check any packages mentioned on this issue that are not covered by these lists and finally check for new packages.

hturner avatar Feb 15 '24 11:02 hturner

I'm very happy to hear this @hturner. Many thanks to all editors, especially @rsbivand. I see now what you meant that just listing wouldn't work well. The text and guide will be very helpful to the R community and I hope it will help the CRAN volunteers by avoiding issues from packages.

I will check and see if there are packages from this issue, or from other sources that might be added.

Thanks for all the contributions.

llrs avatar Feb 15 '24 14:02 llrs