roxygen2 icon indicating copy to clipboard operation
roxygen2 copied to clipboard

Catch URL formatting issues in DESCRIPTION?

Open cboettig opened this issue 2 years ago • 11 comments

Some packages have historically used annotations on the URL field, such as curl:

URL: https://github.com/jeroen/curl (devel) 
    https://curl.se/libcurl/ (upstream)
BugReports: https://github.com/jeroen/curl/issues

(https://github.com/jeroen/curl/blob/e7502d27c7cff17baa0930c3327f84ae4918aa24/DESCRIPTION#L22-L23)

This syntax is no longer acceptable on CRAN, resulting in warnings such as:

   Found the following (possibly) invalid URLs:
     URL: https://docs.ropensci.org/gbifdb/ (website)
https://github.com/ropensci/gbifdb
       From: man/gbifdb-package.Rd
       Status: Error
       Message: URL using bad/illegal format or missing URL
       URL contains spaces
     Spaces in an http[s] URL should probably be replaced by %20

but will not be caught by a standard R CMD check (on R <= 4.2.0) or by urlchecker. Would it be reasonable to add such a check to urlchecker? (The correct syntax apparently cannot have non-URLs in the comma-separated list I think).

cboettig avatar May 17 '22 16:05 cboettig

I suspect that the issue is that roxygen2 does not parse the urls in DESCRIPTION correctly, and it creates invalid URLs in the package manual page. Do you have a package that reproduces this?

gaborcsardi avatar May 18 '22 08:05 gaborcsardi

@gaborcsardi sure, you can see this happen, e.g. in the curl package I linked above.

cboettig avatar May 19 '22 18:05 cboettig

I doubt that this is happening to the curl package, though. Or do you have a way to reproduce it with curl?

The error you show is not coming from DESCRIPTION, but from man/gbifdb-package.Rd, and it is the result of roxygen2 writing bad links in that file, I think.

gaborcsardi avatar May 20 '22 11:05 gaborcsardi

Hi Gabor, my apologies for being unclear.

I think the way the URL specification is done in the curl package is invalid due to the annotations, just like the gbifdb example. However, as you say, the curl package does not throw the same warning on check_win_devel(). The warning shown from gbifdb is a consequence of an incorrectly formed \url tag being created in the DESCRIPTION by roxygen for default package documentation: https://github.com/ropensci/gbifdb/blob/212b34d6649f155bddec605e41d04f249ce3d4e8/man/gbifdb-package.Rd#L14

curl package-level documentation page is not generated in this way, and does not include the link. But this is not I think a bug in roxygen or desc, but just a consequence of an invalid URL string in the DESCRIPTION. For instance, you will note that inside RStudio, on the package list for the curl package, the URL that RStudio links is broken: https://github.com/jeroen/curl%20(devel)%20https://curl.se/libcurl/%0A(upstream) .

To me, this suggests that adding non-url annotations in the URL field is invalid, as suggested in the R exts manual:

The ‘URL’ field may give a list of URLs separated by commas or whitespace, for example the homepage of the author or a page where additional material describing the software can be found. These URLs are converted to active hyperlinks in CRAN package listings. See Specifying URLs.

and so I think it would be better if we had a way to catch these cases?

Thanks for all you do! Obviously you know these details way better than me, so perhaps the annotation idea is fine and this really should be a bug report against desc or whatever is parsing the URLs in the DESCRIPTION, so totally happy to defer to you.

cboettig avatar May 20 '22 20:05 cboettig

Yes, this makes sense, but this is not something that we can easily fix in this package I am afraid. urlchecker uses base R's code to extract the URLs from the package, and that code can actually handle these URL fields well. Which also suggests that they are OK (for R-core and CRAN at least).

So I think a we'd better fix this in roxygen2, and RStudio.

gaborcsardi avatar May 21 '22 04:05 gaborcsardi

I came through this repo for other reasons, but this caught my eye. I've also wrestled with these non-URL annotations in the URL field, because usethis parses that field for various purposes.

There's a lot of info here: https://github.com/r-lib/desc/issues/56

I think desc already knows how to cope (see above) and therefore this is a roxygen2 problem, as @gaborcsardi says, when it builds package-level help using info found in DESCRIPTION.

jennybc avatar Aug 18 '22 02:08 jennybc

In practice, the curl URLs are handled correctly by the CRAN web pages: https://cran.rstudio.com/web/packages/curl/, the desc package, etc. and they have been around for a while, so I don't think we should ban them at this point.

I'll fix it in roxygen2, it already uses desc, so it should be pretty easy.

gaborcsardi avatar Aug 18 '22 06:08 gaborcsardi

Thanks @gaborcsardi & @jennybc ! Does this need to be reported separately for the RStudio (server) IDE? It is still creating broken links in the package window there:

image

cboettig avatar Aug 18 '22 21:08 cboettig

@cboettig Yes, please report it at rstudio/rstudio. Thanks!

gaborcsardi avatar Aug 19 '22 03:08 gaborcsardi

Sounds like this was resolved via a fix to desc?

hadley avatar Nov 02 '23 12:11 hadley

No, I suspect this still needs a fix in roxygen2, but the smarts already in desc will make it less painful to fix.

jennybc avatar Nov 02 '23 22:11 jennybc