fledgling object
When thinking about grouping and ordering items in the NEWS file, I thought about desc::desc_normalize().
At the moment fledge prepends lines in the NEWS file.
I'd like to prepend lines, then do the grouping and ordering (news_normalize()), then write to disk.
I can do it via a temporary character vector but how nice would it be if it were an object like desc objects?
the normalize argument name is at least a good inspiration
Yes, an immutable object that holds the NEWS in a structured way, and the relevant data from DESCRIPTION, with reading/writing support, seems like a great idea.
Class name suggestion: fledgling :grin:
Requirements as I see them:
- The 🐔 object (S3 class: "fledge_ling"?) should contain a structured version of NEWS.md, with a fallback if the file can't be parsed for whatever reason.
- The parse-deparse operation should lead to a perfect roundtrip (preserving all whitespace, perhaps ignoring EOL whitespace) for all valid and invalid NEWS.md files
  - we can double-check this with all NEWS.md we see on https://github.com/cran/
- Adding a block of NEWS items to the top should always work (even if pandoc or dependencies are not installed)
- The fledgling prints in a useful way (full output for most recent version, compressed output for following versions, excerpt from DESCRIPTION)
- We support "development version"-style .9000 versions (#147, as suggested by @gadenbuie)
- Read and write from/to disk
- As internal data structure I propose a nested tibble, these tend to be easier to work with than nested lists
- Alternatives?
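To make the roundtrip requirement concrete, here is a minimal sketch in base R. It is not fledge's implementation: `parse_news()` and `deparse_news()` are hypothetical names, and a plain data frame with a list column stands in for the proposed nested tibble so that the example has no dependencies. It assumes versions are introduced by level-1 `# ` headers.

```r
# Hypothetical helpers, not part of fledge: split NEWS lines into one row per
# "# " section and keep every raw line so that nothing is lost on the way back.
parse_news <- function(lines) {
  is_header <- grepl("^# ", lines)
  # Section 0 collects any preamble that comes before the first header.
  section_id <- cumsum(is_header)
  ids <- unique(section_id)
  news <- data.frame(id = ids)
  news$header <- vapply(ids, function(i) {
    if (i == 0) NA_character_ else lines[section_id == i][1]
  }, character(1))
  # List column: the untouched lines of each section, header included.
  news$raw <- lapply(ids, function(i) lines[section_id == i])
  news
}

# Deparsing is just concatenating the stored raw lines in order.
deparse_news <- function(news) {
  unlist(news$raw, use.names = FALSE)
}

lines <- c(
  "<!-- preamble -->", "",
  "# pkg 0.2.0", "", "- Item B  ", "",
  "# pkg 0.1.0", "", "- Item A"
)
news <- parse_news(lines)
roundtrip_ok <- identical(deparse_news(news), lines)
```

Because each row keeps its raw lines, the roundtrip holds even when the header detection is wrong (for example a `# comment` line inside a code block); the structured columns are only a view on top of the raw text, which also gives a natural fallback for files that cannot be parsed.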
Taking the discussion from #97 here:
Whatever we do (tinkr, pandoc -o NEWS.json, pkgdown-style, ...), perfect roundtrip should be top priority if possible, because this is what happens now (because we don't touch the rest of NEWS.md). I'm fine with some very light normalization (consistent spacing between headers, removing whitespace at EOL), we'd need to discuss anything beyond that.
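As an illustration of what "very light normalization" could mean, here is a small base R sketch. The function name `normalize_news_lines()` is made up, and reading "consistent spacing" as "collapse runs of blank lines into one" is an assumption, not something decided in this thread.

```r
# Hypothetical helper, not part of fledge.
normalize_news_lines <- function(lines) {
  # Remove spaces and tabs at the end of each line.
  lines <- sub("[ \t]+$", "", lines)
  # Drop a blank line when the previous line is blank too.
  blank <- lines == ""
  previous_blank <- c(FALSE, head(blank, -1))
  lines[!(blank & previous_blank)]
}

x <- c("# pkg 0.1.0  ", "", "", "- Item\t", "")
normalized <- normalize_news_lines(x)
```

Two caveats: two trailing spaces are a hard line break in Markdown, so stripping them can change rendering, and this sketch does not skip the contents of code blocks.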
Action items:
- [x] Collect (a sample of) NEWS.md files
- [x] Propose a target data structure (R pseudocode)
- [x] Experiment with md parsers
- [x] Decide what parser to use
- [x] Implement
For collecting NEWS.md files from CRAN, I remember @mpadge worked on a project that consisted of analyzing all CRAN packages (?).
That doesn't currently do anything with NEWS files, but i could easily grab them all for you in a few minutes if you wanted? Let me know
@krlmlr maybe to add to the requirements, a better parsing of the git log https://github.com/cynkra/fledge/issues/145#issuecomment-1124785185 (XML for all the things? :sweat_smile:)
Reg the round trip, I agree a test on many NEWS.md is necessary.
Some notes on tools:
- I'm partial to tinkr, that's been used & is well maintained. Documented losses: https://github.com/ropensci/tinkr/#loss-of-markdown-style=
- I also know of https://github.com/rundel/parsermd (I have never used it)
- For running Pandoc, there's knitr but also https://github.com/cderv/pandoc
- pkgdown doesn't do a roundtrip, HTML is edited to produce HTML.
From the pkgstats experience of round-tripping, the biggest hiccup is encoding. Loads of CRAN pkgs have code that can be read with encoding = "UTF-8", but even if written back with the encoding guessed from Encoding(), round trips will fail. But it sounds like you just want a sample here, so I guess sampling only pkgs for which Encoding() gives UTF-8 should be fine.
I wonder whether "future news items" should fit somewhere in the object (the newsworthy items between the last bump_version() call and now). If they are stored and printed, this should be memoised.
How would one get a fledgling object with pending items? Why is it important to separate items that are not yet materialized?
> How would one get a fledgling object with pending items?

Using the code from update_news()

> Why is it important to separate items that are not yet materialized?

Not important if one runs bump_version regularly :sweat_smile:
I'm asking the wrong questions.
I see the following flow:
1. fledgling is created only from NEWS.md
2. update_news() calls a function that creates a derived version of the fledgling object
3. derived fledgling is written to NEWS.md
I don't think that we should scrape pending news in step 1. For step 2, do we need to distinguish between new and existing items? Step 3 should work the same with the original and the derived fledgling.
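The three-step flow could look roughly like the sketch below. All names (`read_fledgling()`, `add_news_items()`, `write_fledgling()`, the `fledgling_sketch` class) are hypothetical, and the object is reduced to a classed list of lines to keep the example short. It does not handle a preamble that should stay above the newest entry.

```r
# Step 1: create the object only from the file on disk.
read_fledgling <- function(path) {
  structure(
    list(lines = readLines(path, encoding = "UTF-8")),
    class = "fledgling_sketch"
  )
}

# Step 2: return a derived object; R's copy-on-modify semantics leave the
# input untouched, which gives the "immutable" behaviour for free.
add_news_items <- function(fledgling, header, items) {
  block <- c(header, "", paste0("- ", items), "")
  fledgling$lines <- c(block, fledgling$lines)
  fledgling
}

# Step 3: writing works the same for the original and the derived object.
write_fledgling <- function(fledgling, path) {
  writeLines(fledgling$lines, path, useBytes = TRUE)
  invisible(fledgling)
}

path <- tempfile(fileext = ".md")
writeLines(c("# pkg 0.1.0", "", "- First release"), path)

original <- read_fledgling(path)
derived <- add_news_items(original, "# pkg 0.1.0.9000", "Fix typo")
write_fledgling(derived, path)
```

Since prepending is plain vector concatenation, this step works without pandoc or any parser being installed.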
> do we need to distinguish between new and existing items?

A message with the new items might be nice but I also imagine users could simply look at the git diff after 3. anyway.
We should make waldo::compare() look nice.
A method to "sitrep" https://github.com/cynkra/fledge/pull/404/files/482856b6399bd30b86fc9de11e540abe6d27a47c#r959940487
temp_file <- withr::local_tempfile(fileext = ".html")
pandoc::pandoc_run(c("-o", temp_file, "NEWS.md", "--section-divs"))
xml2::read_html(temp_file, encoding = "UTF-8")
need to protect the @ that are interpreted as citations by Pandoc
also worth looking into converting to JSON, actually!
mmmh maybe not
when parsing content inside a version we need to go to two levels (conventional commits types and scopes).
some notes
temp_file <- withr::local_tempfile(fileext = ".html")
pandoc::pandoc_run(c("-o", temp_file, "NEWS.md", "--section-divs"))
html <- xml2::read_html(temp_file, encoding = "UTF-8")
# if versions are level2, make them level1 for the sake of simplicity
# but we need to store the info for writing it back
versions <- xml2::xml_find_all(html, ".//section[@class='level1']")
parse_version <- function(version) {
  # conventional commit types are the level-2 sections inside a version
  xml2::xml_find_all(version, ".//section[@class='level2']")
}
parse_type <- function(type) {
  # scopes are the level-3 sections inside a type
  xml2::xml_find_all(type, ".//section[@class='level3']")
}
- items have to have a type.
- if all items in a version section have the Uncategorized type, merge them when writing?
- items in a type do not have to have a scope. If some have scopes and others not, what comes first? Is there an "uncategorized" scope?
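For the two levels (type, then optional scope), a sketch of how a single item could be classified is shown below. It assumes items follow the conventional commits shape `type(scope): description`; `parse_item()` is a hypothetical helper, and falling back to an "Uncategorized" type with a missing scope is just one possible answer to the open questions above.

```r
# Hypothetical helper, not part of fledge.
parse_item <- function(x) {
  # type, optional "(scope)", optional "!", then ": description"
  pattern <- "^([[:alpha:]]+)(\\(([^)]+)\\))?!?:[[:space:]]*(.*)$"
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0) {
    # No recognisable type: keep the full text and mark it Uncategorized.
    return(list(type = "Uncategorized", scope = NA_character_, description = x))
  }
  list(
    type = m[2],
    scope = if (nzchar(m[4])) m[4] else NA_character_,
    description = m[5]
  )
}

items <- c(
  "feat(parser): handle level-2 headers",
  "fix: trim EOL whitespace",
  "Some free-form note"
)
parsed <- do.call(rbind, lapply(items, function(x) {
  as.data.frame(parse_item(x), stringsAsFactors = FALSE)
}))
```

Note that any text starting with a word and a colon (for example "Note: ...") would be read as a type by this pattern, so a real implementation would probably need an allow-list of types.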
Keeping this open for brainstorming. We should start with a simple object that does the minimum amount of parsing for now, and expand as and if necessary.
check what happens when a version content is not items but a paragraph instead (as is the case in fledge changelog itself actually)
Are we done here?
It seems so except
> The fledgling prints in a useful way (full output for most recent version, compressed output for following versions, excerpt from DESCRIPTION)