fledge fledgling object

When thinking about grouping and ordering items in the NEWS file, I thought about desc::desc_normalize().

At the moment fledge prepends lines in the NEWS file.

I'd like to prepend lines, then do the grouping and ordering (news_normalize()), then write to disk. I can do it via a temporary character vector, but how nice would it be if it were an object like desc objects?
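For comparison, here is a minimal sketch of the temporary-character-vector route. `news_normalize()` is only a placeholder here (it strips trailing whitespace instead of grouping and ordering), and `prepend_news()` is a hypothetical name, not an existing fledge function.

```r
# Placeholder for the grouping and ordering step (assumption):
# it only strips trailing whitespace.
news_normalize <- function(lines) {
  sub("[ \t]+$", "", lines)
}

# Prepend new lines, normalize in memory, then write to disk once.
prepend_news <- function(path, new_lines) {
  old_lines <- if (file.exists(path)) {
    readLines(path, encoding = "UTF-8")
  } else {
    character()
  }
  lines <- news_normalize(c(new_lines, "", old_lines))
  writeLines(lines, path, useBytes = TRUE)
  invisible(lines)
}

path <- tempfile(fileext = ".md")
writeLines(c("# pkg 0.0.1", "", "- First item.  "), path)
prepend_news(path, c("# pkg 0.0.2", "", "- Second item."))
```

An object would replace the bare `lines` vector and keep the parsed structure around between these steps.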

May 17 '22 13:05 maelle

the normalize argument name is at least a good inspiration

May 17 '22 13:05 maelle

Yes, an immutable object that holds the NEWS in a structured way, and the relevant data from DESCRIPTION, with reading/writing support, seems like a great idea.

May 18 '22 03:05 krlmlr

Class name suggestion: fledgling :grin:

May 19 '22 12:05 maelle

Requirements as I see them:

- The 🐔 object (S3 class: "fledge_ling"?) should contain a structured version of NEWS.md, with a fallback if the file can't be parsed for whatever reason.
- The parse-deparse operation should lead to a perfect roundtrip (preserving all whitespace perhaps ignoring EOL whitespace) for all valid and invalid NEWS.md files
  - we can double-check this with all NEWS.md we see on https://github.com/cran/
- Adding a block of NEWS items to the top should always work (even if pandoc or dependencies are not installed)
- The fledgling prints in a useful way (full output for most recent version, compressed output for following versions, excerpt from DESCRIPTION)
- We support "development version"-style .9000 versions (#147, as suggested by @gadenbuie)
- Read and write from/to disk
- As internal data structure I propose a nested tibble, these tend to be easier to work with than nested lists
  - Alternatives?
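To make the nested-tibble idea concrete, a rough sketch of what such an object could look like. The constructor name `new_fledgling()`, the field names, and the column names are all assumptions for illustration; `raw` stands for the unparsed fallback.

```r
library(tibble)

# Hypothetical constructor: field names are illustrative only.
# `raw` holds the unparsed lines as a fallback if parsing fails.
new_fledgling <- function(name, version, news, raw = NULL) {
  structure(
    list(name = name, version = version, news = news, raw = raw),
    class = "fledgling"
  )
}

# One row per version, with the items of each version in a list-column.
news <- tibble(
  version = c("0.0.2", "0.0.1"),
  items = list(
    tibble(type = "Features", text = c("Add foo().", "Add bar().")),
    tibble(type = "Bug fixes", text = "Fix baz().")
  )
)

fledgling_obj <- new_fledgling("pkg", "0.0.2", news)
fledgling_obj$news$items[[1]]
```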

Taking the discussion from #97 here:

Whatever we do (tinkr, pandoc -o NEWS.json, pkgdown-style, ...), perfect roundtrip should be top priority if possible, because this is what happens now (because we don't touch the rest of NEWS.md). I'm fine with some very light normalization (consistent spacing between headers, removing whitespace at EOL), we'd need to discuss anything beyond that.
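A sketch of the roundtrip check we could run over the collected NEWS.md files. `parse_news()` and `deparse_news()` are hypothetical stand-ins that only split at level-1 headers; whatever parser we pick would be swapped in, and `roundtrips()` would stay the same.

```r
# Hypothetical parser: split lines into blocks, one per level-1 header.
# Lines before the first header form their own block.
parse_news <- function(lines) {
  block_id <- cumsum(grepl("^# ", lines))
  unname(split(lines, block_id))
}

# Hypothetical deparser: concatenate the blocks again.
deparse_news <- function(blocks) {
  as.character(unlist(blocks, use.names = FALSE))
}

# TRUE if parsing and deparsing gives back exactly the same lines.
roundtrips <- function(lines) {
  identical(deparse_news(parse_news(lines)), lines)
}

lines <- c(
  "<!-- preamble -->", "",
  "# pkg 0.0.2", "", "- Item.", "",
  "# pkg 0.0.1", "", "- Older item."
)
roundtrips(lines)
```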

Action items:

- [x] Collect (a sample of) NEWS.md files
- [x] Propose a target data structure (R pseudocode)
- [x] Experiment with md parsers
- [x] Decide what parser to use
- [x] Implement

May 19 '22 23:05 krlmlr

For collecting NEWS.md files from CRAN, I remember @mpadge worked on a project that consisted of analyzing all CRAN packages (?).

May 19 '22 23:05 krlmlr

That doesn't currently do anything with NEWS files, but I could easily grab them all for you in a few minutes if you wanted? Let me know.

May 20 '22 06:05 mpadge

@krlmlr maybe to add to the requirements, a better parsing of the git log https://github.com/cynkra/fledge/issues/145#issuecomment-1124785185 (XML for all the things? :sweat_smile:)

May 20 '22 09:05 maelle

Regarding the round trip, I agree a test on many NEWS.md files is necessary.

Some notes on tools:

- I'm partial to tinkr, that's been used & is well maintained. Documented losses: https://github.com/ropensci/tinkr/#loss-of-markdown-style=
- I also know of https://github.com/rundel/parsermd (I have never used it)
- For running Pandoc, there's knitr but also https://github.com/cderv/pandoc
- pkgdown doesn't do a roundtrip, HTML is edited to produce HTML.

May 20 '22 10:05 maelle

From the pkgstats experience of round-tripping, the biggest hiccup is encoding. Loads of CRAN pkgs have code that can be read with encoding = "UTF-8", but even if it is written back with the encoding guessed from Encoding(), round trips will fail. But it sounds like you just want a sample here, so I guess sampling only pkgs for which Encoding() gives UTF-8 should be fine.
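One way to detect such failures on a sample is to compare bytes before and after a read/write cycle. This is only a sketch, not what pkgstats does; `roundtrips_bytes()` is a made-up helper name.

```r
# TRUE if reading `path` as UTF-8 text and writing it back
# reproduces the original file byte for byte.
roundtrips_bytes <- function(path) {
  original <- readBin(path, what = "raw", n = file.size(path))
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)

  out <- tempfile()
  on.exit(unlink(out))
  con <- file(out, open = "wb") # binary mode: always "\n" line endings
  writeLines(lines, con, useBytes = TRUE)
  close(con)

  identical(original, readBin(out, what = "raw", n = file.size(out)))
}

path <- tempfile(fileext = ".md")
writeBin(charToRaw("# pkg 0.0.1\n\n- Thanks @ma\u00eblle.\n"), path)
roundtrips_bytes(path)
```

Note that this check also flags files with CRLF line endings or without a final newline, which may or may not be what we want.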

May 20 '22 10:05 mpadge

I wonder whether "future news items" should fit somewhere in the object (the newsworthy items between the last bump_version() call and now). If they are stored and printed, this should be memoised.

Aug 25 '22 15:08 maelle

How would one get a fledgling object with pending items? Why is it important to separate items that are not yet materialized?

Aug 27 '22 04:08 krlmlr

> How would one get a fledgling object with pending items?

Using the code from update_news()

> Why is it important to separate items that are not yet materialized?

Not important if one runs bump_version regularly :sweat_smile:

Aug 29 '22 04:08 maelle

I'm asking the wrong questions.

I see the following flow:

1. fledgling is created only from NEWS.md
2. update_news() calls a function that creates a derived version of the fledgling object
3. derived fledgling is written to NEWS.md

I don't think that we should scrape pending news in step 1. For step 2, do we need to distinguish between new and existing items? Step 3 should work the same with the original and the derived fledgling.
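The three steps could be sketched like this. `read_fledgling()`, `add_news()` and `write_fledgling()` are hypothetical names, and the object is reduced to a list of lines for brevity.

```r
# Step 1: create the object from NEWS.md only.
read_fledgling <- function(path) {
  structure(
    list(lines = readLines(path, encoding = "UTF-8")),
    class = "fledgling"
  )
}

# Step 2: return a derived object, the input is left untouched.
add_news <- function(x, new_lines) {
  x$lines <- c(new_lines, "", x$lines)
  x
}

# Step 3: works the same for the original and the derived object.
write_fledgling <- function(x, path) {
  writeLines(x$lines, path, useBytes = TRUE)
  invisible(x)
}

path <- tempfile(fileext = ".md")
writeLines(c("# pkg 0.0.1", "", "- First item."), path)

original <- read_fledgling(path)
derived <- add_news(original, c("# pkg 0.0.2", "", "- New item."))
write_fledgling(derived, path)
```

Because `original` is still available after step 2, new and existing items can be told apart by comparing the two objects if that ever becomes necessary.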

Aug 29 '22 05:08 krlmlr

> do we need to distinguish between new and existing items?

A message with the new items might be nice, but I also imagine users could simply look at the git diff after step 3 anyway.

Aug 29 '22 07:08 maelle

We should make waldo::compare() look nice.

Aug 30 '22 10:08 krlmlr

A method to "sitrep" https://github.com/cynkra/fledge/pull/404/files/482856b6399bd30b86fc9de11e540abe6d27a47c#r959940487

Sep 01 '22 10:09 maelle

temp_file <- withr::local_tempfile(fileext = ".html")
pandoc::pandoc_run(c("-o", temp_file, "NEWS.md", "--section-divs"))
xml2::read_html(temp_file, encoding = "UTF-8")

Nov 10 '22 14:11 maelle

need to protect the @ that are interpreted as citations by Pandoc
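One possible way to protect them, as a sketch: swap every @ for a placeholder before handing the text to Pandoc and swap it back afterwards. The placeholder token is arbitrary and assumed not to occur in NEWS.md; `protect_at()` and `restore_at()` are made-up names.

```r
# Arbitrary token, assumed not to occur in NEWS.md.
placeholder <- "FLEDGEATSIGN"

# Replace every "@" before conversion and restore it afterwards.
protect_at <- function(lines) gsub("@", placeholder, lines, fixed = TRUE)
restore_at <- function(lines) gsub(placeholder, "@", lines, fixed = TRUE)

lines <- c(
  "- New feature (#147, @gadenbuie).",
  "- Contact: me@example.com"
)
protected <- protect_at(lines)
identical(restore_at(protected), lines)
```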

Nov 10 '22 14:11 maelle

also worth looking into converting to JSON, actually!

Nov 10 '22 14:11 maelle

mmmh maybe not

Nov 10 '22 14:11 maelle

when parsing content inside a version we need to go to two levels (conventional commits types and scopes).
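To illustrate the two levels on plain Markdown lines (no Pandoc involved), a small sketch. It assumes types are level-2 headers and scopes are level-3 headers inside a version; `split_at_headers()` and `parse_version_block()` are hypothetical helpers.

```r
# Split `lines` into named chunks at headers with the given prefix,
# e.g. "## ". Lines before the first header land in a chunk named "".
split_at_headers <- function(lines, prefix) {
  id <- cumsum(startsWith(lines, prefix))
  chunks <- unname(split(lines, id))
  names(chunks) <- vapply(chunks, function(chunk) {
    if (startsWith(chunk[1], prefix)) {
      substring(chunk[1], nchar(prefix) + 1)
    } else {
      ""
    }
  }, character(1))
  chunks
}

# First level: types. Second level: scopes within each type.
parse_version_block <- function(lines) {
  types <- split_at_headers(lines, "## ")
  lapply(types, function(type_lines) split_at_headers(type_lines, "### "))
}

version <- c(
  "## Features", "",
  "### parser", "",
  "- Handle level 2 headers.", "",
  "## Bug fixes", "",
  "- Fix typo."
)
str(parse_version_block(version))
```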

Nov 10 '22 14:11 maelle

some notes

temp_file <- withr::local_tempfile(fileext = ".html")
pandoc::pandoc_run(c("-o", temp_file, "NEWS.md", "--section-divs"))
html <- xml2::read_html(temp_file, encoding = "UTF-8") 

# if versions are level2, make them level1 for the sake of simplicity
# but we need to store the info for writing it back

versions <- xml2::xml_find_all(html, ".//section[@class='level1']")

# Types are the level-2 sections inside a version.
parse_version <- function(version) {
  xml2::xml_find_all(version, ".//section[@class='level2']")
}

# Scopes are the level-3 sections inside a type.
parse_type <- function(type) {
  xml2::xml_find_all(type, ".//section[@class='level3']")
}

Nov 10 '22 14:11 maelle

- items have to have a type.
- if all items in a version section have the Uncategorized type, merge them when writing?
- items in a type do not have to have a scope. If some have scopes and others not, what comes first? Is there an "uncategorized" scope?
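A sketch of one possible answer to the merging question: drop the type headers when every item is Uncategorized. The data frame layout (columns `type` and `text`) and the name `deparse_version()` are assumptions, and scopes are left out.

```r
# `items` is a data frame with columns `type` and `text`.
deparse_version <- function(header, items) {
  if (all(items$type == "Uncategorized")) {
    # Nothing to categorize: write a flat list without type headers.
    return(c(header, "", paste0("- ", items$text)))
  }
  body <- unlist(lapply(unique(items$type), function(type) {
    c(paste0("## ", type), "", paste0("- ", items$text[items$type == type]), "")
  }))
  # Drop the trailing empty line of the last type section.
  c(header, "", body[-length(body)])
}

flat <- data.frame(
  type = "Uncategorized",
  text = c("First item.", "Second item."),
  stringsAsFactors = FALSE
)
deparse_version("# pkg 0.0.2", flat)
```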

Nov 10 '22 14:11 maelle

Keeping this open for brainstorming. We should start with a simple object that does the minimum amount of parsing for now, and expand as and if necessary.

Nov 10 '22 14:11 krlmlr

check what happens when a version content is not items but a paragraph instead (as is the case in fledge changelog itself actually)

Nov 21 '22 13:11 maelle

Are we done here?

Feb 04 '23 14:02 krlmlr

It seems so, except:

> The fledgling prints in a useful way (full output for most recent version, compressed output for following versions, excerpt from DESCRIPTION)

Feb 06 '23 06:02 maelle