icu4x
icu4x copied to clipboard
Have a better API changelog
It would be nice if the changes to public API were presented in a readable form for approval. cargo public-api exists, however it is noisy. Turning that into a readable API log requires a lot of manual effort.
We have diplomat-coverage which already does this manual allowlisting, but the FFI allowlist is not the same as the public allowlist. We could potentially build off of a shared allowlist, though.
I'm not yet sure if building a useful API changelog tool is an easy endeavor. Worth investigating.
An MVP for this might just be a better-organized changelog, where public API changes are listed separately.
cc @sffc @robertbastian
I think the main tricky thing here is that this needs a proper design for how it filters and presents API changes in a way that does not end up with a huge changelog. Once we have that, we can see if existing tooling can be massaged into the right shape.
#7197 brings up an interesting facet: here I've been talking about this document as a TC-facing one. But it would be nice to include it in the changelog, too. We might want to further winnow it somehow, or figure out a process for integrating it:
Generally, how this integrates into CHANGELOG.md is a tricky question, too, and one not yet covered by https://github.com/unicode-org/icu4x/issues/7153. Some potential options:
- Maybe the API changelog is succinct enough that we can just copy paste the whole thing into
- Maybe we have some rules as to which bullet points are included in the main changelog
- Maybe we copy paste all API changelog entries as sub-bullet points of "normal" changelog entries, e.g. the entry would be "Add arithmetic for Date (#nnn)" and it would have a sub-bullet-point saying "New APIs: Date::add, Date::until, Calendar::whatever, DateDifferenceOptions, ...".
My mental model has been something like the third bullet above, where we put the entries directly in CHANGELOG.md if they are succinct enough, and we link to a supplementary doc if they are too big and not useful to clients.
Alright, I promised in #7197 that I would write out the challenges / design decisions to be made.
I do think I have answers to most of these questions. I think we will need to discuss some of those answers, I can post my full thoughts on a solution later. But this should be sufficient to demonstrate that there is a fair amount of work to be done here to produce a useful (for TC) API changelog.
What do we need to display?
So, firstly, the fully expansive list of things that constitute "public API" in Rust are as follows. Note that many of these have an obvious answer whether they should be included or not, but this helps give an idea of what we get when we run various tools like cargo public-api:
- Modules and reexports
- Items (structs, enums, traits, type defs, constants)
- Enum variants
- Struct fields
- Methods (incl trait methods)
- Associated types and consts
- implicit implementations of auto traits (Send, Sync)
- direct implementation of derive traits (Copy, Clone, Debug, PartialEq, ...)
- direct implementation of other traits
- indirect implementation of other traits, via blanket impls (this includes stuff like auto-Into impls)
- The variance of the type over its type parameters
- Impl items (methods, assoc types, etc) inherited on a type because of a new trait impl (for each of the cases above).
- The actual type parameters (we might add new defaults!)
- For a new reexport, every method, field, impl, (....) found on that item
- Actually this list isn't fully expansive there's a whole bunch of little things that I'm probably forgetting.
This rapidly explodes for even a small API surface. Here's the 1500-entry public-api for icu_calendar filtered for unstable items, etc.
Almost certainly we don't care about auto traits, indirect impls, and recursively following reexports. Some of the rest, I'm not sure.
We probably don't need to list associated items if we are listing that a type now implements a trait. That would immediately greatly winnow down the output of a tool like public-api. From manual inspection that's still not enough, though.
Do we filter new APIs differently from modifications to existing APIs?
Secondly, I think the answer here is different for new APIs vs existing APIs. For example, if we add a new options bag, maybe it sufficies to say "new struct: DateFromFieldsOptions (docs link)". Maybe we don't need to list its fields (maybe! I don't know the best answer here). Does that apply to its methods? Its initial trait set?
Whereas if we add a new field or method to an existing type in a new release, of course we should list that.
How do we organize it?
Even after heavy filtering, I'm not sure the information will be useful or reviewable as just a list. Probably we'd want to organize it somewhat. How is this organized?
What's the process?
Finally, once we know what we want, we should figure out how it fits in the process. It depends on the precise requirements, but my current impression of a useful API changelog is that it is a fair amount of work to produce. Maybe that ends up not being the case, but I wouldn't be convinced until we have something concrete. If the API changelog is a lot of work, we have three routes forward there:
- Make it the release drivers' problem. This feels like it would explode the amount of work in the release and make release drivering a much harder job to take on. In general we have been trying to make it easier to do releases; so that we can serve client needs better when releases are needed.
- Make it the PR author's problem (or the PR reviewer's problem). This might be similar to some of the things we are talking about in https://github.com/unicode-org/icu4x/issues/7154, where the PR author produces an API changelog.
- Write a tool to do this.
cargo public-apiuses the public-api crate, which we could presumably build on. This is my preferred route.
An example of a case that a process must catch, or else the process is insufficient: valid APIs that smell internal but are not internal being changed to doc(hidden), such as https://github.com/unicode-org/icu4x/pull/7147#discussion_r2500972062. I'm highlighting this because it seems like something likely to fall through the cracks if it relies on only human review.
This is already caught by semver CI
Our current process around semver CI does not notice that, though, or it did not work in #7147.
Yes it did: https://github.com/unicode-org/icu4x/pull/7147/checks?check_run_id=54741206689
@robertbastian No, I mean humans in the process did not notice that change. If you noticed it, you should have highlighted it in the PR, which is a potential process change we need. Or the reviewers should be on the hook for checking. IMO everyone should be, preventing breakages is everyone's responsibility.
Part of the problem is that CSC has a lot of false positives around new APIs so personally I didn't click through to that failure because I expected it to be red there. Which is a dangerous expectation to get into and I should probably stop doing that.
Looking at CI, and in particular the semver check, is part of the reviewer's responsibility.
Yes, and, the PR author also has a stake in giving the reviewer the information they need to successfully review, readily available.
We caught this eventually; and when I first looked at the PR I was doing a partial-but-sufficient review. So perhaps this wasn't a process failure. I would probably have looked at the CSC eventually. I don't know.
But I'm not sure as it stands I trust the current process to catch this at the PR layer. Having a prerelease or mid-release-cycle process that produces such an API changelog would definitely be useful here, which is the extent of the discussion on this here.
I don't think we should over dissect whether we had a process failure in #7147. I think the core point is that that's the type of thing that an API changelog ought to surface, regardless of whether this is "handled" in the PR process.
If this would have been merged, it would have been listed in the semver CI check on main, checking that is part of the release checklist.
Agreed. So the process would have probably eventually caught this before a release.
As noted elsewhere, the semver CI check is both too noisy for some things and not noisy enough for others, though. It's fine for its primary purpose which is preventing accidental breakages.
This thing is caught by it, but the semver CI check is insufficient in other ways for an "API changelog". So we should make sure this case is caught by the API changelog being discussed here.