rustfmt icon indicating copy to clipboard operation
rustfmt copied to clipboard

Handling of numbered markdown lists.

Open anforowicz opened this issue 2 years ago • 10 comments

Fixes issue #5416

anforowicz avatar Jun 30 '22 00:06 anforowicz

@ytmimi - could you PTAL again?

FWIW I don't use github daily, so please point out if there are some technical issues with how I addressed the feedback and/or how I attempted to rebase the PR on top of the tip of the tree. In particular, I am not sure if there are supposed to be 3 separate commits here: https://github.com/rust-lang/rustfmt/compare/master...anforowicz:rustfmt:handling-numbered-lists (maybe I should instead amend the original commit used for the PR? or squash the 3 commits into one using some other mechanism...).

When you have a moment could you leave a comment referencing issue #5416 above these test cases [...]

Done

Overall I think these test cases are good. I would like to add a few more. When you have a chance could you add the following test cases:

Use ) instead of . since you also added that functionality.

Done (see the test that starts with: "Using the ')' instead of '.' character after the number")

  • Can we also have the deeply nested lists have enough content that they would wrap when wrap_comments=true is set.

Done (tweaked the original test - the one that now is called: "This is a numbered markdown list").

  • Can we also add a really deeply nested list (3 or 4 levels deep)
  • It think it would be interesting to have a test case where we use both numbered and non numbered lists.

Done (the last test - "Deep list that mixes various bullet and number formats").

Sigils at the start of a line

If you can think of any other test cases feel free to add them, but I think those additions would give us better test coverage!

I was wondering about sigils appearing at the start of a line in the middle of a paragraph. Some notes:

  • This seems like a separate, pre-existing issue (i.e. not something that this PR introduces)

  • I've found that this has been discussed to some extent in https://talk.commonmark.org/t/blank-lines-before-lists-revisited/1990 (AFAIU rustdoc uses the CommonMark Markdown specification):

    • This was quite a long discussion. I am not sure if a consensus has been reached.
    • Interesting examples mentioned in that discussion:

Maybe this should not be a list:

The Captain died in
1868.  He was buried in...

But this should start a list (making a special case for 1 vs non-1; bullet items are a bit unclear):

Our top priorities are
1. fix ordered lists

But then in a follow-up discussion it was pointed out that it is not clear if 1 vs non-1 should be special-cased - wouldn't we need to require a blank like before each and every item: https://talk.commonmark.org/t/blank-lines-before-lists-revisited/1990/16

So, I think I don't want to make any changes in this direction in this PR, but maybe there are some notes / questions to consider:

  • Would it be possible to eventually redesign comment processing so that it delegates markdown processing/wrapping to a separate tool or library? https://atom.io/packages/flowmark is one example (obviously this won't work because this is not a Rust library, nor a standalone executable AFAICT...).
  • Should the notes above be captured somewhere more permanent? A separate issue maybe?

anforowicz avatar Jul 05 '22 18:07 anforowicz

First off, thanks for the thoughtful questions and all the linked discussion!

FWIW I don't use github daily, so please point out if there are some technical issues with how I addressed the feedback and/or how I attempted to rebase the PR on top of the tip of the tree.

Understood! I do appreciate you taking the time to make those updates when you logged back on and saw the notification. Everyone's go their own way of responding to feedback and in this particular case I think just responding with "Done" is totally fine. I think had the ask been more involved than updating tests some additional discussion / clarification might have been needed, but we're good here!

In particular, I am not sure if there are supposed to be 3 separate commits here: https://github.com/rust-lang/rustfmt/compare/master...anforowicz:rustfmt:handling-numbered-lists (maybe I should instead amend the original commit used for the PR? or squash the 3 commits into one using some other mechanism...).

Just a little bit of background before directly answering your question. We follow the rust compilers no-merge policy. I've updated my git config to always rebase instead of merge using git config --global pull.rebase. You might consider updating your git config with the same command if you plan to do more work on rust 😁 If there are merge conflicts we need those to be resolved with a rebase instead of a merge.

You can look into interactive rebasing to squash your commits, but we can also just squash and merge the commits if you don't want to squash them yourself. In this case I think these changes make more sense as a single commit, but again don't feel obligated to squash them yourself.

I was wondering about sigils appearing at the start of a line in the middle of a paragraph

It's an interesting thought. Given that the main goal of rustfmt is to format rust code I don't see the need to dive too deeply into those details, especially if it complicates the implementation.

(AFAIU rustdoc uses the CommonMark Markdown specification):

rustdoc using pulldown-cmark to render markdown as HTML. See the Cargo.toml.

  • Would it be possible to eventually redesign comment processing so that it delegates markdown processing/wrapping to a separate tool or library?

Definitely something we'd be open to considering! This is also something I've thought about since we've got a few open issues related to markdown formatting. I briefly looked at pulldown-cmark-to-cmark, but haven't played around with trying to integrate it. I think the challenge with introducing a library to handle the markdown formatting is that we need to ensure that we don't introduce breaking formatting changes.

  • Should the notes above be captured somewhere more permanent? A separate issue maybe? Merge state

Feel free to open an issue if you'd like. As I mentioned, the main goal for rustfmt is to format rust code so markdown formatting is pretty low on the priority list and likely won't get much attention from the team any time soon as there's plenty on the backlog. That said, we'll certainly review PRs if there's anyone motivated enough to work on something like this.

ytmimi avatar Jul 06 '22 14:07 ytmimi

@ytmimi - what are the next steps here? Are there any actions that I need to take at this point (maybe somehow unmerging/rebasing the commits in my fork)?

anforowicz avatar Jul 08 '22 16:07 anforowicz

@ytmimi - what are the next steps here? Are there any actions that I need to take at this point

Again, thanks for your work on this. I'm still thinking through a few things. The examples you listed in https://github.com/rust-lang/rustfmt/pull/5423#issuecomment-1175371400 has got me thinking and I'd like to play around with some inputs. Specifically referring to

The Captain died in
1868.  He was buried in...

The issue you outlined there would also apply to the end of sentences since after this change we allow aaaaaa. to start a new itemized block. I'm thinking about what happens when we start a newline with the end of a sentence and then the next sentence needs to wrap?

Testing the following (curtesy of wikipedia) with your branch I get:

intput

//! Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air
//! moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH), Fe(OH)3), and is typically associated with the corrosion of refined iron.

output

//! Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction
//! of iron and oxygen in the catalytic presence of water or air
//! moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and
//!           iron(III) oxide-hydroxide (FeO(OH), Fe(OH)3), and is typically
//!           associated with the corrosion of refined iron

ytmimi avatar Jul 09 '22 20:07 ytmimi

Some random notes:

  • The problem of mid-paragraph-sigils is not new, but indeed the PR means that the problem may happen more frequently ("word." seems more likely than "- word" at a beginning of a line).
  • I wonder if limiting the number of characters in a sigil might mitigate some of the concerns.
    • Limiting to 1 character would seem safest and still fairly functional (especially since using "1." for each item works fine for Common Mark - this would offer a workaround if "10." got misformatted by rustfmt).
    • We could also limit to at most 2 characters. It seems that having more than 3 characters in a sigil should cover very rarely (some discussion of roman numerals in Common Mark can be found here - it seems that they are not supported but I haven't honestly finished reading this discussion in depth).
  • Ultimately, what rustfmt does with itemized list (including what it may do after this PR with numbered lists) is somewhat ad-hoc and heuristics-based - in the long-term this is a problem that Common Mark needs to resolve (in https://talk.commonmark.org/t/blank-lines-before-lists-revisited/1990 or around).

anforowicz avatar Jul 10 '22 15:07 anforowicz

Lots of good dialog here, thank you both!

IIRC, rustdoc is using cmark or some form of it anyway, and I'm not inherently opposed to us trying a radically different approach as instead of the current strategy of continuing to bolt things on to support additional cases as they arise.

However, any such radically different approach would need to be accompanied by some exhaustive tests (far more than what we have today) to avoid, or at least minimize, the amount of churn we would inflict on our users. To be clear, I think this testing would need to entail running rustfmt with new approach over various projects that are currently utilizing the feature. The associated config options that hit this in the first place are still unstable, but it's still incumbent upon us to be cautious given the potential impact.

Additionally, we'd also need to ensure continued interop with other relevant options and behavior, potential future behavior, and be mindful about any extra dependency bloat considering it's an opt-in feature.

calebcartwright avatar Jul 10 '22 16:07 calebcartwright

After thinking on it a bit more I found an edge case that would ned to be addressed before we could move forward

@ytmimi, could you please share more details about the edge case that needs to be addressed to unblock this PR?

(I might not reply for a while - I'll be away from a computer for a bit of summer vacation :-)

anforowicz avatar Jul 22 '22 22:07 anforowicz

@anforowicz apologies for the delay. I have some reservations based on the case you outlined in Sigils At the start of a line, which I also demonstrated in https://github.com/rust-lang/rustfmt/pull/5423#issuecomment-1179599205. I'm assuming there's a non zero chance that introducing these changes will cause existing doc comments to be poorly formatted, and I'd like to avoid that.

We could also limit to at most 2 characters

I like this as a workaroud to limiting the issue. Erring on the side of caution I'd also like to limit it to just numeric values, instead of also trying to support upper and lowercase ascii letters.

I feel like that's a good compromise that will address the issue you ran into in #5416

ytmimi avatar Jul 28 '22 02:07 ytmimi

Thanks @ytmimi! Can you PTAL again?

(Again - my apologies for the hap-hazardous git and github - hopefully the separate commits in my fork can be merged as a single commit?)

anforowicz avatar Aug 26 '22 19:08 anforowicz

@anforowicz no worries. I should have some time in the coming week to review this again.

ytmimi avatar Aug 28 '22 20:08 ytmimi

@ytmimi - gentle ping? Maybe this got missed because this PR is incorrectly marked as pr-waiting-on-author.

anforowicz avatar Sep 26 '22 19:09 anforowicz

r? @cynecx

anforowicz avatar Oct 05 '22 16:10 anforowicz

@rustbot label -pr-waiting-on-author +pr-follow-up-review-pending

anforowicz avatar Oct 05 '22 17:10 anforowicz

Error: The feature relabel is not enabled in this repository. To enable it add its section in the triagebot.toml in the root of the repository.

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

rustbot avatar Oct 05 '22 17:10 rustbot

@ytmimi - this PR should be ready for another review pass please.

In addition to resolving the PR feedback above I also did some other minor changes:

  • When resolving the PR feedback I realized that CommonMark calls the "*", "1." snippets "markers" rather than "sigils" so I also went ahead and did this rename. This is mostly contained within the code already touched by this PR, so hopefully okay.
  • And after a self-review I've also made some small tweaks of doc comments.

anforowicz avatar Oct 13 '22 20:10 anforowicz

@anforowicz thanks for turning around the feedback so quickly. I haven't looked at the changes yet, but from your description above they seem reasonable. I also wanted to let you know that I approved #5560. We haven't merged it yet, but I think it makes sense to merge that PR before this one. There will likely be some merge conflicts once that PR is merged so I wanted to give you a heads up. It might make sense to subscribe to that PR so you know when it's merged.

ytmimi avatar Oct 14 '22 14:10 ytmimi

As anticipated the merging of #5560 has introduced some merge conflicts, but I gather we should be ready to go with this once those conflicts are resolved.

calebcartwright avatar Feb 02 '23 03:02 calebcartwright

Thanks for the quick resolution @anforowicz!

@ytmimi - do you think this needs another quick pass, and then do you think this should be included in the next release or are shall we hold it till the following?

calebcartwright avatar Feb 02 '23 20:02 calebcartwright

@calebcartwright I took another look and I think this is good to go 👍🏼 and could be included in the next release, but I don't mind if we want to hold off until the following.

ytmimi avatar Feb 07 '23 16:02 ytmimi

@ytmimi - This seems like a worthwhile add to me, so more a question of when as opposed to if.

Suppose the options are to merge it now/tomorrow and pull it into the next release (pre let-else support), or wait and then do it in the release that follows let-else. I'll defer to you on whether you think we should go ahead with this now based on your confidence level in the implementation and adherence to formatting stability guarantee.

calebcartwright avatar Jun 19 '23 01:06 calebcartwright

haven't looked at this one personally but per team meeting in discussion on zulip earlier Yacin's comfortable with this, so going to pull it in as the last addition in the next release

calebcartwright avatar Jun 20 '23 01:06 calebcartwright