mdformat
Always reformat tab characters as space characters
Description / Summary
The tab character (0x09) is a pest.
Currently, mdformat seeks to "apply consistent white space across the board" (Formatting Style: Whitespace) and does the right thing when tab characters appear as leading white space for indentation: it eliminates the pest by replacing them with the appropriate number of space characters for indentation. Line-trailing tabs are also eliminated.
Unfortunately, tab characters in heading and paragraph bodies, where HTML white space collapse applies when the HTML is rendered for display, are left in place rather than being collapsed into a single space character. I think that tab characters should be eliminated in this context too, because tabs cause problems.
I believe there are three contexts where tab characters might appear and there's a case for elimination in each:
| Context | Action |
|---|---|
| 1. Line-leading white space for indentation | Eliminate, replacing with the appropriate number of space characters. This is the current behaviour. |
| 2. Wherever HTML white space collapse applies | Eliminate, replacing with a single space character. This is the proposed enhancement. |
| 3. Wherever HTML white space collapse does not apply, e.g. code spans and fenced code blocks | Either (a) preserve, allowing the HTML renderer to determine how to display tab characters in <code> or <pre> blocks, or (b) expand to the appropriate number of space characters. |
I'd propose that always eliminating tab characters and replacing them with the appropriate number of space characters is the way to "apply consistent white space across the board" and that the current mixed treatment of tab characters is inconsistent with mdformat's style goals. Mixed tabs and spaces are seldom good.
There might be an open question with regard to (3), above, because CSS (for example, the tab-size property) can change the width of tab characters rendered in <code>, <pre>, or other HTML blocks.
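To illustrate the difference between option (b) for context 3 and the single-space replacement proposed for context 2, here is a minimal sketch using only Python's built-in string methods. The helper names and the tab width of 4 are assumptions for illustration; this issue does not fix a tab width, and this is not mdformat code.

```python
def expand_to_tab_stops(line: str, tab_width: int = 4) -> str:
    """Option (b): pad each tab out to the next tab stop (width is assumed)."""
    return line.expandtabs(tab_width)


def replace_with_single_space(line: str) -> str:
    """Context 2 behaviour: each tab becomes exactly one space."""
    return line.replace("\t", " ")


sample = "ab\tc"
print(repr(expand_to_tab_stops(sample)))        # 'ab  c'
print(repr(replace_with_single_space(sample)))  # 'ab c'
```

Tab-stop expansion depends on the column at which the tab occurs, which is why the choice of width matters for code blocks but not for collapsed paragraph text.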
Value / benefit
- Consistent treatment of white space introduced by tab characters.
- Markdown source will more closely resemble the rendered output.
- Avoids display problems in editors caused by differing tab width settings and different invisible white space.
- Produces output that will not violate Markdownlint rule MD010 (no-hard-tabs).
Implementation details
I think that modifying the TextWrapper instance attributes here
https://github.com/executablebooks/mdformat/blob/a856f538e2dcb81a83e9013fb073f16cd6e53972/src/mdformat/renderer/_context.py#L330-L336
to
expand_tabs=True,
tabsize=1
will achieve the desired conversion of each tab character to a single space character, but it won't help in collapsing runs of multiple tab and space characters into a single space. The replace_whitespace instance attribute would seem to affect all white space characters and not just tab characters.
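The behaviour described above can be checked with the standard library alone. The following is a rough sketch, not mdformat's actual code: the function names are hypothetical, the other TextWrapper settings are assumptions, and the regex-based collapse is just one possible way to handle mixed runs.

```python
import re
import textwrap


def expand_with_wrapper(text: str, width: int = 80) -> str:
    """Wrap text with expand_tabs=True and tabsize=1 (other settings assumed)."""
    wrapper = textwrap.TextWrapper(
        width=width,
        expand_tabs=True,
        tabsize=1,
        replace_whitespace=False,
        break_long_words=False,
        break_on_hyphens=False,
    )
    return wrapper.fill(text)


def collapse_tab_runs(text: str) -> str:
    """Collapse any run of spaces/tabs that contains a tab into one space.

    Runs made only of spaces are left untouched (a deliberate assumption).
    """
    return re.sub(
        r"[ \t]+",
        lambda m: " " if "\t" in m.group() else m.group(),
        text,
    )


print(repr(expand_with_wrapper("a\tb")))      # 'a b'
print(repr(expand_with_wrapper("a\t\tb")))    # 'a  b' (two spaces remain)
print(repr(collapse_tab_runs("a \t\t b")))    # 'a b'
```

With tabsize=1 every column is a tab stop, so each tab becomes exactly one space, and consecutive tabs therefore leave consecutive spaces behind. A separate collapsing step, such as the regex above, would be needed before wrapping to reduce those runs to a single space.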
Tasks to complete
No response
The CommonMark Spec 0.20: Preprocessing used to specify:
Tabs in lines are immediately expanded to spaces, with a tab stop of 4 characters:
but this was changed from version 0.21 onward. I'm not sure what the motivation was for the change, but there are two relevant issues on the CommonMark GitHub project: commonmark-spec#386 and commonmark-spec#318.
It's probably worth noting that any tab characters that ever find their way into my Markdown documents are introduced by copy-and-paste and aren't there intentionally.
Has any progress been made on this? I'm very interested in this feature and I'd be open to making a PR if there's interest.
@jgopel, I'm still interested in this feature. I don't think that it's been implemented independently of this Issue.
@hukkin Would you be interested in merging this if I were to make a PR for it?
Expanding tabs to spaces is a task that you can do very easily with a bunch of standard POSIX/UNIX utilities. According to the UNIX philosophy (or say: best practice), that is what you should be doing. I would say, this is even a prime example of where this should be applied.
@hoijui No, thank you. Please don't butt in and tell people what you think they should be doing; it's inappropriate.
where did I do that?