
Always reformat tab characters as space characters

Open jamesquilty opened this issue 4 years ago • 10 comments

Description / Summary

The tab character (0x09) is a pest.

Currently, mdformat seeks to "apply consistent white space across the board" (Formatting Style: Whitespace) and does the right thing when tab characters appear as leading white space for indentation: it eliminates the pest by replacing them with the appropriate number of space characters for indentation. Line-trailing tabs are also eliminated.

Unfortunately, tab characters in heading and paragraph bodies, where HTML white space collapse applies when the rendered HTML is displayed, are not collapsed into a single space character. I think that tab characters should be eliminated in this context too, because tabs cause problems.

I believe there are three contexts where tab characters might appear and there's a case for elimination in each:

| Context | Action |
| --- | --- |
| 1. Line-leading white space for indentation | Eliminate, replacing with the appropriate number of space characters. This is the current behaviour. |
| 2. Wherever HTML white space collapse applies | Eliminate, replacing with a single space character. This is the proposed enhancement. |
| 3. Wherever HTML white space collapse will not apply, e.g. code spans, fenced code blocks | Either (a) preserve, allowing the HTML renderer to determine how to display tab characters in <code> or <pre> blocks, or (b) expand to the appropriate number of space characters. |
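As a rough illustration of contexts 1 and 2, here is a minimal Python sketch. It is not mdformat's implementation: `normalize_tabs` and its `tabsize` default of 4 are hypothetical, chosen only for the example, and it deliberately ignores context 3 (code spans and fenced code blocks) and line-trailing white space.

```python
import re


def normalize_tabs(line: str, tabsize: int = 4) -> str:
    """Hypothetical helper, not part of mdformat.

    Expands tabs in the leading indentation (context 1) and collapses any
    white space run that contains a tab into one space (context 2).
    """
    body = line.lstrip(" \t")
    indent = line[: len(line) - len(body)]
    # Context 1: leading white space is expanded to tab stops.
    indent = indent.expandtabs(tabsize)
    # Context 2: a run of spaces/tabs containing at least one tab
    # becomes a single space; runs of plain spaces are left alone.
    body = re.sub(r"[ \t]*\t[ \t]*", " ", body)
    return indent + body


print(repr(normalize_tabs("\tfoo\t \tbar")))
```

Runs made only of spaces are left untouched here, so the sketch changes nothing in a line that contains no tabs.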

I'd propose that always eliminating tab characters and replacing them with the appropriate number of space characters is the way to "apply consistent white space across the board" and that the current mixed treatment of tab characters is inconsistent with mdformat's style goals. Mixed tabs and spaces are seldom good.

There might be an open question with regard to (3), above, because CSS might change the width of tab characters rendered in <code> or <pre> or other HTML blocks?

Value / benefit

  • Consistent treatment of white space introduced by tab characters.
  • Markdown source will more closely resemble the rendered output.
  • Avoids display problems in editors caused by differing tab width settings and different invisible white space.
  • Produces output that does not violate Markdownlint rule MD010 (no hard tabs).

Implementation details

I think that modifying the TextWrapper instance attributes here

https://github.com/executablebooks/mdformat/blob/a856f538e2dcb81a83e9013fb073f16cd6e53972/src/mdformat/renderer/_context.py#L330-L336

to

        expand_tabs=True,
        tabsize=1

will achieve the desired white space collapse of a tab character to a space character, but it won't help in collapsing multiple tab-and-space character runs into a single space. The replace_whitespace instance attribute would seem to affect all white space characters and not just tab characters.
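This can be checked with the standard library alone. In the sketch below, `width=80` and `replace_whitespace=False` are assumed values picked for illustration, not a copy of mdformat's settings; only `expand_tabs=True` and `tabsize=1` come from the proposal above.

```python
import textwrap

# Illustrative wrapper: width and replace_whitespace are assumed values.
wrapper = textwrap.TextWrapper(
    width=80,
    expand_tabs=True,
    tabsize=1,
    replace_whitespace=False,
)

single = wrapper.fill("foo\tbar")    # a lone tab becomes one space
mixed = wrapper.fill("foo\t \tbar")  # tab, space, tab stays three characters wide

print(repr(single))
print(repr(mixed))
```

The second call shows the limitation: each tab is expanded on its own, so a mixed tab-and-space run is not collapsed to a single space.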

Tasks to complete

No response

jamesquilty, Sep 07 '21 00:09

The CommonMark Spec 0.20: Preprocessing used to specify:

Tabs in lines are immediately expanded to spaces, with a tab stop of 4 characters:

but this was changed from version 2.1 onward. I'm not sure what the motivation was for the change, but there are two relevant issues on the CommonMark GitHub Project: commonmark-spec#386 and commonmark-spec#318.
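For reference, the old preprocessing rule quoted above behaves like Python's `str.expandtabs` with a tab size of 4. A small sketch, with a made-up sample string:

```python
# Each tab advances to the next multiple of 4 columns, so the number of
# spaces it produces depends on where in the line it occurs.
line = "a\tbc\tdef\tg"
expanded = line.expandtabs(4)

print(repr(expanded))
```

The three tabs expand to three, two and one space respectively, which is why a fixed "one tab equals N spaces" substitution is not equivalent to tab-stop expansion.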

It's probably worth noting that any tab characters that ever find their way into my Markdown documents are introduced by copy-and-paste and aren't there intentionally.

jamesquilty, Sep 07 '21 01:09

Has any progress been made on this? I'm very interested in this feature and I'd be open to making a PR if there's interest.

jgopel, Feb 03 '23 19:02

@jgopel, I'm still interested in this feature. I don't think that it's been implemented independently of this Issue.

jamesquilty, Feb 10 '23 12:02

@hukkin Would you be interested in merging this if I were to make a PR for it?

jgopel, Feb 10 '23 15:02

Expanding tabs to spaces is a task that you can do very easily with a bunch of standard POSIX/UNIX utilities. According to the UNIX philosophy (or say: best practice), that is what you should be doing. I would say, this is even a prime example of where this should be applied.

hoijui, Apr 24 '23 07:04

@hoijui No, thank you. Please don't butt in and tell people what you think they should be doing; it's inappropriate.

jamesquilty, Apr 24 '23 08:04

where did I do that?

hoijui, Apr 24 '23 08:04