WordPress-Coding-Standards icon indicating copy to clipboard operation
WordPress-Coding-Standards copied to clipboard

Add a check for the leading and trailing spaces

Open amieiro opened this issue 1 year ago • 5 comments

As I described in https://github.com/WordPress/WordPress-Coding-Standards/issues/2501, at translate.w.org, sometimes we get translations that start or end with one or more empty spaces, and we want to avoid this.

This PR adds a new check to detect if a translatable string has leading or trailing spaces.

Fixes https://github.com/WordPress/WordPress-Coding-Standards/issues/2501.

amieiro avatar Nov 04 '24 10:11 amieiro

@amieiro Just checking in as you never responded to the question I posed in my initial review.... Do you still intend to finish off this PR ?

jrfnl avatar Dec 24 '24 22:12 jrfnl

Sorry, I don't have the time to finish it at this time.

amieiro avatar May 27 '25 13:05 amieiro

@amieiro Seeing as you've been working on this again - could you please explain why you have set things up to have separate errors for spaces vs tabs vs vtabs vs newlines ?

My previous question was only about having tests in place with these and the sniff handling all types of whitespace, not about splitting that up into multiple errors.

I cannot think of any situation where one type of (leading/trailing) whitespace would be acceptable, but another would not be, so having different error codes doesn't make sense to me.

Is there a reason for this distinction based on your experiences in the Polyglots team ? I'd be interested to hear your reasoning for this.

jrfnl avatar Jun 15 '25 16:06 jrfnl

@jrfnl Sorry, this PR is not yet ready for review and I forgot to convert it to draft. As we return to contribute with WordPress.org, I resumed work on this PR.

The horizontal tabulations are not often seen at the beginning or end of sentences, but they are seen in the middle, so I wanted to prevent its use, since I don't think it is correct to use it. You can see a use case inside a string in the core here. The same applies to the new lines. You can see an example in the core here. And we have another thing to check: the different whitespace characters. The regular space characters is U+0020, but there are a lot of different whitespace characters: U+00A0, U+2000, U+2001, U+2002,...

The most common problem is the U+0020 space problem (in fact it is the only one I usually see), so the others are really edge situations. I can continue this PR in two different approaches:

  • Contemplate only the U+0020 case and add the other cases to the tests.
  • Add all the commented situations.

Which approach do you prefer?

amieiro avatar Jun 16 '25 08:06 amieiro

@jrfnl Sorry, this PR is not yet ready for review and I forgot to convert it to draft. As we return to contribute with WordPress.org, I resumed work on this PR.

Okay, thanks for letting me know. I've converted it to draft now and won't do an in-detail review until it has been marked as "Ready to review".

The horizontal tabulations are not often seen at the beginning or end of sentences, but they are seen in the middle, so I wanted to prevent its use, since I don't think it is correct to use it. You can see a use case inside a string in the core here. The same applies to the new lines. You can see an example in the core here.

I think that this PR should probably focus on the original problem it was trying to solve - leading and trailing whitespace.

Expanding the scope to mid-string whitespace characters feels like scope-creep and may have different implications (i.e., would involve a different decision point, potentially delaying the merging of the code related to the original problem).

I suggest opening a follow-up issue for that to discuss if this needs flagging and if so, what would need flagging. (I mean, looking at the examples, these seem like perfectly valid use-cases for mid string vertical whitespace, especially if you also consider RTL and BTT languages).

And we have another thing to check: the different whitespace characters. The regular space characters is U+0020, but there are a lot of different whitespace characters: U+00A0, U+2000, U+2001, U+2002,...

The most common problem is the U+0020 space problem (in fact it is the only one I usually see), so the others are really edge situations. I can continue this PR in two different approaches:

  • Contemplate only the U+0020 case and add the other cases to the tests.
  • Add all the commented situations.

Which approach do you prefer?

For leading and trailing whitespace, IMO the new errors should flag all whitespace characters, but without differentiating in the error message or code. What I mean by that is:

  • Change the error message(s) to use the term "whitespace" instead of explicitly saying what type of whitespace you're flagging.
  • Change the error code(s) to also use the term "whitespace" - this would mean you'd end up with two error codes: LeadingWhiteSpace and TrailingWhiteSpace.

Does that help ?

jrfnl avatar Jun 16 '25 14:06 jrfnl