
Non printing characters

Open jamin-aws-ospo opened this issue 1 year ago • 10 comments

Removing non-breaking spaces and line feeds from the text versions of licenses.

jamin-aws-ospo, Mar 27 '23
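For illustration, the substitution this PR proposes can be sketched in Python. This is a minimal sketch, not the PR's actual tooling: it assumes UTF-8 files, handles only U+00A0 (NO-BREAK SPACE) and leaves line endings untouched, and the helper names `replace_nbsp` and `normalize_file` are hypothetical.

```python
import sys
from pathlib import Path

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def replace_nbsp(text: str) -> str:
    """Return text with every U+00A0 replaced by an ASCII space (U+0020)."""
    return text.replace(NBSP, " ")


def normalize_file(path: Path) -> bool:
    """Rewrite a UTF-8 file in place; return True if its content changed.

    Bytes are read and written directly so existing line endings
    (LF or CRLF) are preserved exactly.
    """
    original = path.read_bytes().decode("utf-8")
    updated = replace_nbsp(original)
    if updated != original:
        path.write_bytes(updated.encode("utf-8"))
        return True
    return False


if __name__ == "__main__":
    # Usage: python normalize_nbsp.py FILE [FILE ...]  (script name hypothetical)
    for name in sys.argv[1:]:
        if normalize_file(Path(name)):
            print(f"normalized {name}")
```

Reading and writing bytes rather than using text mode avoids silently converting CRLF to LF, which would otherwise show up as unrelated churn in the diff.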

👎

First of all, this PR only changes the test texts, not the license texts.

But I'm not in favor of this blind change to all texts. When you want to match some license text that includes these non-breaking spaces (as it should, in some cases), you would need to handle them in the matching anyway.

zvr, Mar 27 '23
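zvr's point is that a matcher has to cope with non-breaking spaces regardless of what the SPDX copy contains. One common approach, sketched here under the assumption that all whitespace is treated as equivalent (the SPDX License Matching Guidelines describe a similar whitespace rule), is to collapse every run of Unicode whitespace before comparing. The function names are hypothetical.

```python
def collapse_whitespace(text: str) -> str:
    """Collapse every run of whitespace (including U+00A0) to one ASCII space.

    str.split() with no arguments splits on Unicode whitespace, so
    NO-BREAK SPACE, tabs and newlines are all treated alike.
    """
    return " ".join(text.split())


def texts_match(candidate: str, reference: str) -> bool:
    """Compare two license texts, ignoring differences in whitespace only."""
    return collapse_whitespace(candidate) == collapse_whitespace(reference)


# A copy that still contains a NO-BREAK SPACE matches a normalized reference.
print(texts_match("Permission\u00a0is hereby\ngranted", "Permission is hereby granted"))
```

With this kind of normalization on the matching side, whether the stored text uses U+00A0 or U+0020 no longer affects the result.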

I disagree about "When you want to match some license text that includes these non-breaking spaces". If a license text can conceivably be rendered in ASCII, it should never contain a non-printing character. We have been avoiding this problem until now because of accidental mis-implementations of regex in Ruby versus Python.

MarkAtwood, Mar 27 '23
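The regex discrepancy MarkAtwood mentions can be illustrated on the Python side. In Python 3, `\s` in a `str` pattern matches Unicode whitespace, including U+00A0, unless the `re.ASCII` flag is given. Ruby's `\s`, by contrast, is documented as matching only ASCII whitespace (`[ \t\r\n\f\v]`), so the same pattern can behave differently in the two languages. This sketch shows only the Python behavior and does not claim to reproduce the tooling used in this repository.

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# Default (Unicode) semantics for str patterns: \s matches NO-BREAK SPACE.
unicode_match = re.fullmatch(r"\s", NBSP)

# ASCII-only semantics, closer to Ruby's default \s: no match.
ascii_match = re.fullmatch(r"\s", NBSP, flags=re.ASCII)

print(unicode_match is not None)
print(ascii_match is not None)
```

A pattern such as `\s+` therefore absorbs non-breaking spaces in Python but not in Ruby, which is one way such differences can stay hidden until the same text is checked by both.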

> First of all, this PR only changes the test texts, not the license texts.

It was the test texts I was instructed to change. If you'd like other texts changed instead, I'm happy to do so.

jamin-aws-ospo, Mar 28 '23

> First of all, this PR only changes the test texts, not the license texts.

The LicenseListPublisher actually copies the test text to the data repository's text directory, so those would be the texts to change if you want to alter the text directory. Note that the template is generated from the License XML file, so this change won't impact those files.

goneall, Mar 28 '23

@MarkAtwood 5 of the 6 license texts that contain non-breaking spaces cannot be rendered in (7-bit) ASCII, as they are in French and have accented letters, guillemets, etc.

When one encounters these license texts elsewhere, in all likelihood they will still contain all these characters, including the non-breaking spaces. Having the SPDX version differ slightly (with only the non-breaking space characters replaced by ASCII spaces) is not very useful, in my view.

zvr, Mar 28 '23
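zvr's observation can be checked mechanically: replacing U+00A0 alone does not make a French text 7-bit ASCII if it also contains accented letters or guillemets. A minimal sketch follows; the helper name is hypothetical and the sample string is illustrative, not taken from any SPDX file.

```python
def non_ascii_after_nbsp_fix(text: str) -> set[str]:
    """Return the non-ASCII characters left once U+00A0 is replaced by a space."""
    fixed = text.replace("\u00a0", " ")
    return {ch for ch in fixed if ord(ch) > 0x7F}


# French typography places NO-BREAK SPACEs inside guillemets and before ":".
sample = "Le\u00a0«\u00a0Logiciel\u00a0» désigne\u00a0:"
leftover = non_ascii_after_nbsp_fix(sample)
print(sorted(leftover))
```

A non-empty result means the text still cannot be rendered in ASCII, so the non-breaking space substitution on its own buys little for such a file.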

@zvr there are more than 6 licenses that have non-breaking spaces.

I'm happy to revert the changes for the files that should have the non-breaking space, such as the French texts.

jamin-aws-ospo, Mar 28 '23

In license texts (the src directory) only GD has non-breaking spaces in English text. In test files (the test directory) there are more.

I have no issue changing these extra files to contain spaces.

zvr, Mar 28 '23
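To see where non-breaking spaces remain in each of the two directories zvr distinguishes, a small scan could list the affected files. This is a sketch under assumptions: the `src` and `test` directory names come from the comment above, files are assumed to be UTF-8 with `.txt` or `.xml` suffixes, and the function name is hypothetical; the repository's real layout may differ.

```python
from pathlib import Path

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def files_with_nbsp(root: Path, suffixes=(".txt", ".xml")) -> dict[Path, int]:
    """Map each matching file under root to its count of U+00A0 characters."""
    hits: dict[Path, int] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            count = path.read_text(encoding="utf-8").count(NBSP)
            if count:
                hits[path] = count
    return hits


if __name__ == "__main__":
    for directory in ("src", "test"):  # directory names assumed from the thread
        root = Path(directory)
        if not root.is_dir():
            continue
        for path, count in files_with_nbsp(root).items():
            print(f"{path}: {count} non-breaking space(s)")
```

Reporting counts per file makes it easy to separate the English texts (candidates for replacement) from the French ones that, per the discussion, should keep their non-breaking spaces.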

@jamin-aws-ospo please revert the changes to the FR language license texts, keep the changes in the EN texts. And then we can nudge this to merge.

MarkAtwood, Apr 18 '23

I believe all French texts have been reverted.

jamin-aws-ospo, Jun 01 '23

@zvr, what are your thoughts on this as updated?

jlovejoy, Sep 06 '23