liblouis Further improvements to UEB

Further improvements to UEB:

- Following discussion at ICEB, back-translating braille dot 3 should give the ordinary apostrophe U+0027 rather than U+2019. This is consistent with other tables and also fixes over 26,000 failed backward tests.
- Following discussion at ICEB, the dot 5 used for "spaced digits" should only apply if a no-break space (U+00A0) is used in the input. This is consistent with other braille translators. It fixes an issue with British postcodes, which should be treated as two separate items. It also means that anyone who wants spaced digits has to join the numbers with no-break spaces deliberately. (Note that the rule for spaced digits could in future be strengthened to apply only if the "words" on either side of the no-break space contain only digits.)
- Added tests from issue #1018; fixes #1018.
- Redefined the digit rules to fix the output for unknown input characters; fixes #1029. Note that several other changes were needed to make this work fully.
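
The input-side condition for spaced digits described in the list above can be illustrated with a minimal Python sketch. This is not how the liblouis tables implement the rule; the function name `spaced_digit_groups` and the `digits_only` flag (standing in for the possible future strengthening) are hypothetical and exist only for illustration.

```python
NBSP = "\u00a0"  # no-break space

def spaced_digit_groups(text, digits_only=False):
    """Return the items in `text` whose parts are joined by no-break spaces.

    Hypothetical helper: it only models which parts of the input would be
    candidates for spaced-digit treatment, not the braille output itself.
    """
    groups = []
    # Split on the ordinary space only. str.split() with no argument would
    # also split on U+00A0, which would hide the distinction we care about.
    for item in text.split(" "):
        if NBSP not in item:
            continue  # ordinary spaces separate independent items
        parts = item.split(NBSP)
        # Possible stricter rule: every part must consist of digits only.
        if digits_only and not all(p.isdigit() for p in parts):
            continue
        groups.append(parts)
    return groups

# A British postcode typed with an ordinary space is two separate items.
print(spaced_digit_groups("SW1A 1AA"))            # []
# Digits joined deliberately with no-break spaces form one group.
print(spaced_digit_groups("1\u00a0000\u00a0000"))  # [['1', '000', '000']]
```

With `digits_only=True`, an input such as `"10\u00a0kg"` would no longer count as a spaced-digit group, which is the effect the strengthened rule would have.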

It would be worth checking that these changes have not affected any other tables which include the UEB tables as part of another language.

Note again that Git thinks I want to merge far more commits than I do. It seems I still don't have the right command to synchronise after a new release. Help appreciated there. I think there should be about 10 commits, the oldest from towards the end of Jan 2024.

Feb 08 '24 11:02 jrbowden

> Note again that Git thinks I want to merge far more commits than I do. It seems I still don't have the right command to synchronise after a new release. Help appreciated there. I think there should be about 10 commits, the oldest towards the end of Jan 2024.

I think the issue might be that you are contributing from your own 'master' branch. Instead of working in 'master', sync your repo with upstream, then create a 'new' branch of your fork for your new changes. Then, when ready, submit the pull request from your branch rather than your copy of 'master'. That way, the only 'new' commits it will see are those you added for this particular set of edits, not everything you've done on your fork from day 1.

Feb 09 '24 22:02 tibbsa

Hi @bertfrees, many thanks for looking at this. The two tests should both have the same text and the same braille:

text: Answer all questions. Braille: ~7,answ} all "qs4~'

The first test has 20 characters in bold, the second test has all 21 characters in bold.

Just looking at the lines you quote, I'm wondering why there are three ' at the end of the second test.

I would ideally prefer to use Unicode braille because it saves a lot of bother with the quoting.

I trust this helps.

Feb 28 '24 11:02 jrbowden

@jrbowden I think GitHub mangled your copy & paste and treated some of it as Markdown formatting.

For clarity, ignoring the xfail for a moment, I think what you intend as a test is:

- - Answer all questions.
  - '~7,answ} all "qs4~'''
  - typeform: {bold: '++++++++++++++++++++ '}

- - Answer all questions.
  - '~7,answ} all "qs4~'''
  - typeform: {bold: '+++++++++++++++++++++'}

However, why would the output be the same? If the bold ends before the final period, rather than after the final period, the bold-passage-end indicator should occur prior to the period. Certainly that is how Duxbury treats it.

Per Rule 9.7.2: "When it is clear in the print copy that punctuation is not included in a specific typeform and when a typeform terminator is required for other reasons, place the typeform terminator at the point where the typeform changes. When there is doubt, except for the hyphen, dash and ellipsis, consider the punctuation as being included in the typeform."

Now, I get that in the real world, even if in a word processor the user has stopped bolding just before a period, it is almost always not meaningful and it probably is not "clear in the print copy that punctuation is not included in a specific typeform". However, where Liblouis is dealing with an electronic copy and we only know for sure what the user has provided to us electronically, should we not honour that?

In my mind, we should actually be producing two different results for that input:

- - Answer all questions.
  - '~7,answ} all "qs~''4'
  - typeform: {bold: '++++++++++++++++++++ '}

- - Answer all questions.
  - '~7,answ} all "qs4~'''
  - typeform: {bold: '+++++++++++++++++++++'}
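
The difference between the two masks can be made concrete with a small Python sketch that places a start and end marker wherever the typeform changes, which is the placement Rule 9.7.2 describes when the punctuation is clearly outside the typeform. The function `mark_typeform` and the `[B]`/`[/B]` markers are hypothetical stand-ins for the braille indicators; this is not liblouis code.

```python
from itertools import groupby

def mark_typeform(text, mask, start="[B]", end="[/B]"):
    """Wrap each run of text whose mask character is '+' in start/end markers.

    Hypothetical illustration: `mask` has one character per character of
    `text`, as in the `typeform: {bold: ...}` entries of the YAML tests.
    """
    if len(text) != len(mask):
        raise ValueError("mask must have one character per character of text")
    out = []
    pos = 0
    # Walk the mask as alternating runs of emphasized / plain characters.
    for emphasized, run in groupby(mask, key=lambda c: c == "+"):
        n = len(list(run))
        chunk = text[pos:pos + n]
        out.append(start + chunk + end if emphasized else chunk)
        pos += n
    return "".join(out)

text = "Answer all questions."  # 21 characters
print(mark_typeform(text, "+" * 20 + " "))  # [B]Answer all questions[/B].
print(mark_typeform(text, "+" * 21))        # [B]Answer all questions.[/B]
```

With 20 bold characters the terminator lands before the period; with 21 it lands after, mirroring the two expected braille strings above.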

Incidentally, the double apostrophe is there because YAML would otherwise treat the first one as the end of the string. So we need two apostrophes to make one part of the string, and then a final apostrophe to close it.
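
The quoting rule can be checked with a short Python sketch of YAML's single-quoted style, in which a doubled apostrophe is the only escape. The helper names `yaml_single_quote` and `yaml_single_unquote` are hypothetical and written by hand here so the example needs no YAML library.

```python
def yaml_single_quote(value):
    """Render `value` as a YAML single-quoted scalar.

    In this style the only escape is '' for a literal apostrophe.
    """
    return "'" + value.replace("'", "''") + "'"

def yaml_single_unquote(scalar):
    """Recover the string from a well-formed YAML single-quoted scalar."""
    if len(scalar) < 2 or scalar[0] != "'" or scalar[-1] != "'":
        raise ValueError("not a single-quoted scalar")
    # Strip the outer quotes, then collapse each doubled apostrophe.
    return scalar[1:-1].replace("''", "'")

braille = "~7,answ} all \"qs4~'"   # ends in one apostrophe
quoted = yaml_single_quote(braille)
print(quoted)                       # '~7,answ} all "qs4~'''
print(yaml_single_unquote(quoted) == braille)  # True
```

The three trailing apostrophes in the test file are therefore one escaped apostrophe followed by the closing quote.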

Feb 28 '24 12:02 tibbsa

Thanks @tibbsa for clearing that up about the quoting.

What James intended as a test is indeed what is currently checked in. The first one, with the period not in bold, fails, and the other one passes. If we change the test to what Anthony suggests based on his interpretation of rule 9.7.2, both tests pass.

Feb 28 '24 12:02 bertfrees

Hi @tibbsa, @bertfrees, Sure. I did say that the first test (with the period not in bold) is debatable. I'm happy either way if you want to change the test so they both pass.

Feb 28 '24 16:02 jrbowden

I have pushed my local version of the branch (fa1ea7aa). Note that I have not overwritten your master branch, but rather added a merge commit, so as not to create any Git issues. You should be able to just pull.

Feb 28 '24 17:02 bertfrees
