Trailing space (&/or terminating period) in a reference element?
Reference elements generated by u2o.py
Here's an example of a reference element containing a trailing space.
<note placement="foot"><reference type="annotateRef">8:27 </reference><catchWord>افود: </catchWord>عام طور پر عبرانی میں افود کا مطلب امامِ اعظم کا بالاپوش تھا (دیکھئے خروج <seg type="x-nested"><reference>28:4</reference></seg>)، لیکن یہاں اِس سے مراد بُت پرستی کی کوئی چیز ہے۔ </note>
Would it be advisable to move the trailing space to just after the closing </reference> tag?
i.e. so that this example becomes:
<note placement="foot"><reference type="annotateRef">8:27</reference> <catchWord>افود: </catchWord>عام طور پر عبرانی میں افود کا مطلب امامِ اعظم کا بالاپوش تھا (دیکھئے خروج <seg type="x-nested"><reference>28:4</reference></seg>)، لیکن یہاں اِس سے مراد بُت پرستی کی کوئی چیز ہے۔ </note>
Similarly, some Bible translators add a terminating period (full stop) after the last reference in a cross-reference marker. Shouldn't that terminating period likewise be moved to after the </reference>?
Does the original USFM contain the trailing space or period in the reference markup? If so, that's why it's in the OSIS.
First, we should recognise that Bible translators are rarely as rigorous about markup as we programmers would like them to be.
Let's address the terminating period first.
That's not part of a valid reference, though it is part of a cross-reference note.
So after converting \x + ...\x* to OSIS, each reference (if there's more than one) goes into a reference element, and the separating punctuation (together with any space) goes between the reference elements.
The terminating period is just like these separating punctuation marks, except that it:
- happens to be at the end;
- is often a different character from the separators (not so in Polish).
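To illustrate the layout described above, here is a minimal Python sketch. It is not u2o.py's actual code: `wrap_references` is a hypothetical helper, it assumes references are separated by semicolons only, and it omits the osisRef attribute that a real converter would add.

```python
import re
from xml.sax.saxutils import escape


def wrap_references(xt_text):
    """Wrap each reference in <reference>, leaving punctuation and spaces outside.

    Hypothetical helper for illustration; assumes ';' is the only separator.
    """
    # The capture group makes re.split keep the separators in the result,
    # so odd-numbered items are separators and even-numbered items are references.
    parts = re.split(r"(;\s*)", xt_text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(part)  # separating punctuation stays between elements
            continue
        # Peel any trailing spaces and/or terminating period off the reference.
        m = re.match(r"^(.*?)([\s.]*)$", part, re.S)
        ref, tail = m.group(1), m.group(2)
        if ref:
            out.append("<reference>" + escape(ref) + "</reference>")
        out.append(tail)  # the terminating period lands after </reference>
    return "".join(out)


print(wrap_references("Ex 28:4; Lev 8:7."))
# <reference>Ex 28:4</reference>; <reference>Lev 8:7</reference>.
```

The terminating period is handled exactly like a separator: it simply ends up after the last closing tag.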
If the translator included a space just before \+xt*, it is usually because they did not realise that the extra space isn't required. There's usually a space just after \+xt* anyway.
IMHO, that trailing space (superfluous as it might be) is better placed after </reference>, so that the text wrapped by the reference element is a pure reference rather than a reference plus a space.
Likewise, the translator may have included a superfluous space just before \fk, as happened in my example. The space isn't part of the real argument of \fr, which should be a pure reference, so it would be better treated as something that goes between the annotateRef reference and the catchWord.
If you like, this is a reasonable adjustment that makes good sense now that the text is in XML.
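One way to picture this adjustment is as a post-processing pass over the generated OSIS text. The sketch below is only an assumption about how it could be done, not something u2o.py provides: `move_trailing_out` is a hypothetical function, and it uses a regular expression on the serialized XML rather than a proper XML parser.

```python
import re

# Matches a <reference ...> element whose text ends in whitespace and/or periods.
# Groups: 1 = opening tag, 2 = pure reference text, 3 = trailing run, 4 = closing tag.
_TRAILING = re.compile(r"(<reference\b[^>]*>)([^<]*?)([\s.]+)(</reference>)")


def move_trailing_out(osis):
    """Move trailing spaces/periods from inside <reference> to just after it."""
    return _TRAILING.sub(r"\1\2\4\3", osis)


before = '<reference type="annotateRef">8:27 </reference><catchWord>x</catchWord>'
print(move_trailing_out(before))
# <reference type="annotateRef">8:27</reference> <catchWord>x</catchWord>
```

References that are already clean, such as <reference>28:4</reference>, do not match the pattern and pass through unchanged.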
These issues should really be fixed in the USFM source. u2o is a converter, not a corrector. It was never designed or intended to fix problems present in the USFM markup, only to convert USFM markup to OSIS.