Adjusting the OSIS to avoid orphaned verse tags in SWORD?

Open DavidHaslam opened this issue 6 years ago • 9 comments

Orphaned verse tags can occur in a number of contexts, some simple to describe and some rather more complex. This issue will initially focus on the simpler contexts in Bibles that are paragraphed.

To make things clearer to follow, the XML layout has been re-arranged in the snippets that are provided as examples, but this is only the same kind of cosmetic change that you can obtain using xmllint.

An orphaned verse tag is where the displayed verse tag is alone on one line and the verse text is displayed on the next line. It looks sloppy when the module is viewed in a SWORD front-end and especially so when there are lots of these in the same module.

Example:

					<verse sID="Matt.4.22" osisID="Matt.4.22" n="22" />And they immediately left the ship and their father, and followed him.
					<verse eID="Matt.4.22" />
				</p>
				<p>
					<verse sID="Matt.4.23" osisID="Matt.4.23" n="23" />And Jesus went about all Galilee, teaching in their synagogues, and preaching the gospel of the kingdom, and curing every disease, and every kind of sickness among the people.
					<verse eID="Matt.4.23" />

Observe that the verse eID milestone is before the paragraph break. This causes an orphaned verse tag for verse 23. Here's the fix:

					<verse sID="Matt.4.22" osisID="Matt.4.22" n="22" />And they immediately left the ship and their father, and followed him.
				</p>
				<p>
					<verse eID="Matt.4.22" />
					<verse sID="Matt.4.23" osisID="Matt.4.23" n="23" />And Jesus went about all Galilee, teaching in their synagogues, and preaching the gospel of the kingdom, and curing every disease, and every kind of sickness among the people.
					<verse eID="Matt.4.23" />

Here, the verse eID milestone has been moved into the next paragraph. This eliminates the display problem in SWORD front-ends.

Workaround:

For this simple scenario it's feasible to fix all the similar locations with a PCRE search and replace. Here's a tab-delimited, single-line replace list that does that.

(<verse eID="\S+"\s?/>\s*)(</p>\s*<p>\s*)(<verse sID)	$2$1$3
  • My workaround was implemented using TextPipe.
  • Non-greedy matching is set outside the list - in the filter UI.
  • The above search pattern is insensitive to XML layout, because any whitespace between the tags is matched by \s*.
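
For anyone without TextPipe, the same single-pass replace can be sketched in Python. This is only a minimal sketch and not part of u2o.py: the function name move_eid_into_next_paragraph is made up for illustration, and [^"]+ is swapped in for \S+ so that no separate non-greedy setting is needed.

```python
import re

# Same three groups as the search pattern above:
#   1. the verse eID milestone (plus any trailing whitespace)
#   2. the paragraph break </p> ... <p>
#   3. the start of the next verse sID milestone
PATTERN = re.compile(r'(<verse eID="[^"]+"\s?/>\s*)(</p>\s*<p>\s*)(<verse sID)')


def move_eid_into_next_paragraph(osis: str) -> str:
    """Swap groups 1 and 2 so the eID milestone opens the next paragraph."""
    return PATTERN.sub(r'\2\1\3', osis)


if __name__ == "__main__":
    sample = (
        '<verse sID="Matt.4.22" osisID="Matt.4.22" n="22" />And they left.\n'
        '<verse eID="Matt.4.22" />\n'
        '</p>\n<p>\n'
        '<verse sID="Matt.4.23" osisID="Matt.4.23" n="23" />And Jesus went.\n'
        '<verse eID="Matt.4.23" />'
    )
    print(move_eid_into_next_paragraph(sample))
```

Running the function a second time on its own output changes nothing, since the eID milestone is no longer followed by a paragraph break.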

Harder cases:

The more complex contexts that give rise to orphans are harder to deal with, but the principle is the same. Here's a partial list of possible scenarios.

  • eID milestone before section title, etc.
  • eID milestone before other paragraph types.
  • eID milestone in poetry, lists, tables, etc.

Conclusion:

My view is that tackling the simplest and most common context first should

  • Provide some insight into the nature of the issue.
  • Separate the wheat from the chaff by fixing the major cause of the issue.
  • Let us research the more difficult cases once the simple ones are out of the way.

DavidHaslam avatar Feb 09 '19 12:02 DavidHaslam

Matters arising:

  • Should there be any cause for u2o.py to drop a verse eID milestone other than encountering either a verse sID milestone or a chapter eID milestone?
  • Is there any aspect of the OSIS transform that osis2mod does "under the hood" that may have a bearing on this issue?
  • We should be curious as to why SWORD behaves in this way.
  • Does the same display problem also occur with JSword based front-ends?
  • Is there anything misleading in either the OSIS Manual or in the SWORD developers' wiki that relates to the issue?

DavidHaslam avatar Feb 09 '19 12:02 DavidHaslam

This actually sounds more like a frontend issue to me rather than an OSIS formatting problem. One of the reasons I formatted the OSIS in this manner was to ensure that tags were properly nested so as to avoid display issues with orphaned verses.

Before I tried ensuring that there was proper nesting of tags, I had issues with how Bibles were displayed in some SWORD frontends. Adjusting the tags to try to ensure proper nesting eliminated the display issues for me in all of the frontends that I use.

adyeths avatar Feb 09 '19 13:02 adyeths

Is Xiphos one of the front-ends that you use for module testing?

  • I have certainly observed orphaned verse tags when a module is displayed using Xiphos for Windows.
  • I have often needed to apply such a workaround as described above in order to eliminate them.

How do you understand "nesting" in the context of the milestone versions of verse and chapter elements?

DavidHaslam avatar Feb 09 '19 14:02 DavidHaslam

I should add that my workaround does not cause osis2mod to report any NESTING errors.

Nor does it cause the OSIS to fail validation against the OSIS XSD schema.

DavidHaslam avatar Feb 09 '19 14:02 DavidHaslam

I test using Xiphos, BibleTime, and AndBible. They are the only frontends that I can use at this time. BibleTime never had problems, but I think it does its own formatting.

I understand nesting with milestone tags the same as with other tags. The only difference is that they can cross element boundaries, since they are milestones.

adyeths avatar Feb 09 '19 14:02 adyeths

FWIW, I've just developed a new experimental TextPipe filter that seeks to fix all possible contexts, i.e. a more comprehensive workaround.

This uses a pattern that does not make use of the paragraph p element.

Here's the pseudo-code for my new method:

  • Insert a tilde ~ just before each verse sID milestone element
  • Restrict to (text) between verse sID milestone and tilde - Send variable 1 to subfilter
    • Restrict to not including a chapter boundary
      • Move the verse eID milestone down as far as it can go
  • Remove all tildes

Aside: Using a tilde simply ensures that the processing does not miss 50% of the verses.
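
The pseudo-code above might be approximated in Python along these lines. It is a sketch under assumptions, not a port of the TextPipe filter: the function name move_eids_down is hypothetical, the sID attribute is assumed to come first in each verse start milestone (as in the snippets in this issue), and "as far as it can go" is read as "to just before the next verse sID milestone". A lookahead takes the place of the tilde: because the next sID milestone is never consumed by a match, no verse is skipped.

```python
import re

EID = r'<verse eID="[^"]+"\s?/>'

# Group 1: a verse eID milestone.
# Group 2: everything after it that is not a verse or chapter tag.
# Lookahead: the gap must end at a verse sID milestone, so a gap that
# runs into a chapter boundary never matches and is left untouched.
GAP = re.compile(
    rf'({EID})((?:(?!<verse\b|</?chapter\b).)*?)(?=<verse sID=)',
    re.S,
)


def _move(match: re.Match) -> str:
    eid, gap = match.group(1), match.group(2)
    if not gap.strip():
        return match.group(0)  # only whitespace in between: nothing to fix
    return gap + eid           # eID now sits directly before the next sID


def move_eids_down(osis: str) -> str:
    """Move each verse eID down to just before the next verse sID."""
    return GAP.sub(_move, osis)
```

Since only empty milestone elements are moved and they always land at a tag boundary, the output should stay well-formed, though it is still worth re-validating and re-running osis2mod afterwards.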

Observations:

  • It fixed all the previous locations for the same OSIS file
  • It fixed the 3 extra places I'd fixed by hand in Romans 1
  • It fixed 4 more places that I'd failed to spot earlier

The latter indicates that I could edit the SFM file to remove some spurious extra \p tags.

Joy, pure joy!

TextPipe is a superb tool for trying out new algorithms. I wouldn't be without it.

DavidHaslam avatar Feb 09 '19 14:02 DavidHaslam

My workaround is based on the notion that there should not be any content between a verse eID milestone and the next verse sID milestone (in the same chapter).

  • Pre-verse content is specially handled by osis2mod.
  • It becomes a div element with type="x-preverse".

This ensures that all module content can be referenced by SWORD.
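
To see where an OSIS file departs from that notion before changing anything, a small checker could list every verse whose eID milestone is followed by something other than whitespace before the next sID milestone. This is a hedged sketch only: gaps_with_content is a made-up name, it assumes sID is the first attribute of each verse start milestone, and it says nothing about how osis2mod itself treats such content.

```python
import re

GAP = re.compile(
    r'<verse eID="([^"]+)"\s?/>'           # verse end milestone; group 1 is its ID
    r'((?:(?!<verse\b|</?chapter\b).)*?)'  # group 2: text up to the next verse/chapter tag
    r'(?=<verse sID=)',                    # only gaps that end at a verse start count
    re.S,
)


def gaps_with_content(osis: str) -> list[tuple[str, str]]:
    """Return (verse ID, gap text) for each eID followed by non-whitespace content."""
    found = []
    for match in GAP.finditer(osis):
        gap = match.group(2).strip()
        if gap:
            found.append((match.group(1), gap))
    return found
```

An empty list means every eID milestone within a chapter already sits directly before the next sID milestone. The last verse of each chapter is never reported, because its gap ends at a chapter tag.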

Nesting

  • Milestone nesting can happen in a badly prepared OSIS file and is detected by osis2mod.
  • Milestone nesting is not detected during OSIS validation.

DavidHaslam avatar Feb 09 '19 14:02 DavidHaslam

FIO: Here is a clipboard copy of my new TextPipe filter.

Clipboard copy of Fix OSIS verse eID milestones.txt

  • This should be understandable by non-TextPipe users that are skilled programmers.
  • I can supply the .fll file upon request, should anyone be interested.

DavidHaslam avatar Feb 09 '19 15:02 DavidHaslam

Any further thoughts on this issue?

DavidHaslam avatar Apr 24 '20 10:04 DavidHaslam