
Check if input is USFM itself before attempting to parse

Open kavitharaju opened this issue 3 years ago • 2 comments

How would usfm_grammar 3.x behave if we gave it a random text file? Could we add a check, for example: if no \id is found in the first 3 content lines of the file, then bail?
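A rough sketch of such a pre-check, written as a standalone helper and not as part of the usfm_grammar API. The name `looks_like_usfm`, the `max_content_lines` parameter, and the choice to treat "content lines" as non-blank lines are all assumptions made for illustration.

```python
def looks_like_usfm(text: str, max_content_lines: int = 3) -> bool:
    # Hypothetical helper: True if an \id marker opens one of the first
    # few non-blank ("content") lines of the input.
    seen = 0
    for line in text.splitlines():
        # Drop a leading byte-order mark, then surrounding whitespace.
        stripped = line.lstrip("\ufeff").strip()
        if not stripped:
            continue  # blank lines are not counted as content
        # Require whitespace (or end of line) after \id so \ide does not match.
        if stripped == "\\id" or stripped.startswith(("\\id ", "\\id\t")):
            return True
        seen += 1
        if seen >= max_content_lines:
            break
    return False
```

A caller could run this before handing the text to the parser and bail out early when it returns False.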

kavitharaju avatar Oct 29 '22 09:10 kavitharaju

Here's what I use:

\id GEN ENG-US (p.sfm) - [GTP] Galilee Translation Project 2021[CC0] Hackett [7]
\id AAA BBB-CC (DDDD)  - [EEE] Fffffff Fffffffffff Fffffff 2021[CC0] Kkkkkkk [L]

Where the ID line is (theoretically) parsed into the following variables:

Var Example Definition [spec] (data form)
id (&) (all) Project ID-complete field
id0 (&) AAA Project ID-Book ID [USFM]
id1 (&) BBB Project ID-ISO639 Language [p.sfm]
id2 (&) CC Project ID-ISO3166 Country [p.sfm]
id3 (&) DDD Project ID-Tagging Language [p.sfm]
id4 (&) EEE Project ID-Acronym (3Letter) [p.sfm]
id5 (&) FFF Project ID Title [p.sfm]
id6 (&) GGG Project ID Text Freeze Date [p.sfm]
id7 (&) HHH Project ID Rights Code [p.sfm] (creative commons)
id8 (&) KKK Project ID Rights Owner [p.sfm] (of final work)
id9 (&) LLL Project ID Status Level [p.sfm] (1-7 p.sfm publishing status, not 1-3 USFM community acceptance.)
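As an illustration only, the field layout above could be parsed with a regular expression along these lines. The pattern is inferred from the two example \id lines, not from any published spec. The names `ID_LINE` and `parse_id_line` and the descriptive group names are hypothetical, and the text freeze date is assumed to be a four-digit year as in the examples.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern inferred from the example \id lines in this comment.
ID_LINE = re.compile(
    r"""^\\id\s+
    (?P<book>[A-Z0-9]{3})\s+                        # id0: book ID
    (?P<language>[A-Za-z]+)-(?P<country>[A-Za-z]+)  # id1, id2
    \s+\((?P<tagging>[^)]+)\)                       # id3: tagging language
    \s+-\s+
    \[(?P<acronym>[^\]]+)\]\s+                      # id4: project acronym
    (?P<title>.+?)\s+                               # id5: title
    (?P<freeze_date>\d{4})                          # id6: assumed 4-digit year
    \[(?P<rights_code>[^\]]+)\]\s+                  # id7: rights code
    (?P<rights_owner>.+?)\s+                        # id8: rights owner
    \[(?P<status>[^\]]+)\]\s*$                      # id9: status level
    """,
    re.VERBOSE,
)


def parse_id_line(line: str) -> Optional[Dict[str, str]]:
    # Return the named fields, or None if the line does not fit the layout.
    match = ID_LINE.match(line.strip())
    return match.groupdict() if match else None


fields = parse_id_line(
    "\\id GEN ENG-US (p.sfm) - [GTP] Galilee Translation Project 2021[CC0] Hackett [7]"
)
```

For the first example line, `fields["book"]` is `"GEN"` and `fields["title"]` is `"Galilee Translation Project"`. A plain USFM \id line that carries only a book code returns None.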

So, specifically to USFM conformance:

If exactly "(USFM)" is found before the first dash on the ID line, then the work should conform to the USFM standard listed with the \usfm tag, or USFM 2.5 if no \usfm tag is found.
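A minimal sketch of that rule, under one loudly stated assumption: "the first dash" is read as the first dash surrounded by whitespace (the ` - ` separator), since a language-country pair such as ENG-US already contains a dash earlier on the line. The function name `declared_usfm_version` is hypothetical.

```python
import re
from typing import Optional


def declared_usfm_version(text: str) -> Optional[str]:
    # Hypothetical helper: return the USFM version the file claims to
    # conform to, or None if the \id line does not carry "(USFM)".
    lines = text.splitlines()
    id_line = next((l for l in lines if l.lstrip().startswith("\\id ")), None)
    if id_line is None:
        return None
    # Assumption: the "first dash" is the first whitespace-delimited dash.
    # If no such dash exists, the whole \id line is searched.
    head = re.split(r"\s-\s", id_line, maxsplit=1)[0]
    if "(USFM)" not in head:  # exact, case-sensitive match
        return None
    for l in lines:
        m = re.match(r"\s*\\usfm\s+(\S+)", l)
        if m:
            return m.group(1)  # version given by the \usfm tag
    return "2.5"  # fallback when no \usfm tag is present
```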

This affects linking, tables and images.

  • Links \jmp are only intra-document.
  • Tables (\tr#) will have no preceding definition \rem or closing \b tags, and will fail somewhat gracefully as blue text paragraphs instead.
  • Images will have no print/display size information embedded into them.

cmahte avatar Oct 29 '22 11:10 cmahte

Thank you @cmahte for sharing this! I could not, however, find any official documentation to support this syntax specification. Is this something Paratext (or some such software) does? If so, could you point to the documentation?

joelthe1 avatar Nov 18 '22 21:11 joelthe1