Handling tags with empty values

Open holub008 opened this issue 7 months ago • 2 comments

Back with another spec corner case. The truncated example below comes from our friends at Embase:

import rispy

test_ris_str = """TY  - JOUR
ID  - 2006713348
T1  - Outcome Measures After Shoulder Stabilization in the Athletic Population: A Systematic Review of Clinical and Patient-Reported Metrics
A1  - Fanning E.
Y1  - 2020//
N2  - Background: Athletic endeavor can require the "athletic shoulder" to tolerate significant load through supraphysiological range and often under considerable repetition. 
Outcome measures are valuable when determining an athlete's safe return to sport...
KW  - *athlete
KW  - biomechanics
KW  - bone remodeling
JF  - Orthopaedic Journal of Sports Medicine
JA  - Orthop. J. Sports Med.
VL  - 8
IS  - 9
SP  -
PB  - SAGE Publications Ltd (E-mail: [email protected])
SN  - 2325-9671 (electronic)
DO  - http://dx.doi.org/10.1177/2325967120950040
ER  -"""

out = rispy.loads(test_ris_str)
out[0]['number']  # '9 SP  -'

As you can see, the empty SP tag is parsed as a wrapped continuation of the IS value, which is not what the RIS writer intended.

Any thoughts on recognizing (and most probably discarding) empty tags like SP here?

It's difficult because detecting and keeping line wraps is extremely useful (see the wrapped N2 abstract in this same record), and it's possible, though relatively unlikely, that a legitimate wrapped line could conflict with the RIS tag format.
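
One possible workaround, sketched here as a pre-processing step rather than a change to rispy itself: strip lines that consist only of a tag and a hyphen before parsing. The helper name drop_empty_tags and the EMPTY_TAG pattern are hypothetical, and the pattern assumes tags are an uppercase letter followed by an uppercase letter or digit, then two spaces and a hyphen. ER is kept because it legitimately has no value.

```python
import re

# Assumption: a valueless tag line looks like "SP  -" or "SP  - " (tag, two spaces, hyphen).
EMPTY_TAG = re.compile(r"^([A-Z][A-Z0-9])  -[ \t]*$")


def drop_empty_tags(ris_text: str, keep=("ER",)) -> str:
    """Remove tag lines that carry no value, except the tags listed in `keep`."""
    kept = []
    for line in ris_text.splitlines():
        match = EMPTY_TAG.match(line)
        if match and match.group(1) not in keep:
            continue  # empty tag such as "SP  -": discard it
        kept.append(line)  # normal tag lines and wrapped text pass through
    return "\n".join(kept)


sample = "TY  - JOUR\nIS  - 9\nSP  -\nPB  - SAGE Publications Ltd\nER  -"
cleaned = drop_empty_tags(sample)
print(cleaned)
```

The cleaned string could then be passed to rispy.loads as in the example above. The trade-off is the one already noted: a wrapped line whose entire content happens to look like an empty tag would be dropped as well.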

holub008 avatar Jul 18 '24 22:07 holub008