Align the way bibref author names are parsed by the NLM and TEI XML parsers
Currently the TeiToExtractedDocumentMetadataTransformer, working on top of the Grobid TEI XML output, parses the authors defined in the bibliographic reference section by traversing the XML author subelement:
<biblStruct xml:id="b66">
  <monogr>
    <title level="m" type="main">Biosynthesis of Carotenoids and Apocarotenoids by Microorganisms and Their Industrial Potential</title>
    <author>
      <persName>
        <forename type="first">C</forename>
        <surname>Zhang</surname>
      </persName>
    </author>
    <imprint>
      <date type="published" when="2018">2018</date>
      <publisher>IntechOpen</publisher>
      <pubPlace>London</pubPlace>
    </imprint>
  </monogr>
  <note type="raw_reference">Zhang, C. (2018). Biosynthesis of Carotenoids and Apocarotenoids by Microorganisms and Their Industrial Potential. London: IntechOpen.</note>
</biblStruct>
by processing forenames first and then surnames.
This results in an author string representation that differs from the one produced by the NlmToDocumentWithBasicMetadataConverter, which works on top of the Cermine NLM XML output. The discrepancy might affect the accuracy of citation matching, which so far has worked on NlmToDocumentWithBasicMetadataConverter output only.
E.g. the first author of the following reference:
[2] R. L. Campbell, R. Banner, J. Konick-McMahan, and M. D. Naylor, “Discharge planning and home follow-up of the elderly patient with heart failure,” The Nursing Clinics of North America, vol. 33, no. 3, pp. 497, 1998.
processed by the NLM parser looks like:
Campbell, R. L.
while the TEI parser produces:
R L Campbell
So the TEI parser requires the following changes:
- placing surnames before forenames
- adding "." for single-letter forenames
- separating surnames from forenames with ","
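The changes listed above could be sketched roughly as follows. This is only an illustration, not the actual transformer code: the class name BibRefAuthorNameFormatter and its format method are hypothetical, and it assumes the forenames have already been read from the TEI persName element in document order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of the existing code base.
final class BibRefAuthorNameFormatter {

    private BibRefAuthorNameFormatter() {}

    // Builds a "Surname, F. M." style string from a surname and its forenames.
    static String format(String surname, List<String> forenames) {
        StringJoiner joined = new StringJoiner(" ");
        for (String forename : forenames) {
            String trimmed = forename == null ? "" : forename.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty forename elements
            }
            // A single-letter forename is treated as an initial and gets a trailing "."
            joined.add(trimmed.length() == 1 ? trimmed + "." : trimmed);
        }
        String given = joined.toString();
        String family = surname == null ? "" : surname.trim();
        if (family.isEmpty()) {
            return given;
        }
        if (given.isEmpty()) {
            return family;
        }
        // Surname goes first, separated from the forenames with ","
        return family + ", " + given;
    }

    public static void main(String[] args) {
        // Prints "Campbell, R. L."
        System.out.println(format("Campbell", Arrays.asList("R", "L")));
    }
}
```

With this sketch the TEI example above (forename C, surname Zhang) would yield "Zhang, C.", which matches the form used in its raw_reference note. Multi-letter forenames are left unchanged; whether the NLM converter handles them the same way would need to be checked.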