metafacture-core

Incompatible StreamReceiver output by marc modules due to inconsistent leader handling

Open TobiasNx opened this issue 2 years ago • 7 comments

While the documentation of encode-marc21 states that it is compatible with the output of handle-marc-xml and decode-marc21, this is not the case, because decode-marc21, handle-marc-xml, encode-marc21 and encode-marcxml handle the leader inconsistently.

For example, we cannot transform marc21 -> marcxml or the other way around. Even marc21 -> marc21 is not straightforward (see here): it fails with the same error as when processing marc-xml.

Functional review: @TobiasNx Code review: @blackwinter


Behaviour of Flux-Modules:

decode-marc21 splits the leader into separate fields, each named after the function of its leader position (see here):

---
leader:
  status: "p"
  type: "a"
  bibliographicLevel: "m"
  typeOfControl: " "
  characterCodingScheme: "a"
  encodingLevel: " "
  catalogingForm: "c"
  multipartLevel: " "
"001": "946638705"
"003": "DE-101"
"005": "20070429135622.0"
"007": "tu"
"008": "960123s2004    gw |||||r|||| 00||||eng  "
"015  ":
  a: "05,A03,2104"
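
As a rough illustration of what this default output corresponds to, the following sketch splits a 24-character leader into the named positions shown above. It is not the Metafacture implementation: the function name `split_leader` is hypothetical, and the offsets are taken from the MARC 21 leader layout on the assumption that decode-marc21 uses the same positions.

```python
# Offsets of the single-character leader positions (MARC 21 leader layout).
# The names mirror the fields in the decode-marc21 output shown above.
LEADER_POSITIONS = {
    "status": 5,
    "type": 6,
    "bibliographicLevel": 7,
    "typeOfControl": 8,
    "characterCodingScheme": 9,
    "encodingLevel": 17,
    "catalogingForm": 18,
    "multipartLevel": 19,
}


def split_leader(leader):
    """Return the named single-character leader positions as a dict."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader must be exactly 24 characters long")
    return {name: leader[pos] for name, pos in LEADER_POSITIONS.items()}


if __name__ == "__main__":
    # Leader taken from the emitleaderaswhole="true" example in this issue.
    for name, value in split_leader("02602pam a2200529 c 4500").items():
        print(f'{name}: "{value}"')
```

Note that the record length, the base address and the entry map are not part of this split, so the whole leader cannot be recovered from these fields alone.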

With the option emitleaderaswhole="true", the leader appears twice: as a top-level entity and as a field of the same name nested inside it (see here):

---
leader:
  leader: "02602pam a2200529 c 4500"
"001": "946638705"
"003": "DE-101"
"005": "20070429135622.0"
"007": "tu"
"008": "960123s2004    gw |||||r|||| 00||||eng  "
"015  ":
  a: "05,A03,2104"
  z: "96,N47,0454"
  "2": "dnb"
"0167 ":

handle-marc-xml emits the leader as a single top-level field (see here):

---
type: "Bibliographic"
leader: "00000naa a2200000uc 4500"
"001": "1106253078"
"003": "DE-101"
"005": "20171202230117.0"
"007": "cr||||||||||||"
"008": "160712s2016    gw |||||o|||| 00||||eng  "
"0167 ":
  "2": "DE-101"
  a: "1106253078"
"022  ":

encode-marcxml can handle the output of decode-marc21(emitleaderaswhole="true"), but not the default output, in which the leader is emitted as multiple fields: each of those fields is written as its own leader element.

The result then looks like this:

	<marc:record>
		<marc:leader>p</marc:leader>
		<marc:leader>a</marc:leader>
		<marc:leader>m</marc:leader>
		<marc:leader> </marc:leader>
		<marc:leader>a</marc:leader>
		<marc:leader> </marc:leader>
		<marc:leader>c</marc:leader>
		<marc:leader> </marc:leader>

It seems that nothing checks that a record contains only one leader.


encode-marc21 cannot handle data from handle-marc-xml: see

The error is:

org.metafacture.framework.FormatException: invalid tag format for reference field
    at org.metafacture.biblio.iso2709.RecordBuilder.checkValidReferenceFieldTag (RecordBuilder.java:260)
        org.metafacture.biblio.iso2709.RecordBuilder.appendReferenceField (RecordBuilder.java:244)
        org.metafacture.biblio.iso2709.RecordBuilder.appendReferenceField (RecordBuilder.java:224)
        org.metafacture.biblio.marc21.Marc21Encoder.processTopLevelLiteral (Marc21Encoder.java:254)
        org.metafacture.biblio.marc21.Marc21Encoder.literal (Marc21Encoder.java:186)
        org.metafacture.biblio.marc21.MarcXmlHandler.endElement (MarcXmlHandler.java:135)

Nor can it handle data from decode-marc21(emitleaderaswhole="true"): see

The error is:

org.metafacture.framework.FormatException: literal must only contain a single character:leader
    at org.metafacture.biblio.marc21.Marc21Encoder.processLiteralInLeader (Marc21Encoder.java:195)
        org.metafacture.biblio.marc21.Marc21Encoder.literal (Marc21Encoder.java:183)
        org.metafacture.biblio.marc21.Marc21Decoder.emitLeader (Marc21Decoder.java:254)
        org.metafacture.biblio.marc21.Marc21Decoder.process (Marc21Decoder.java:221)
        org.metafacture.biblio.marc21.Marc21Decoder.process (Marc21Decoder.java:136)

So, besides the inconsistencies, it is difficult to transform marc21 -> marcxml or the other way around. Even marc21 -> marc21 is not straightforward (see here): it produces the same error as when processing marc-xml.

TobiasNx avatar Jun 10 '22 12:06 TobiasNx

I would suggest the following changes:

  • change decode-marc21(emitleaderaswhole="true") so that the leader is no longer duplicated: instead of an entity leader containing a subfield leader, there should be only a single leader element
  • ~~add the option emitleaderasentity="true" to handle-marc-xml so that it outputs MARC as decode-marc21 does by default~~ @blackwinter suggested a better way: add the option emitleaderaswhole (with default true) to handle-marc-xml. This is more consistent, and if set to false, the leader would be output as decode-marc21 does by default
  • enable encode-marcxml and encode-marc21 so that they can handle the leader both as an entity with subfields and as a simple field
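
To make the last point more concrete, here is a minimal sketch of how an encoder could accept all three leader shapes described in this issue and reduce them to one 24-character string. It is only an illustration under stated assumptions, not the Metafacture API: `normalize_leader`, `join_leader` and `LEADER_TEMPLATE` are hypothetical names, the offsets follow the MARC 21 leader layout, and the template fills the positions that the subfields do not carry (record length, base address, entry map) with placeholder values.

```python
# Offsets of the single-character leader positions (MARC 21 leader layout).
LEADER_POSITIONS = {
    "status": 5,
    "type": 6,
    "bibliographicLevel": 7,
    "typeOfControl": 8,
    "characterCodingScheme": 9,
    "encodingLevel": 17,
    "catalogingForm": 18,
    "multipartLevel": 19,
}

# Assumption: placeholder leader used when only the named positions are known.
LEADER_TEMPLATE = "00000     2200000   4500"


def join_leader(parts):
    """Overlay named single-character positions onto the template leader."""
    chars = list(LEADER_TEMPLATE)
    for name, value in parts.items():
        if name not in LEADER_POSITIONS:
            raise ValueError(f"unknown leader position: {name}")
        if len(value) != 1:
            raise ValueError(f"{name} must be a single character")
        chars[LEADER_POSITIONS[name]] = value
    return "".join(chars)


def normalize_leader(leader):
    """Accept a plain string, {"leader": "..."} or named positions."""
    if isinstance(leader, str):
        whole = leader                      # shape emitted by handle-marc-xml
    elif isinstance(leader, dict) and set(leader) == {"leader"}:
        whole = leader["leader"]            # emitleaderaswhole="true" shape
    elif isinstance(leader, dict):
        whole = join_leader(leader)         # default decode-marc21 shape
    else:
        raise TypeError("unsupported leader representation")
    if len(whole) != 24:
        raise ValueError("a MARC 21 leader must be exactly 24 characters long")
    return whole
```

With such a normalization step in front of the encoders, a record would always end up with exactly one leader, regardless of which module produced it.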

TobiasNx avatar Jun 10 '22 12:06 TobiasNx

Just a minor observation:

> add the option emitleaderasentity="true"

Wouldn't it make more sense to use the same option emitleaderaswhole (with default true)?

blackwinter avatar Jun 10 '22 13:06 blackwinter

> Just a minor observation:
>
> > add the option emitleaderasentity="true"
>
> Wouldn't it make more sense to use the same option emitleaderaswhole (with default true)?

Or like that.

TobiasNx avatar Jun 10 '22 13:06 TobiasNx

@dr0i It would be nice if the handle-marc-xml module supported the emitleaderaswhole option soon. It would help to make the Alma fix more readable, especially the handling of leader fields for the facets, and there would be less fuss with variables: https://github.com/hbz/lobid-resources/blob/4172bfef38c45e422cff14cfac56c6d81e7b8b67/src/main/resources/alma/alma.fix#L1-L11

TobiasNx avatar Jun 14 '22 14:06 TobiasNx