metafacture-core

Incompatible StreamReceiver output by marc modules due to inconsistent leader handling

Open TobiasNx opened this issue 3 years ago • 7 comments

While the documentation of encode-marc21 states that it is compatible with the output of handle-marc-xml and decode-marc21, this is not accurate: decode-marc21, handle-marc-xml, encode-marc21 and encode-marcxml handle the leader inconsistently.

For example, we cannot transform marc21 -> marcxml or the other way around, and even marc21 -> marc21 is not straightforward. See here. This produces the same error as processing MARCXML does.

Functional review: @TobiasNx
Code review: @blackwinter


Behaviour of Flux-Modules:

By default, decode-marc21 splits the leader into separate literals, each named after the function of its position: See here

---
leader:
  status: "p"
  type: "a"
  bibliographicLevel: "m"
  typeOfControl: " "
  characterCodingScheme: "a"
  encodingLevel: " "
  catalogingForm: "c"
  multipartLevel: " "
"001": "946638705"
"003": "DE-101"
"005": "20070429135622.0"
"007": "tu"
"008": "960123s2004    gw |||||r|||| 00||||eng  "
"015  ":
  a: "05,A03,2104"
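For reference, each named literal above corresponds to a fixed leader position. A minimal Java sketch of that mapping, applied to the leader of the same record (001 946638705, shown whole in the next example): the names are copied from the output above, the 0-based positions are those of the MARC 21 leader, and the class and method are hypothetical, not metafacture's Marc21Decoder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: split a 24-character MARC 21 leader into the named
// values that decode-marc21 emits by default. Not metafacture's actual code.
class LeaderSplitSketch {

    // Names as they appear in the decode-marc21 output ...
    private static final String[] NAMES = {
        "status", "type", "bibliographicLevel", "typeOfControl",
        "characterCodingScheme", "encodingLevel", "catalogingForm", "multipartLevel"
    };
    // ... and the 0-based MARC 21 leader positions they are taken from.
    private static final int[] POSITIONS = {5, 6, 7, 8, 9, 17, 18, 19};

    static Map<String, String> split(String leader) {
        if (leader.length() != 24) {
            throw new IllegalArgumentException("leader must have 24 characters, got " + leader.length());
        }
        Map<String, String> named = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            named.put(NAMES[i], String.valueOf(leader.charAt(POSITIONS[i])));
        }
        return named;
    }

    public static void main(String[] args) {
        split("02602pam a2200529 c 4500")
            .forEach((name, value) -> System.out.println(name + ": \"" + value + "\""));
    }
}
```

The structural positions (record length, base address of data, indicator and subfield code counts, entry map) have no named literal in this output, which is why the split form cannot simply be turned back into a complete leader.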

With the option emitleaderaswhole="true", the leader is emitted both as a top-level entity and as a literal of the same name inside it (leader.leader): See here

---
leader:
  leader: "02602pam a2200529 c 4500"
"001": "946638705"
"003": "DE-101"
"005": "20070429135622.0"
"007": "tu"
"008": "960123s2004    gw |||||r|||| 00||||eng  "
"015  ":
  a: "05,A03,2104"
  z: "96,N47,0454"
  "2": "dnb"
"0167 ":

handle-marc-xml emits the leader as a single top-level literal: See here:

---
type: "Bibliographic"
leader: "00000naa a2200000uc 4500"
"001": "1106253078"
"003": "DE-101"
"005": "20171202230117.0"
"007": "cr||||||||||||"
"008": "160712s2016    gw |||||o|||| 00||||eng  "
"0167 ":
  "2": "DE-101"
  a: "1106253078"
"022  ":
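To make the three shapes directly comparable, here is a sketch that writes each of them as a flat list of StreamReceiver-style events for the example leader. The event strings and the class are hypothetical and only illustrate the outputs shown above; they are not produced by metafacture.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the three leader shapes described above, written as
// flat event lists in the style of a StreamReceiver (illustration only).
class LeaderShapesSketch {

    // decode-marc21 (default): entity "leader" with one literal per named position.
    static List<String> decodeMarc21Default() {
        String[][] named = {
            {"status", "p"}, {"type", "a"}, {"bibliographicLevel", "m"},
            {"typeOfControl", " "}, {"characterCodingScheme", "a"},
            {"encodingLevel", " "}, {"catalogingForm", "c"}, {"multipartLevel", " "}
        };
        List<String> events = new ArrayList<>();
        events.add("startEntity(leader)");
        for (String[] pair : named) {
            events.add("literal(" + pair[0] + ", \"" + pair[1] + "\")");
        }
        events.add("endEntity()");
        return events;
    }

    // decode-marc21(emitleaderaswhole="true"): entity "leader" containing literal "leader".
    static List<String> decodeMarc21Whole(String leader) {
        return List.of("startEntity(leader)", "literal(leader, \"" + leader + "\")", "endEntity()");
    }

    // handle-marc-xml: a single top-level literal "leader".
    static List<String> handleMarcXml(String leader) {
        return List.of("literal(leader, \"" + leader + "\")");
    }

    public static void main(String[] args) {
        String leader = "02602pam a2200529 c 4500";
        System.out.println("decode-marc21:          " + decodeMarc21Default());
        System.out.println("decode-marc21 (whole):  " + decodeMarc21Whole(leader));
        System.out.println("handle-marc-xml:        " + handleMarcXml(leader));
    }
}
```

An encoder that wants to accept all three producers has to recognise all three event patterns, which is the crux of this issue.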

encode-marcxml can handle the output of decode-marc21(emitleaderaswhole="true"), but not the default output, in which the leader is split into multiple literals: each of these literals becomes a separate leader element.

The result then looks like this:

	<marc:record>
		<marc:leader>p</marc:leader>
		<marc:leader>a</marc:leader>
		<marc:leader>m</marc:leader>
		<marc:leader> </marc:leader>
		<marc:leader>a</marc:leader>
		<marc:leader> </marc:leader>
		<marc:leader>c</marc:leader>
		<marc:leader> </marc:leader>

It seems there is no check that a record has only one leader.
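A check of that kind could look like the following sketch: the writer remembers whether the current record already has a leader and refuses a second one instead of emitting another `<marc:leader>` element. The class and its methods are hypothetical and model only the leader part of an XML writer; they are not metafacture's encode-marcxml implementation.

```java
// Hypothetical sketch of a MARCXML writer fragment that allows at most one
// <marc:leader> element per record. Not metafacture's actual encoder.
class SingleLeaderWriterSketch {

    private final StringBuilder out = new StringBuilder();
    private boolean leaderWritten;

    void startRecord() {
        leaderWritten = false; // every record may have exactly one leader
        out.append("<marc:record>\n");
    }

    void writeLeader(String leader) {
        if (leaderWritten) {
            throw new IllegalStateException("record already has a leader");
        }
        leaderWritten = true;
        // No XML escaping here: leader values are plain ASCII.
        out.append("\t<marc:leader>").append(leader).append("</marc:leader>\n");
    }

    void endRecord() {
        out.append("</marc:record>\n");
    }

    String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        SingleLeaderWriterSketch writer = new SingleLeaderWriterSketch();
        writer.startRecord();
        writer.writeLeader("02602pam a2200529 c 4500");
        try {
            writer.writeLeader("p"); // a second leader value, as in the output above
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        writer.endRecord();
        System.out.print(writer.result());
    }
}
```

Rejecting is only one option; the other is to buffer the named values into one leader before writing it.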


encode-marc21 cannot handle the output of handle-marc-xml: see

The error is:

org.metafacture.framework.FormatException: invalid tag format for reference field
    at org.metafacture.biblio.iso2709.RecordBuilder.checkValidReferenceFieldTag (RecordBuilder.java:260)
        org.metafacture.biblio.iso2709.RecordBuilder.appendReferenceField (RecordBuilder.java:244)
        org.metafacture.biblio.iso2709.RecordBuilder.appendReferenceField (RecordBuilder.java:224)
        org.metafacture.biblio.marc21.Marc21Encoder.processTopLevelLiteral (Marc21Encoder.java:254)
        org.metafacture.biblio.marc21.Marc21Encoder.literal (Marc21Encoder.java:186)
        org.metafacture.biblio.marc21.MarcXmlHandler.endElement (MarcXmlHandler.java:135)

Nor can it handle the output of decode-marc21(emitleaderaswhole="true"): see

The error is:

org.metafacture.framework.FormatException: literal must only contain a single character:leader
    at org.metafacture.biblio.marc21.Marc21Encoder.processLiteralInLeader (Marc21Encoder.java:195)
        org.metafacture.biblio.marc21.Marc21Encoder.literal (Marc21Encoder.java:183)
        org.metafacture.biblio.marc21.Marc21Decoder.emitLeader (Marc21Decoder.java:254)
        org.metafacture.biblio.marc21.Marc21Decoder.process (Marc21Decoder.java:221)
        org.metafacture.biblio.marc21.Marc21Decoder.process (Marc21Decoder.java:136)
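Both errors seem to follow from the same expectation, which the sketch below models under stated assumptions: inside the leader entity every literal must hold a single character (as the processLiteralInLeader message says), and outside it every top-level literal is treated as a field whose name is the tag. The three-character tag rule is an assumption based on MARC 21 tags, not a copy of RecordBuilder.checkValidReferenceFieldTag; the class is hypothetical.

```java
// Hypothetical sketch of the two checks behind the errors above. It models
// only the leader-related behaviour of an encoder; messages mirror the
// metafacture errors, but the logic is simplified and not copied from it.
class LeaderCheckSketch {

    private int depth;        // current entity nesting level
    private boolean inLeader; // true while inside a top-level "leader" entity

    void startEntity(String name) {
        if (depth == 0) {
            inLeader = "leader".equals(name);
        }
        depth++;
    }

    void endEntity() {
        depth--;
        if (depth == 0) {
            inLeader = false;
        }
    }

    void literal(String name, String value) {
        if (inLeader) {
            // Leader literals are expected to carry exactly one character each.
            if (value.length() != 1) {
                throw new IllegalArgumentException("literal must only contain a single character:" + name);
            }
        } else if (depth == 0 && name.length() != 3) {
            // Top-level literals are treated as control/reference fields whose
            // name is the tag (assumed rule here: exactly three characters).
            throw new IllegalArgumentException("invalid tag format for reference field");
        }
    }

    public static void main(String[] args) {
        String leader = "02602pam a2200529 c 4500";

        // handle-marc-xml: leader as a top-level literal.
        try {
            new LeaderCheckSketch().literal("leader", leader);
        } catch (IllegalArgumentException e) {
            System.out.println("handle-marc-xml shape: " + e.getMessage());
        }

        // decode-marc21(emitleaderaswhole="true"): whole leader inside the entity.
        LeaderCheckSketch whole = new LeaderCheckSketch();
        whole.startEntity("leader");
        try {
            whole.literal("leader", leader);
        } catch (IllegalArgumentException e) {
            System.out.println("emitleaderaswhole shape: " + e.getMessage());
        }
    }
}
```

Only the default decode-marc21 shape (single characters inside the leader entity) passes both checks.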

So, besides being inconsistent, the leader handling makes it difficult to transform marc21 -> marcxml or the other way around, and even marc21 -> marc21 (see the example above, which fails with the same error as processing MARCXML).

TobiasNx, Jun 10 '22

I would suggest the following changes:

  • change the duplicated leader in decode-marc21(emitleaderaswhole="true") so that the leader is not an entity containing a literal of the same name, but a single leader element
  • ~~add the option emitleaderasentity="true" to handle-marc-xml so that it outputs marc as the decode-marc21 does by default~~ @blackwinter suggested a better way: emitleaderaswhole (default true). This is more consistent, and if set to false, the leader would be output as decode-marc21 does by default
  • enable encode-marcxml and encode-marc21 to handle the leader both as an entity with sub-literals and as a simple field

TobiasNx, Jun 10 '22

Just a minor observation:

> add the option emitleaderasentity="true"

Wouldn't it make more sense to use the same option emitleaderaswhole (with default true)?

blackwinter, Jun 10 '22

> Just a minor observation:
>
> > add the option emitleaderasentity="true"
>
> Wouldn't it make more sense to use the same option emitleaderaswhole (with default true)?

Or like that.

TobiasNx, Jun 10 '22

@dr0i It would be nice if the handle-marc-xml module supported the emitleaderaswhole option soon. It would make the Alma fix, especially the handling of leader fields for the facets, more readable, and we would have less fuss with variables: https://github.com/hbz/lobid-resources/blob/4172bfef38c45e422cff14cfac56c6d81e7b8b67/src/main/resources/alma/alma.fix#L1-L11

TobiasNx, Jun 14 '22

I came across this again. We cannot simply transform marc21 -> marcxml or, the other way around, marcxml -> marc21 because of the inconsistent leader handling; we additionally need to transform the data with a Fix. But even this does not work:

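https://metafacture.org/playground/?flux=%22https%3A//raw.githubusercontent.com/metafacture/metafacture-core/master/metafacture-runner/src/main/dist/examples/read/marc21/10.marc21%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-marc21%28emitleaderaswhole%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-marc21%0A%7C+print%0A%3B&transformation=move_field%28%22leader.leader%22%2C%22@leader%22%29%0Amove_field%28%22@leader%22%2C%22leader%22%29

Decoded, the Flux in this link is:

    "https://raw.githubusercontent.com/metafacture/metafacture-core/master/metafacture-runner/src/main/dist/examples/read/marc21/10.marc21"
    | open-http
    | as-lines
    | decode-marc21(emitleaderaswhole="true")
    | fix(transformationFile)
    | encode-marc21
    | print
    ;

and the Fix is:

    move_field("leader.leader","@leader")
    move_field("@leader","leader")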

TobiasNx, Jun 14 '23

Also, the documentation in Marc21Encoder is incorrect: https://github.com/metafacture/metafacture-core/blob/2cec78959d2c84ba6e408402680413098d9010eb/metafacture-biblio/src/main/java/org/metafacture/biblio/marc21/Marc21Encoder.java#L56-L57

TobiasNx, Jun 14 '23

@dr0i As discussed with I.W., a marc21 -> marcxml transformation is needed.

TobiasNx, Apr 09 '24

I found workarounds for two cases:

  • decode-marc21(emitLeaderAsWhole="true") -> encode-marc21: See here.
  • handle-marc-xml -> encode-marc21: See here.

TobiasNx, Apr 15 '24

I'll try to condense the issues, labelling the scenarios a), b), c), ... so we can easily refer to them:

a) marc21 -> marc21 works (just use | decode-marc21(emitLeaderAsWhole="false"))
b) marc21 -> marcxml works (just use | decode-marc21(emitleaderaswhole="true"))
c) handle-marc-xml -> encode-marc21 doesn't work

For c) we need to think about a solution. The Marc21Encoder expects (in method processLiteralInLeader) that the leader consists of single-character literals, each standing for one byte (a leader entity with many values); i.e. the leader cannot be one String. See https://github.com/metafacture/metafacture-core/commit/6d04d6976c98eb7173c773b2f4ddca3b7e0037d3 for the commit that introduced this and its motivation (which I don't understand: we can see that problems come with removing the parsing/producing of the leader as one String).

We could solve c) by either:
ca) "would be nice if the handle-marc-xml -module would support the emitleaderaswhole= option soon": we would allow emitleaderaswhole=false, which would emit the leader as single bytes, or
cb) making encode-marc21 able (again) to cope with a single leader String.

I think cb) would be best because, as a side effect, we would no longer need to specify emitleaderaswhole=false in a), since encode-marc21 would then also cope with emitleaderaswhole=true.
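Option cb) could be sketched roughly like this, assuming the encoder fills a 24-character leader template: single-character literals are written to their named position (as today), while a 24-character value, whether it arrives as a top-level literal named leader or as leader.leader, is taken over as a whole. All names and the template defaults are hypothetical; only positions 10-11 ("22") and 20-23 ("4500") are the fixed MARC 21 values, and record length and base address would still be computed when the record is written.

```java
import java.util.Map;

// Hypothetical sketch of option cb): accept the leader either as named
// single-character literals or as one 24-character string.
class LenientLeaderSketch {

    // MARC 21 leader positions for the names used by decode-marc21.
    private static final Map<String, Integer> POSITIONS = Map.of(
        "status", 5, "type", 6, "bibliographicLevel", 7, "typeOfControl", 8,
        "characterCodingScheme", 9, "encodingLevel", 17, "catalogingForm", 18,
        "multipartLevel", 19);

    // Template: computed positions zeroed, fixed MARC 21 values filled in.
    private final char[] leader =
        ("00000" + "     " + "22" + "00000" + "   " + "4500").toCharArray();

    void leaderLiteral(String name, String value) {
        if (value.length() == 24) {
            // Whole leader, e.g. from handle-marc-xml or leader.leader.
            value.getChars(0, 24, leader, 0);
        } else if (value.length() == 1 && POSITIONS.containsKey(name)) {
            // Named single character, as emitted by decode-marc21 by default.
            leader[POSITIONS.get(name)] = value.charAt(0);
        } else {
            throw new IllegalArgumentException("unsupported leader literal: " + name);
        }
    }

    String leader() {
        return new String(leader);
    }

    public static void main(String[] args) {
        LenientLeaderSketch named = new LenientLeaderSketch();
        named.leaderLiteral("status", "p");
        named.leaderLiteral("type", "a");
        named.leaderLiteral("bibliographicLevel", "m");
        named.leaderLiteral("characterCodingScheme", "a");
        named.leaderLiteral("catalogingForm", "c");
        System.out.println(named.leader());

        LenientLeaderSketch whole = new LenientLeaderSketch();
        whole.leaderLiteral("leader", "02602pam a2200529 c 4500");
        System.out.println(whole.leader());
    }
}
```

A real encoder would additionally have to route a top-level literal named leader here instead of treating it as a field tag.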

dr0i, Apr 18 '24

I think #524 touches on the reasons for changing the leader handling. Changes to the records when transforming marc21 -> marc21 (XML and binary) also require changes to the leader, since parts of the leader are generated from the number of characters, indicators, elements and subfields. Otherwise the leader and the record are not valid.
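To illustrate which leader values depend on the record content: in ISO 2709, which MARC 21 follows, positions 00-04 hold the record length and positions 12-16 the base address of data, both counted in bytes. The sketch below computes them from a list of field contents, assuming 12-byte directory entries (3-byte tag, 4-byte length, 5-byte start position, as given by the MARC 21 entry map "4500"), a field terminator after the directory and after each field, and a record terminator at the end. It is an illustration, not metafacture's RecordBuilder.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: the two leader values of an ISO 2709 / MARC 21 record
// that depend on its content. Not metafacture's RecordBuilder.
class LeaderLengthsSketch {

    static final int LEADER_LENGTH = 24;
    static final int DIRECTORY_ENTRY_LENGTH = 12; // tag (3) + field length (4) + start position (5)

    // fieldData: the content of each field as stored in the record (for data
    // fields including indicators and subfield delimiters), without terminator.
    // Returns {recordLength, baseAddressOfData}, both counted in bytes.
    static int[] lengths(List<String> fieldData) {
        // Leader + directory + field terminator closing the directory.
        int baseAddress = LEADER_LENGTH + DIRECTORY_ENTRY_LENGTH * fieldData.size() + 1;
        int dataLength = 0;
        for (String data : fieldData) {
            dataLength += data.getBytes(StandardCharsets.UTF_8).length + 1; // + field terminator
        }
        int recordLength = baseAddress + dataLength + 1; // + record terminator
        return new int[] {recordLength, baseAddress};
    }

    // Writes the computed values into positions 00-04 and 12-16 of a leader.
    static String withLengths(String leader, List<String> fieldData) {
        int[] values = lengths(fieldData);
        return String.format("%05d", values[0]) + leader.substring(5, 12)
            + String.format("%05d", values[1]) + leader.substring(17);
    }

    public static void main(String[] args) {
        List<String> fields = List.of("946638705", "DE-101"); // 001 and 003 of the example record
        System.out.println(withLengths("00000pam a2200000 c 4500", fields));
    }
}
```

Adding, removing or editing any field changes both numbers, so a leader copied unchanged from the input no longer matches the transformed record.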

TobiasNx, Apr 18 '24

Note: went with cb) as the fix.

dr0i, Apr 22 '24