metafacture-core
Incompatible StreamReceiver output by marc modules due to inconsistent leader handling
While the documentation of encode-marc21 states that it is compatible with the output of handle-marc-xml and decode-marc21, this is not the case, because decode-marc21, handle-marc-xml, encode-marc21 and encode-marcxml handle the leader inconsistently.
For example, we cannot transform marc21 -> marcxml or the other way around. Even marc21 -> marc21 is not easy. See here. This creates the same error as when processing marc-xml.
Functional review: @TobiasNx Code review: @blackwinter
Behaviour of Flux-Modules:
decode-marc21
splits the leader into named sub-fields, one per leader position, according to the function of that position:
See here
---
leader:
  status: "p"
  type: "a"
  bibliographicLevel: "m"
  typeOfControl: " "
  characterCodingScheme: "a"
  encodingLevel: " "
  catalogingForm: "c"
  multipartLevel: " "
"001": "946638705"
"003": "DE-101"
"005": "20070429135622.0"
"007": "tu"
"008": "960123s2004 gw |||||r|||| 00||||eng "
"015 ":
  a: "05,A03,2104"
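To make the mapping between the two leader representations concrete, here is a minimal sketch in Python (not Metafacture code). It assumes the standard MARC 21 leader layout, in which the single-character codes sit at positions 05-09 and 17-19; the names `LEADER_CODES` and `split_leader` are made up for illustration, and the sub-field names are copied from the output above.

```python
# Leader positions that carry a single-character code in MARC 21.
# The keys mirror the sub-field names emitted by decode-marc21 above;
# the dictionary and function names are hypothetical.
LEADER_CODES = {
    "status": 5,
    "type": 6,
    "bibliographicLevel": 7,
    "typeOfControl": 8,
    "characterCodingScheme": 9,
    "encodingLevel": 17,
    "catalogingForm": 18,
    "multipartLevel": 19,
}


def split_leader(leader):
    """Split a 24-character leader string into named one-character codes."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader has 24 characters, got %d" % len(leader))
    return {name: leader[pos] for name, pos in LEADER_CODES.items()}


# The leader of the example record shown with emitleaderaswhole="true".
fields = split_leader("02602pam a2200529 c 4500")
for name, value in fields.items():
    print('%s: "%s"' % (name, value))
```

The remaining positions (record length, indicator and subfield code counts, base address, entry map) are not emitted as sub-fields, which is why the split form cannot be turned back into the original string without recomputing them.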
With the option emitleaderaswhole="true", the leader is emitted as a top-level entity that contains a single sub-field of the same name:
See here
---
leader:
  leader: "02602pam a2200529 c 4500"
"001": "946638705"
"003": "DE-101"
"005": "20070429135622.0"
"007": "tu"
"008": "960123s2004 gw |||||r|||| 00||||eng "
"015 ":
  a: "05,A03,2104"
  z: "96,N47,0454"
  "2": "dnb"
"0167 ":
handle-marc-xml keeps the leader as a single top-level field:
See here:
---
type: "Bibliographic"
leader: "00000naa a2200000uc 4500"
"001": "1106253078"
"003": "DE-101"
"005": "20171202230117.0"
"007": "cr||||||||||||"
"008": "160712s2016 gw |||||o|||| 00||||eng "
"0167 ":
  "2": "DE-101"
  a: "1106253078"
"022 ":
encode-marcxml can handle the result of decode-marc21(emitleaderaswhole="true"), but not the default output, in which the leader is emitted as multiple sub-fields: each sub-field then ends up as a separate leader element.
The result then looks like this:
<marc:record>
<marc:leader>p</marc:leader>
<marc:leader>a</marc:leader>
<marc:leader>m</marc:leader>
<marc:leader> </marc:leader>
<marc:leader>a</marc:leader>
<marc:leader> </marc:leader>
<marc:leader>c</marc:leader>
<marc:leader> </marc:leader>
It seems that nothing checks that only one leader is written.
encode-marc21 cannot handle data from handle-marc-xml: see
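What a single leader element would need to contain can be sketched as follows. This is an illustrative Python sketch, not the encode-marcxml implementation: it merges the one-character sub-fields into one 24-character string and leaves the computed positions (record length 00-04, base address 12-16) as zeros, as in the handle-marc-xml output above. The constants `22` (positions 10-11) and `4500` (positions 20-23) are the fixed MARC 21 values; `TEMPLATE`, `LEADER_CODES` and `join_leader` are hypothetical names.

```python
# Positions of the single-character codes, named as in the decode-marc21 output.
LEADER_CODES = {
    "status": 5,
    "type": 6,
    "bibliographicLevel": 7,
    "typeOfControl": 8,
    "characterCodingScheme": 9,
    "encodingLevel": 17,
    "catalogingForm": 18,
    "multipartLevel": 19,
}

# 00-04 length (unknown here), 05-09 codes, 10-11 "22", 12-16 base address
# (unknown here), 17-19 codes, 20-23 "4500".
TEMPLATE = "00000" + " " * 5 + "22" + "00000" + " " * 3 + "4500"


def join_leader(codes):
    """Merge named one-character codes into one 24-character leader string."""
    chars = list(TEMPLATE)
    for name, value in codes.items():
        if name not in LEADER_CODES:
            raise ValueError("unknown leader sub-field: %s" % name)
        if len(value) != 1:
            raise ValueError("%s must be a single character" % name)
        chars[LEADER_CODES[name]] = value
    return "".join(chars)


leader = join_leader({
    "status": "p",
    "type": "a",
    "bibliographicLevel": "m",
    "typeOfControl": " ",
    "characterCodingScheme": "a",
    "encodingLevel": " ",
    "catalogingForm": "c",
    "multipartLevel": " ",
})
print("<marc:leader>%s</marc:leader>" % leader)
```

With the sub-fields of the example record this yields one element instead of eight, at the cost of placeholder zeros where the original leader carried the record length and base address.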
Error is:
org.metafacture.framework.FormatException: invalid tag format for reference field
at org.metafacture.biblio.iso2709.RecordBuilder.checkValidReferenceFieldTag (RecordBuilder.java:260)
org.metafacture.biblio.iso2709.RecordBuilder.appendReferenceField (RecordBuilder.java:244)
org.metafacture.biblio.iso2709.RecordBuilder.appendReferenceField (RecordBuilder.java:224)
org.metafacture.biblio.marc21.Marc21Encoder.processTopLevelLiteral (Marc21Encoder.java:254)
org.metafacture.biblio.marc21.Marc21Encoder.literal (Marc21Encoder.java:186)
org.metafacture.biblio.marc21.MarcXmlHandler.endElement (MarcXmlHandler.java:135)
Nor can it handle data from decode-marc21(emitleaderaswhole="true"), see
The error is:
org.metafacture.framework.FormatException: literal must only contain a single character:leader
at org.metafacture.biblio.marc21.Marc21Encoder.processLiteralInLeader (Marc21Encoder.java:195)
org.metafacture.biblio.marc21.Marc21Encoder.literal (Marc21Encoder.java:183)
org.metafacture.biblio.marc21.Marc21Decoder.emitLeader (Marc21Decoder.java:254)
org.metafacture.biblio.marc21.Marc21Decoder.process (Marc21Decoder.java:221)
org.metafacture.biblio.marc21.Marc21Decoder.process (Marc21Decoder.java:136)
So besides the inconsistencies, it is difficult to transform marc21 -> marcxml or the other way around. Even marc21 -> marc21 is not easy. See here. This creates the same error as when processing marc-xml.
I would suggest the following changes:
- change the duplication of leader in decode-marc21(emitleaderaswhole="true"), so that leader is not an entity with a subfield but there is only one element leader
- ~~add the option emitleaderasentity="true" to handle-marc-xml so that it outputs marc as decode-marc21 does by default~~ @blackwinter suggested a better way: emitleaderaswhole (with default true). This is more consistent, and if false, the leader would be output as decode-marc21 does by default
- enable encode-marcxml and encode-marc21 so that they can handle the leader both as an entity with subfields and as a simple field
Just a minor observation:
add the option
emitleaderasentity="true"
Wouldn't it make more sense to use the same option emitleaderaswhole (with default true)?
Or like that.
@dr0i would be nice if the handle-marc-xml -module would support the emitleaderaswhole= option soon.
It would help to make the almaFix more readable, especially the handling of leader fields for the facets, and there would be less fuss with variables: https://github.com/hbz/lobid-resources/blob/4172bfef38c45e422cff14cfac56c6d81e7b8b67/src/main/resources/alma/alma.fix#L1-L11
I found this again. We cannot just transform marc21 -> marcxml or the other way around marcxml -> marc21 due to the inconsistent leader handling. We additionally need to transform the data with a fix. But even this does not work:
https://metafacture.org/playground/?flux=%22https%3A//raw.githubusercontent.com/metafacture/metafacture-core/master/metafacture-runner/src/main/dist/examples/read/marc21/10.marc21%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-marc21%28emitleaderaswhole%3D%22true%22%29%0A%7C+fix%28transformationFile%29%0A%7C+encode-marc21%0A%7C+print%0A%3B&transformation=move_field%28%22leader.leader%22%2C%22@leader%22%29%0Amove_field%28%22@leader%22%2C%22leader%22%29
Also, the documentation is wrong here: https://github.com/metafacture/metafacture-core/blob/2cec78959d2c84ba6e408402680413098d9010eb/metafacture-biblio/src/main/java/org/metafacture/biblio/marc21/Marc21Encoder.java#L56-L57
@dr0i as we discussed with I.W., the transformation marc21 -> marcxml is needed.
Found two workarounds for:
decode-marc21(emitLeaderAsWhole="true") -> encode-marc21: See here.
handle-marcXml -> encode-marc21: See here.
I'll try to condense the issues. I will give the scenarios references (a, b, c ...) so we can easily refer to them:
a) marc21 -> marc21 works (just do | decode-marc21(emitLeaderAsWhole="false"))
b) marc21-> marcxml works (just do | decode-marc21(emitleaderaswhole="true"))
c) handle-marcxml -> encode-marc21 doesn't work
For c) we have to think about a solution: the Marc21Encoder expects (in the method processLiteralInLeader) that a leader consists of single literals, each holding one byte (a leader entity with many values). I.e. a leader cannot be one String. See https://github.com/metafacture/metafacture-core/commit/6d04d6976c98eb7173c773b2f4ddca3b7e0037d3 for the commit that introduced this and for the motivation behind it (which I don't understand: we can see that removing the parsing/producing of the leader as one String causes problems).
We could solve c) by:
ca) "would be nice if the handle-marc-xml -module would support the emitleaderaswhole= option soon". We would allow emitleaderaswhole=false, which would emit the leader as single-byte literals, or
cb) encode-marc21 would be able (again) to cope with a single leader String.
I think cb) would be best, because as a side effect we wouldn't need to specify emitleaderaswhole=false in a), as it would also cope with emitleaderaswhole=true.
I think we are touching on the reasons for the change in leader handling here: #524. Changes to the records when transforming marc21 -> marc21 (XML and binary) also require changes to the leader, since parts of the leader are generated from the number of characters, indicators, elements and subfields. Otherwise the leader and the record are not valid.
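To illustrate why the leader cannot simply be copied once a record has changed, here is a minimal Python sketch of how the two computed parts follow from the record content under the ISO 2709 layout (24-byte leader, 12-byte directory entries, one terminator byte after the directory, after each field and after the record). It is not the Metafacture RecordBuilder; `leader_lengths` and `patch_leader` are hypothetical names, and each field is assumed to be passed as its raw bytes without the trailing terminator.

```python
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12
TERMINATOR_LENGTH = 1  # field terminator (0x1E) and record terminator (0x1D)


def leader_lengths(fields):
    """Return (record_length, base_address) for a list of raw field contents."""
    # The data starts after the leader, the directory and the directory's terminator.
    base_address = (LEADER_LENGTH
                    + DIRECTORY_ENTRY_LENGTH * len(fields)
                    + TERMINATOR_LENGTH)
    # Every field carries its own terminator; the record ends with one more byte.
    data_length = sum(len(field) + TERMINATOR_LENGTH for field in fields)
    return base_address + data_length + TERMINATOR_LENGTH, base_address


def patch_leader(leader, fields):
    """Rewrite positions 00-04 and 12-16 of a leader to match the given fields."""
    record_length, base_address = leader_lengths(fields)
    return "%05d%s%05d%s" % (record_length, leader[5:12], base_address, leader[17:])


# Two control fields taken from the example record above.
fields = [b"946638705", b"DE-101"]
patched = patch_leader("00000pam a2200000 c 4500", fields)
print(patched)
```

Adding, removing or editing any field changes both numbers, so an encoder has to regenerate these positions regardless of which leader representation it receives.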
Note: went with cb) as fix.
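The idea behind cb) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the actual change made to Marc21Encoder: a hypothetical function `leader_codes` accepts the literals found inside a leader entity and treats a literal named leader as the whole 24-character string, while any other known name must be a single character. The sub-field names and positions are those of the decode-marc21 output shown earlier in this issue.

```python
# Positions of the single-character codes, named as in the decode-marc21 output.
LEADER_CODES = {
    "status": 5,
    "type": 6,
    "bibliographicLevel": 7,
    "typeOfControl": 8,
    "characterCodingScheme": 9,
    "encodingLevel": 17,
    "catalogingForm": 18,
    "multipartLevel": 19,
}


def leader_codes(literals):
    """Collect leader codes from (name, value) pairs in either representation."""
    codes = {}
    for name, value in literals:
        if name == "leader":
            # Whole-leader form: pick the codes out of the 24-character string.
            if len(value) != 24:
                raise ValueError("leader must have 24 characters")
            for code_name, pos in LEADER_CODES.items():
                codes[code_name] = value[pos]
        elif name in LEADER_CODES:
            # Split form: one literal per position.
            if len(value) != 1:
                raise ValueError("%s must be a single character" % name)
            codes[name] = value
        else:
            raise ValueError("unknown leader literal: %s" % name)
    return codes


whole = leader_codes([("leader", "02602pam a2200529 c 4500")])
split = leader_codes([
    ("status", "p"), ("type", "a"), ("bibliographicLevel", "m"),
    ("typeOfControl", " "), ("characterCodingScheme", "a"),
    ("encodingLevel", " "), ("catalogingForm", "c"), ("multipartLevel", " "),
])
print(whole == split)
```

Because both forms end up in the same set of codes, scenario a) would no longer depend on the emitleaderaswhole setting, and the whole-string leader from handle-marc-xml in scenario c) would be accepted as well.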