metafacture-core
Incompatible StreamReceiver output by marc modules due to inconsistent leader handling
While the documentation of `encode-marc21` states that it is compatible with the output of `handle-marc-xml` and `decode-marc21`, this is not the case, because `decode-marc21`, `handle-marc-xml`, `encode-marc21` and `encode-marcxml` handle the leader inconsistently.
For example, we cannot transform marc21 -> marcxml or the other way around. Even marc21 -> marc21 is not easy (see here): it produces the same error as when processing marc-xml.
Functional review: @TobiasNx Code review: @blackwinter
Behaviour of Flux-Modules:
`decode-marc21` splits the leader into named subfields, one for each position according to its function (see here):
---
leader:
  status: "p"
  type: "a"
  bibliographicLevel: "m"
  typeOfControl: " "
  characterCodingScheme: "a"
  encodingLevel: " "
  catalogingForm: "c"
  multipartLevel: " "
"001": "946638705"
"003": "DE-101"
"005": "20070429135622.0"
"007": "tu"
"008": "960123s2004 gw |||||r|||| 00||||eng "
"015 ":
  a: "05,A03,2104"
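The mapping from leader positions to the named subfields shown above can be sketched in Python as follows. This is only an illustration and not Metafacture's code: the function name `split_leader` is hypothetical, and the positions are taken from the MARC 21 leader layout (05 to 09 and 17 to 19), which matches the subfield names in the output above.

```python
# Illustrative sketch only; split_leader is a hypothetical helper, not part of Metafacture.
# Positions are 0-based character offsets in the MARC 21 leader.
LEADER_POSITIONS = {
    "status": 5,
    "type": 6,
    "bibliographicLevel": 7,
    "typeOfControl": 8,
    "characterCodingScheme": 9,
    "encodingLevel": 17,
    "catalogingForm": 18,
    "multipartLevel": 19,
}


def split_leader(leader: str) -> dict:
    """Split a 24-character MARC 21 leader into named one-character subfields."""
    if len(leader) != 24:
        raise ValueError(f"leader must be 24 characters, got {len(leader)}")
    return {name: leader[pos] for name, pos in LEADER_POSITIONS.items()}


fields = split_leader("02602pam a2200529 c 4500")
print(fields["status"], fields["type"], fields["bibliographicLevel"])  # prints: p a m
```

Record length, base address and the fixed counts are dropped in this shape, so the whole leader cannot be recovered from the subfields alone.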
With the option `emitleaderaswhole="true"`, the leader appears both as a top-level entity and as a subfield of the same name (see here):
---
leader:
  leader: "02602pam a2200529 c 4500"
"001": "946638705"
"003": "DE-101"
"005": "20070429135622.0"
"007": "tu"
"008": "960123s2004 gw |||||r|||| 00||||eng "
"015 ":
  a: "05,A03,2104"
  z: "96,N47,0454"
  "2": "dnb"
"0167 ":
`handle-marc-xml` keeps the leader as a single field of its own (see here):
---
type: "Bibliographic"
leader: "00000naa a2200000uc 4500"
"001": "1106253078"
"003": "DE-101"
"005": "20171202230117.0"
"007": "cr||||||||||||"
"008": "160712s2016 gw |||||o|||| 00||||eng "
"0167 ":
  "2": "DE-101"
  a: "1106253078"
"022 ":
`encode-marcxml` can handle the result of `decode-marc21(emitleaderaswhole="true")`, but it cannot handle a leader that is emitted as multiple subfields: each subfield then becomes a leader element of its own.
The result then looks like this:
<marc:record>
<marc:leader>p</marc:leader>
<marc:leader>a</marc:leader>
<marc:leader>m</marc:leader>
<marc:leader> </marc:leader>
<marc:leader>a</marc:leader>
<marc:leader> </marc:leader>
<marc:leader>c</marc:leader>
<marc:leader> </marc:leader>
It seems that there is no check that a record contains only one leader.
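One way an encoder could avoid writing one leader element per subfield is to reassemble the subfields into a single 24-character string first. The Python sketch below is a hypothetical illustration (`join_leader` is not a Metafacture function). It assumes that record length and base address cannot be known from the subfields and fills them with zeros, as in the `00000naa a2200000uc 4500` leader shown for `handle-marc-xml`; the constants `22` and `4500` are the fixed values defined by MARC 21.

```python
# Illustrative sketch only; join_leader is a hypothetical helper, not part of Metafacture.
def join_leader(fields: dict) -> str:
    """Rebuild one 24-character leader from the named one-character subfields."""

    def char(name: str) -> str:
        # Missing subfields default to a blank; longer values are rejected.
        value = fields.get(name, " ")
        if len(value) != 1:
            raise ValueError(f"{name} must be a single character")
        return value

    return (
        "00000"                          # 00-04: record length (placeholder, assumed unknown)
        + char("status")                 # 05
        + char("type")                   # 06
        + char("bibliographicLevel")     # 07
        + char("typeOfControl")          # 08
        + char("characterCodingScheme")  # 09
        + "22"                           # 10-11: indicator count and subfield code count
        + "00000"                        # 12-16: base address (placeholder, assumed unknown)
        + char("encodingLevel")          # 17
        + char("catalogingForm")         # 18
        + char("multipartLevel")         # 19
        + "4500"                         # 20-23: entry map
    )


leader = join_leader({
    "status": "p", "type": "a", "bibliographicLevel": "m", "typeOfControl": " ",
    "characterCodingScheme": "a", "encodingLevel": " ", "catalogingForm": "c",
    "multipartLevel": " ",
})
print(f"<marc:leader>{leader}</marc:leader>")
```

With such a step, the eight subfields from the example would end up in one leader element instead of eight.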
`encode-marc21` cannot handle data from `handle-marc-xml` (see).
The error is:
org.metafacture.framework.FormatException: invalid tag format for reference field
at org.metafacture.biblio.iso2709.RecordBuilder.checkValidReferenceFieldTag (RecordBuilder.java:260)
org.metafacture.biblio.iso2709.RecordBuilder.appendReferenceField (RecordBuilder.java:244)
org.metafacture.biblio.iso2709.RecordBuilder.appendReferenceField (RecordBuilder.java:224)
org.metafacture.biblio.marc21.Marc21Encoder.processTopLevelLiteral (Marc21Encoder.java:254)
org.metafacture.biblio.marc21.Marc21Encoder.literal (Marc21Encoder.java:186)
org.metafacture.biblio.marc21.MarcXmlHandler.endElement (MarcXmlHandler.java:135)
Nor can it handle data from `decode-marc21(emitleaderaswhole="true")` (see).
The error is:
org.metafacture.framework.FormatException: literal must only contain a single character:leader
at org.metafacture.biblio.marc21.Marc21Encoder.processLiteralInLeader (Marc21Encoder.java:195)
org.metafacture.biblio.marc21.Marc21Encoder.literal (Marc21Encoder.java:183)
org.metafacture.biblio.marc21.Marc21Decoder.emitLeader (Marc21Decoder.java:254)
org.metafacture.biblio.marc21.Marc21Decoder.process (Marc21Decoder.java:221)
org.metafacture.biblio.marc21.Marc21Decoder.process (Marc21Decoder.java:136)
So, besides the inconsistencies, it is difficult to transform marc21 -> marcxml or the other way around. Even marc21 -> marc21 is not easy (see here): it produces the same error as when processing marc-xml.
I would suggest the following changes:
- change the duplication of `leader` in `decode-marc21(emitleaderaswhole="true")`, so that `leader` is not an `entity` with a `subfield` but there is only one element `leader`
- ~~add the option `emitleaderasentity="true"` to `handle-marc-xml` so that it outputs marc as `decode-marc21` does by default~~ @blackwinter suggested a better way: `emitleaderaswhole` (with default `true`). This is more consistent, and if `false` the leader would be output as by `decode-marc21`
- enable `encode-marcxml` and `encode-marc21` to handle the `leader` both as an entity with subfields and as a simple field
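The first suggestion, a single `leader` element instead of an entity wrapping a subfield of the same name, can be sketched as a small filter over stream events. This Python sketch is hypothetical: the tuple-based event model and the name `flatten_leader` are assumptions made for illustration and do not mirror Metafacture's StreamReceiver API.

```python
# Illustrative sketch only. Events are modelled as tuples (an assumption, not the
# Metafacture StreamReceiver API):
#   ("start_entity", name), ("literal", name, value), ("end_entity",)
def flatten_leader(events: list) -> list:
    """Replace an entity 'leader' that wraps a single literal 'leader' by that literal."""
    result = []
    i = 0
    while i < len(events):
        window = events[i:i + 3]
        if (
            len(window) == 3
            and window[0] == ("start_entity", "leader")
            and window[1][:2] == ("literal", "leader")
            and window[2] == ("end_entity",)
        ):
            result.append(window[1])  # keep only the literal, now at top level
            i += 3
        else:
            result.append(events[i])  # pass every other event through unchanged
            i += 1
    return result


events = [
    ("start_entity", "leader"),
    ("literal", "leader", "02602pam a2200529 c 4500"),
    ("end_entity",),
    ("literal", "001", "946638705"),
]
print(flatten_leader(events))
```

A leader entity with named subfields (the default `decode-marc21` shape) passes through unchanged, so this only addresses the duplicated `leader` case.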
Just a minor observation:
> add the option `emitleaderasentity="true"`

Wouldn't it make more sense to use the same option `emitleaderaswhole` (with default `true`)?
Or like that.
@dr0i it would be nice if the `handle-marc-xml` module supported the `emitleaderaswhole=` option soon.
It would help to make the almaFix, especially the handling of `leader` fields for the facets, more readable, and there would be less fuss with variables: https://github.com/hbz/lobid-resources/blob/4172bfef38c45e422cff14cfac56c6d81e7b8b67/src/main/resources/alma/alma.fix#L1-L11