Align MarcXML IDs with Muscat resources
Currently the IDs in the MarcXML export are plain database IDs. This makes them ambiguous: for instance, nothing in a MarcXML record indicates whether the record is a source or a person. Furthermore, we have different 'flavours' of MarcXML IDs depending on whether the record is retrieved from the SRU interface or from the BSB OPAC.
To disambiguate the MarcXML data, we should prefix the IDs in the MarcXML export with the resource type, e.g., sources/ for source records, people/ for person records, etc. The prefix needs to be added to tag 001 and to all subfields pointing to a Muscat authority (100$0, 240$0, 773$w, 852$x, etc.).
A source record would look like:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Exported from RISM Digital (https://rism.digital/) Date: 2023-12-06 11:58:40 UTC
-->
<collection xmlns="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
<record>
<leader>00000ncd a2200000 u 4500</leader>
<controlfield tag="001">sources/1001093788</controlfield>
<controlfield tag="003">DE-633</controlfield>
<controlfield tag="005">20221016182338.0</controlfield>
<datafield tag="100" ind1="1" ind2=" ">
<subfield code="a">Haydn, Joseph</subfield>
<subfield code="d">1732-1809</subfield>
<subfield code="j">Ascertained</subfield>
<subfield code="0">people/55803</subfield>
</datafield>
<datafield tag="240" ind1="1" ind2="0">
<subfield code="a">Symphonies</subfield>
<subfield code="m">vl (2), vla, vlc</subfield>
<subfield code="n">Hob I:95</subfield>
<subfield code="r">c</subfield>
<subfield code="0">standard_titles/3900027</subfield>
</datafield>
<datafield tag="245" ind1="1" ind2="0">
<subfield code="a">[parts, left before accolade:] QUARTETTO | III</subfield>
</datafield>
<datafield tag="650" ind1="0" ind2="7">
<subfield code="a">Quartets (inst.)</subfield>
<subfield code="0">standard_terms/25205</subfield>
</datafield>
<datafield tag="690" ind1=" " ind2="7">
<subfield code="a">Hob</subfield>
<subfield code="n">I:95</subfield>
<subfield code="0">publications/40</subfield>
</datafield>
<datafield tag="691" ind1=" " ind2="7">
<subfield code="a">KishimotoM 1989</subfield>
<subfield code="n">no. 370</subfield>
<subfield code="0">publications/877</subfield>
<subfield code="3">51056727</subfield>
</datafield>
<datafield tag="710" ind1="2" ind2=" ">
<subfield code="a">Bibliothek der Herzöge von Braunschweig-Oels</subfield>
<subfield code="c">Oels</subfield>
<subfield code="g">Ascertained</subfield>
<subfield code="0">institutions/30009586</subfield>
<subfield code="3">111056</subfield>
<subfield code="4">fmo</subfield>
</datafield>
<datafield tag="730" ind1="0" ind2=" ">
<subfield code="a">Londoner 5</subfield>
<subfield code="g">RISM</subfield>
<subfield code="0">standard_titles/3900762</subfield>
</datafield>
<datafield tag="773" ind1="1" ind2="8">
<subfield code="a">Haydn, Joseph - 3 Symphonies - Arr; vl (2), vla, vlc; Hob I:94 </subfield>
<subfield code="w">sources/990028299</subfield>
</datafield>
<datafield tag="852" ind1="1" ind2="0">
<subfield code="a">A-Wn</subfield>
<subfield code="c">[no indication]</subfield>
<subfield code="e">Österreichische Nationalbibliothek, Musiksammlung</subfield>
<subfield code="x">institutions/30000398</subfield>
<subfield code="3">111054</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">RISM</subfield>
<subfield code="b">full</subfield>
<subfield code="c">examined</subfield>
</datafield>
</record>
</collection>
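On the consumer side, a prefixed ID can be split back into its resource type and numeric part. The following is a minimal Python sketch, assuming the prefix format shown in the record above; the helper names split_prefixed_id and record_identity are made up for illustration and are not part of Muscat.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def split_prefixed_id(value):
    """Split 'sources/1001093788' into ('sources', '1001093788').

    A non-prefixed ID is returned as (None, value).
    """
    resource_type, sep, record_id = value.partition("/")
    if not sep:
        return None, value
    return resource_type, record_id


def record_identity(record):
    """Return (resource_type, id) read from controlfield 001 of a <record>."""
    field = record.find("marc:controlfield[@tag='001']", MARC_NS)
    return split_prefixed_id(field.text.strip())


# Shortened copy of the example record, for demonstration only.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">sources/1001093788</controlfield>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Haydn, Joseph</subfield>
      <subfield code="0">people/55803</subfield>
    </datafield>
  </record>
</collection>"""

record = ET.fromstring(SAMPLE).find("marc:record", MARC_NS)
print(record_identity(record))
```

Because split_prefixed_id also accepts non-prefixed IDs, the same code can read both flavours of the data.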
Implementation
The implementation only requires MarcNode::to_xml to be adjusted. The only difficulty is that, since we have no MarcConfig access there, we probably need some hard-coded adjustments.
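To illustrate what such hard-coded adjustments could look like, here is a rough sketch of a fixed (tag, subfield) to resource-type table applied to a record. It is written in Python for illustration only and is not the Muscat code; the table contains only the links visible in the example record above, and the names AUTHORITY_LINKS, add_prefix and prefix_record_ids are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# (tag, subfield code) -> resource type. Assumption: derived from the
# example record only; the real mapping lives in the MARC configuration.
AUTHORITY_LINKS = {
    ("100", "0"): "people",
    ("240", "0"): "standard_titles",
    ("650", "0"): "standard_terms",
    ("690", "0"): "publications",
    ("691", "0"): "publications",
    ("710", "0"): "institutions",
    ("730", "0"): "standard_titles",
    ("773", "w"): "sources",
    ("852", "x"): "institutions",
}


def add_prefix(value, resource_type):
    """Prefix a plain ID, leaving an already prefixed value untouched."""
    prefix = resource_type + "/"
    return value if value.startswith(prefix) else prefix + value


def prefix_record_ids(record, resource_type):
    """Prefix tag 001 and all known linking subfields of one <record>."""
    def q(name):
        return "{%s}%s" % (MARC_NS, name)

    for cf in record.iter(q("controlfield")):
        if cf.get("tag") == "001" and cf.text:
            cf.text = add_prefix(cf.text.strip(), resource_type)
    for df in record.iter(q("datafield")):
        for sf in df.iter(q("subfield")):
            target = AUTHORITY_LINKS.get((df.get("tag"), sf.get("code")))
            if target and sf.text:
                sf.text = add_prefix(sf.text.strip(), target)
    return record


# Record with plain database IDs, as in the current export.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">1001093788</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Haydn, Joseph</subfield>
    <subfield code="0">55803</subfield>
  </datafield>
  <datafield tag="773" ind1="1" ind2="8">
    <subfield code="w">990028299</subfield>
  </datafield>
</record>"""

record = prefix_record_ids(ET.fromstring(SAMPLE), "sources")
linked = [el.text for el in record.iter()
          if el.get("tag") == "001" or el.get("code") in ("0", "w")]
print(linked)
```

Subfields that are not in the table (such as 691$3 or 710$4 in the example) are left unchanged, which matches the sample record.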
Consequences
Users of the SRU interface as well as developers of the BSB OPAC need to be notified in advance. The MarcXML exposed in the OPAC should remain unchanged and all record IDs should match.
Update: this is now in place in Muscat and is being rolled out in three steps:
1) Preview: the prefixed IDs are available when requested explicitly.
2) Deprecation: the prefixed IDs are available by default, but the deprecated non-prefixed IDs are still available when requested explicitly.
3) Removal: the non-prefixed IDs are not available any more.
The latest release of Muscat enabled step 1). See for example the SRU interface response with a standard request, and the response for a request asking for the preview of the prefixed IDs. The second query has an additional deprecatedIds=false parameter, whose default value is currently true. The default will be flipped to false in the deprecation phase. This means that, from then on, data consumers will have to add the deprecatedIds=true parameter explicitly until they have adjusted their system.
This also means that data consumers currently have two options:
- Adjust their system now and add a deprecatedIds=false parameter to their query, and they will be ready for 2) and 3).
- Add a deprecatedIds=true parameter to their query, and they will be ready for 2), which gives them more time to adjust their system before 3).
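As a rough illustration of the two options, the sketch below builds an SRU request URL with an explicit deprecatedIds value. Only the deprecatedIds parameter comes from this issue; the base URL is a placeholder, the query string is made up, and operation, version and query are the generic SRU searchRetrieve parameters, so they should be checked against the actual Muscat SRU endpoint.

```python
from urllib.parse import urlencode

# Placeholder endpoint: replace with the real SRU URL of the Muscat instance.
SRU_BASE_URL = "https://sru.example.org/sru/sources"


def build_sru_url(query, deprecated_ids=None, base_url=SRU_BASE_URL):
    """Build a searchRetrieve URL, optionally pinning the ID flavour.

    deprecated_ids=None  -> rely on the server default
    deprecated_ids=False -> ask for prefixed IDs (first option)
    deprecated_ids=True  -> ask for non-prefixed IDs (second option)
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
    }
    if deprecated_ids is not None:
        params["deprecatedIds"] = "true" if deprecated_ids else "false"
    return base_url + "?" + urlencode(params)


# First option: already adjusted to prefixed IDs.
print(build_sru_url("haydn", deprecated_ids=False))
# Second option: keep the old IDs explicitly during the deprecation phase.
print(build_sru_url("haydn", deprecated_ids=True))
```

Setting the parameter explicitly in either case keeps a consumer's behaviour stable when the server default changes.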
The same is planned for the data export, namely that there will be a phase 2) in which both versions of the data are made available as an export. Pinging @BernLutz for information.
May I ask if this change will, or should, affect importing MarcXML records? Should those MarcXML records include the prefix in $0, if this $0 exists?
Since the mapping with the appropriate resource is defined in the marc configuration, the MarcXML import should work as before. It would be good to test it, though.