bibframe-ontology

Primary Publication Activity

Open kirkhess opened this issue 3 years ago • 8 comments

The current implementation of Provision Activities causes problems with MARC conversion to and from the 008 and the 26x. The current strategy is to create multiple provision activities: one with a country and a year from the 008, and others with an agent, place and date for each 26x, plus a separate property for a copyright date from a 26x (or the 008). When converting BF back to MARC, you have to guess which provision activity belongs in the 008 and which ones belong in the 26x fields.

Also, when creating BF in an editor, catalogers do not generally create both types of provision activities, and neither Sinopia nor LC offers catalogers a country code option in the profile, so no code is available for the 008/15-17. In records like this the country code defaults to "xx ". Example: BF created by a cataloger and converted to MARC (LCCN 2021035601)

<controlfield tag="008">210722s2022 xx j b 00| 0 eng </controlfield>

vs. the correct form of the 008 (LCCN 2021035601)

<controlfield tag="008">210722s2022 cau j b 001 0 eng </controlfield>
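
To make the byte positions concrete, here is a minimal Python sketch of slicing the date type, dates and country code out of an 008. The function name `parse_008_dates` and the two sample values are made up for illustration. The samples are only the first 18 bytes and assume four blanks in Date 2 (008/11-14), since the whitespace in the fields quoted above may not have survived copying.

```python
def parse_008_dates(field_008: str) -> dict:
    """Slice the date and place bytes out of a MARC 008 string.

    Positions follow the MARC 21 bibliographic 008 layout:
    06 = type of date (DtSt), 07-10 = Date 1, 11-14 = Date 2,
    15-17 = place of publication code.
    """
    if len(field_008) < 18:
        raise ValueError("008 is too short to hold positions 00-17")
    return {
        "dtst": field_008[6],
        "date1": field_008[7:11].strip(),
        "date2": field_008[11:15].strip(),
        "country": field_008[15:18].strip(),
    }


# Hypothetical 18-byte prefixes modelled on the two records above.
converted = parse_008_dates("210722s2022    xx ")
expected = parse_008_dates("210722s2022    cau")

print(converted["country"], expected["country"])
```

Everything except the country code is identical in the two results, which is why a profile without a country option can only ever produce the "xx " default.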

As with contributions, I'm proposing adding PrimaryPublication as a subclass of ProvisionActivity. The primary activity would have a structure like this:

bf:provisionActivity [
    a :PrimaryPublication ;
    kbv:year "008 date(s)" ;    # hopefully valid EDTF
    bf:agent [
        a bf:Agent ;
        rdfs:label "Agent Label"
      ] ;
    bf:place [
        a bf:Place ;
        rdfs:label "Place Label"
      ] ;
    kbv:country <https://id.loc.gov/vocabulary/countries/{code}> ;    # or:
    #   [ a madsrdf:Country ;
    #     bf:code "countrycode"
    #   ] ;
    bf:date "possible 26x date which isn't a year or doesn't match the 008 year" ;
    bf:note [ ...008/06 explanations... ]
    ] .

This is similar to how LIBRIS solved this issue - see kbv: -> https://id.kb.se/vocab/ where they created year and country. The 008 and the first 260/264 would merge into the primary ProvisionActivity which would flow out to both the 26x and 008.

Agents, places and countries could be resources with a URI or blank nodes.
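
The merge described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the LIBRIS or LC conversion: `PrimaryActivity`, `merge_primary` and `to_008_fragment` are hypothetical names, the 26x is reduced to a plain dict of subfields, and only the single-date case is written back to the 008.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrimaryActivity:
    """Hypothetical in-memory stand-in for the proposed primary activity."""
    year: str                      # from 008/07-10 (kbv:year in the proposal)
    country: str                   # from 008/15-17 (kbv:country)
    agent: Optional[str] = None    # from 26x $b
    place: Optional[str] = None    # from 26x $a
    date: Optional[str] = None     # from 26x $c, kept only if it adds information


def merge_primary(year: str, country: str, first_26x: dict) -> PrimaryActivity:
    """Fold the 008 values and the first 260/264 into one activity."""
    transcribed = first_26x.get("c")
    # Drop the transcribed date when it is just the 008 year again.
    # Assumption: trailing punctuation is the only difference worth ignoring.
    if transcribed is not None and transcribed.strip(" .,") == year:
        transcribed = None
    return PrimaryActivity(year, country, first_26x.get("b"),
                           first_26x.get("a"), transcribed)


def to_008_fragment(activity: PrimaryActivity, dtst: str = "s") -> str:
    """Rebuild 008/06-17 for the single-date case (Date 2 left blank)."""
    return dtst + activity.year.ljust(4) + " " * 4 + activity.country.ljust(3)


activity = merge_primary(
    "2022", "cau", {"a": "Place Label", "b": "Agent Label", "c": "2022."}
)
print(repr(to_008_fragment(activity)))
```

Because the year and country live on the same node as the agent and place, the converter no longer has to guess which provision activity feeds the 008.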

BF: https://libris.kb.se/l3wrqs7x01688vl/data.ttl

Manifestation example: https://libris.kb.se/bib/7421698?vw=full&tab3=marc

kirkhess • Jan 31 '22 21:01

It will be of little surprise that we’re struggling with this too and have basically hit the same issue.

There’s a case we have, and I’m sure OCLC will too, where the 008 dates are not ‘publication’ dates.

We treat ‘collection’ dates as bf:Production (008/06=i or k), and that may be a factor also, but the dates I am thinking of are the ones we encounter when 008/06=p: “Date of distribution/release/issue and production/recording session when different.” These are used a lot for moving images, for example. We generate two ProvisionActivity resources – Distribution and Production – from 008/06=p.

And then we come to the 260: 260$a and $b are ambiguous, so the resulting ProvisionActivity is a bf:Publication type. MARC records with 008/06=p and one 260 will produce 3 ProvisionActivity resources. (If $e, $f, and $g are used, it's bf:Manufacture, naturally.)

I see where you are going, but how would you all handle the above scenario? Based on your data, would it be safe (or is that safe-ish?) to assume the first 260 should instead be bf:Distribution when 008/06=p? In other words, the type of the first 260 (assuming $a and $b are used) is driven by the 008/06?

kefo • Feb 03 '22 21:02

An addendum to my previous comment:

I don’t know if it is clear how my bringing in those other aspects bears on your question so I thought I would try to tie the threads together if the above seems unconnected to your initial post.

You are proposing “PrimaryPublication” in order to identify which of potentially multiple ProvisionActivity resources should be used to derive, principally, the correct 008 information while reducing the number of ProvisionActivity resources. In most cases, there will only be one ProvisionActivity resource as a result.

My reason for bringing in those other 008 dates – and apologies if it was clear; I just feel it was less than – is to note that not all 008 dates relate to “Publication.” In one case the 008 dates will generate two ProvisionActivity resources, with the principal (as in first) being for Distribution. The inference is that “PrimaryPublication” is not suitable for Distribution dates (or Production dates), at which point we either have a badly typed ProvisionActivity resource or this suggestion is not accounting for all the possibilities. I don’t think I said that as plainly above. My question earlier is really asking what you all are thinking when the ProvisionActivity resource that is derivable from the 008 does not represent ‘Publication’ but a different activity. The response might simply be, “OK, how about bf:PrimaryProvisionActivity or just bf:Primary,” but you all might have contemplated other ways around this problem, hence the question.

Background detail that might bear on our perspective of this problem: we are hoping to return all ProvisionActivity resources to the 264 field, even if the information originally came from the 260. This means having an accurate ProvisionActivity type is important so we can set the proper indicator value in the 264.
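
As a small illustration of why the type matters on the way back to MARC, the sketch below maps a ProvisionActivity's rdf:type values to the 264 second indicator. The indicator values are the ones MARC 21 defines for the 264; the function name `ind2_for` and the decision to raise on an untyped activity are assumptions made for this example, not part of any existing converter.

```python
# 264 second indicator (function of entity) as defined in MARC 21.
# Value "4" (copyright notice date) is left out because the examples in
# this thread carry it in bf:copyrightDate rather than a ProvisionActivity.
IND2_BY_TYPE = {
    "bf:Production": "0",
    "bf:Publication": "1",
    "bf:Distribution": "2",
    "bf:Manufacture": "3",
}


def ind2_for(rdf_types) -> str:
    """Return the 264 ind2 for a ProvisionActivity's rdf:type values.

    Raises if no specific subclass is present, because guessing would
    put the wrong function on the field.
    """
    for rdf_type in rdf_types:
        if rdf_type in IND2_BY_TYPE:
            return IND2_BY_TYPE[rdf_type]
    raise ValueError(f"no 264 indicator for types {sorted(rdf_types)}")


print(ind2_for(["bf:Distribution", "bf:ProvisionActivity"]))
```

An activity typed only as a generic "primary" class would fall through to the error branch here, which is the badly-typed case described above.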

kefo • Feb 03 '22 23:02

Thanks, Kevin, for your ideas. With multiple 264s it might be better to talk about specific OCNs/LCCNs so we can riff through what happens. For the vast majority of resources this would work without a lot of effort, since they have a single 26x and it would merge with the 008 into a single activity.

The name of the class isn't that important and I assume it would work for every kind of ProvisionActivity. LIBRIS called it PrimaryPublication, prob. because for almost every resource the 008 is a publication activity? PrimaryProvisionActivity is in line with PrimaryContribution. However, it may only make sense for publications, as you noted with your scenario. Also, a cataloger could just transcribe the Agent/Place/Date and the system could generate the _:year and _:country.

I may be missing something in your scenario, but we would have a PrimaryPublication (PrimaryProvisionActivity?) with a specific _:year holding either a "k" EDTF date, which is two dates (date1/date2), or a single "p" date (date1); the rdf:type bf:Production and/or bf:Distribution; a country code; and the Agent, Place, dates, and notes merged in from the first 264, which I'm guessing is the bf:Production one.

I think there's enough information there to put it in the 008, but the converter would consider this an 'i'/'s' date, not 'k'/'p'. With 15 DtSt codes, maybe those need to creep in here as bf:code to be accurate. The current bibframe2marc converter only supports ~6 of the codes.
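
A rough Python sketch of the ambiguity: once Date 1 and Date 2 are folded into an EDTF-style value, different DtSt codes produce the same string, so the code itself has to be kept somewhere if the 008 is to round-trip. The function `edtf_for` and its rules are assumptions for illustration (interval for 'i', 'k' and 'm', MARC "u" mapped to EDTF "X", Date 1 alone for everything else); it is not how bibframe2marc or marc2bibframe2 work.

```python
RANGE_CODES = {"i", "k", "m"}  # inclusive dates, bulk dates, multiple dates


def edtf_for(dtst: str, date1: str, date2: str = "") -> str:
    """Turn 008 Date 1 / Date 2 into an EDTF-style string.

    Assumptions: MARC's "u" (unknown digit) maps to EDTF's "X", and
    range-like codes become a date1/date2 interval. Every other code
    falls back to Date 1 alone.
    """
    first = date1.strip().replace("u", "X")
    second = date2.strip().replace("u", "X")
    if dtst in RANGE_CODES and first and second:
        return f"{first}/{second}"
    return first


# 'i' and 'k' collapse to one value, as do 's' and the first date of 'p'.
print(edtf_for("i", "1950", "1960"), edtf_for("k", "1950", "1960"))
print(edtf_for("s", "2007"), edtf_for("p", "2007", "1950"))
```

Going the other way, a converter that sees only "1950/1960" or "2007" cannot tell which code to write into 008/06 without the extra property.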

Let me know if you have more questions & let me know if you want me to sketch out some of those examples.

kirkhess • Feb 04 '22 12:02

Here's one: http://www.worldcat.org/oclc/957611959

008 160825p20071950nyu002        o   vleng d
264  1 $a [Place of publication not identified] : $b WPA Film Library, $c [1950]
264  4 $c ©1950
264 32 $a New York, N.Y. : $b Distributed by Films Media Group, $c 2007.

Turns into:

        bf:copyrightDate            "©1950" ;
        bf:provisionActivity        [ rdf:type   bf:Distribution , bf:ProvisionActivity ;
                                      bf:agent   [ rdf:type    bf:Agent ;
                                                   rdfs:label  "Distributed by Films Media Group"
                                                 ] ;
                                      bf:date    "2007" ;
                                      bf:place   [ rdf:type    bf:Place ;
                                                   rdfs:label  "New York, N.Y."
                                                 ] ;
                                      bf:status  [ rdf:type    bf:Status ;
                                                   rdfs:label  "current"
                                                 ]
                                    ] ;
        bf:provisionActivity        [ rdf:type  bf:Production , bf:ProvisionActivity ;
                                      bf:date   "1950"^^<http://id.loc.gov/datatypes/edtf>
                                    ] ;
        bf:provisionActivity        [ rdf:type  bf:Publication , bf:ProvisionActivity ;
                                      bf:agent  [ rdf:type    bf:Agent ;
                                                  rdfs:label  "WPA Film Library"
                                                ] ;
                                      bf:date   "[1950]" ;
                                      bf:place  [ rdf:type    bf:Place ;
                                                  rdfs:label  "[Place of publication not identified]"
                                                ]
                                    ] ;
        bf:provisionActivity        [ rdf:type  bf:Distribution , bf:ProvisionActivity ;
                                      bf:date   "2007"^^<http://id.loc.gov/datatypes/edtf> ;
                                      bf:place  <http://id.loc.gov/vocabulary/countries/nyu>
                                    ] ;

Something like this is what I was proposing:

        bf:copyrightDate            "©1950" ;
        bf:provisionActivity        [ rdf:type  bf:Distribution,  _:PrimaryProvisionActivity ;
                                      kbv:year "2007"^^<http://id.loc.gov/datatypes/edtf> ;
                                      kbv:otherYear   "1950"^^<http://id.loc.gov/datatypes/edtf>;
                                      kbv:country <http://id.loc.gov/vocabulary/countries/nyu>;
                                      bf:code "p";
                                      bf:agent   [ rdf:type    bf:Agent ;
                                                   rdfs:label  "Distributed by Films Media Group"
                                                 ] ;
                                      bf:date    "2007" ;
                                      bf:place   [ rdf:type    bf:Place ;
                                                   rdfs:label  "New York, N.Y."
                                                 ] ;
                                      bf:status  [ rdf:type    bf:Status ;
                                                   rdfs:label  "current"
                                                 ]
                                    ] ;
        bf:provisionActivity        [ rdf:type  bf:Publication , bf:ProvisionActivity ;
                                      bf:agent  [ rdf:type    bf:Agent ;
                                                  rdfs:label  "WPA Film Library"
                                                ] ;
                                      bf:date   "[1950]" ;
                                      bf:place  [ rdf:type    bf:Place ;
                                                  rdfs:label  "[Place of publication not identified]"
                                                ]
                                    ] ;

The new method would merge the 008 data into the 26x that the first indicator marks as 'latest', or simply into the only 26x in most cases.
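
One way to sketch that selection in Python, using the three 264 fields from the record above. The helper name `pick_primary` and the tuple layout are hypothetical, and skipping the copyright-notice field (ind2 = "4") is an assumption of this example rather than something settled in the thread.

```python
def pick_primary(fields):
    """Pick the 264 that the 008 data should be merged into.

    `fields` is a list of (ind1, ind2, subfields) tuples in record order.
    Preference: the first field marked current/latest (ind1 = "3"),
    otherwise the first field that is not a copyright notice.
    Returns None when nothing qualifies.
    """
    candidates = [field for field in fields if field[1] != "4"]
    for field in candidates:
        if field[0] == "3":
            return field
    return candidates[0] if candidates else None


fields_264 = [
    (" ", "1", {"a": "[Place of publication not identified]",
                "b": "WPA Film Library", "c": "[1950]"}),
    (" ", "4", {"c": "©1950"}),
    ("3", "2", {"a": "New York, N.Y.",
                "b": "Distributed by Films Media Group", "c": "2007."}),
]

primary = pick_primary(fields_264)
print(primary[2]["b"])
```

For a record with a single 26x the loop finds no ind1 = "3" and the function falls back to that one field, which covers the common case.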

Primary implies current so maybe that bnode could go away.

distributionYear and productionYear might be more clear in the case of 008/06 'p' and would get rid of the bf:code.

kirkhess • Feb 04 '22 17:02

@kirkhess -- we took a different approach to creating a primary publication activity last year. The publication data from the 008 field is assigned to the existing bf:Place, etc. but the literals from MARC field 26X are assigned to new bflc literals -- simpleAgent, simplePlace and simpleDate. Do you think this approach would work for you too?

This part of the conversion is most thoroughly described in Process 8 of the Process0-8 document.

jodiw01 • Nov 29 '23 20:11

Thanks for answering. It does work better with that approach, but there are still some outstanding issues, which I list below. The almost two-year gap before your answer (or one year, since it was published in 2.3 and Kevin told me about it first) is where an Advisory Committee would probably be a better way forward than GitHub: the documentation would have a predictable publication pattern, and the committee would hold regular meetings to go over issues like this one.

  • The 008/06 DtSt byte isn't supported by BF - I put 'bf:code' up above in my example as a crude way to keep the byte, but maybe you want to mint a new property like simpleDtSt? bfmarc:008DtSt? The rules can guess the common ones, but not the complex/rare ones. Do you even know the date types you don't support & do you want me to send you a list? Is this also an artifact of LC Policy decisions - you don't use 008/06 'b' so you don't care?
  • The 26x ind1 value is critical for Serials/CONSER cataloging but is not supported by BF.
  • The 26x order is important and this is not ordered. We will create a new issue for order and ind1 support.
  • In the past year, without a clear definition of the shape in the ontology, other implementations (e.g ShareVDE) do not use simpleAgent/Place/Date, possibly due to putting these in the bflc namespace.

This is one of the better examples of lossiness in BIBFRAME, as well as duplication in MARC.

kirkhess • Nov 30 '23 12:11

The 008/06 DtSt byte isn't supported by BF - I put 'bf:code' up above in my example as a crude way to keep the byte, but maybe you want to mint a new property like simpleDtSt? bfmarc:008DtSt? The rules can guess the common ones, but not the complex/rare ones. Do you even know the date types you don't support & do you want me to send you a list? Is this also an artifact of LC Policy decisions - you don't use 008/06 'b' so you don't care?

There's probably some of that - it being an artifact of LC policy and non-use. I'm not sure your 'code' is such a crude way of solving this issue on your side. It seems to work and 'code' is pretty accurate since that is what it is. Can you send us examples of the complex/rare ones?

The 26x ind1 value is critical for Serials/CONSER cataloging but is not supported by BF.

This is partly a bug in the conversion. At least ind1=3 should be handled according to the specs; that's a conversion oversight. But we should look into ind1=2.

The 26x order is important and this is not ordered. We will create a new issue for order and ind1 support.

Ordering is going to be a massive issue and question. My first thought is "there is a case to be made for splitting on multiple 264s," though I can hear the howls now because that would wreck WEM lock. My second thought is "what is the ordering based on?" If it is the date, then people should enter EDTF dates as the date in the ProvAct and we can just sort on that. But open the issue; it's a bigger conversation, maybe.
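
If ordering were based on the date, the sort could look something like the sketch below. It is deliberately naive: `sort_key` is a hypothetical helper that only reads a leading four-digit year, so intervals, unspecified digits and bracketed transcribed dates would need a real EDTF parser. Activities without a usable year are placed last, which is an arbitrary choice for this example.

```python
import re


def sort_key(edtf_date: str):
    """Sort key based on the leading four-digit year of an EDTF string.

    Values that do not start with four digits (for example "[1950]"
    or an empty string) sort after every dated activity.
    """
    match = re.match(r"\d{4}", edtf_date)
    return int(match.group()) if match else float("inf")


activities = [
    {"type": "bf:Distribution", "date": "2007"},
    {"type": "bf:Publication", "date": "[1950]"},
    {"type": "bf:Production", "date": "1950"},
]

ordered = sorted(activities, key=lambda activity: sort_key(activity["date"]))
print([activity["type"] for activity in ordered])
```

Note that the transcribed "[1950]" falls to the end here, which illustrates the point that the sort only works when the ProvisionActivity carries an EDTF date rather than a transcription.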

In the past year, without a clear definition of the shape in the ontology, other implementations (e.g ShareVDE) do not use simpleAgent/Place/Date, possibly due to putting these in the bflc namespace.

This is really a question for them. Could also be an advertising problem.

kefo • Nov 30 '23 15:11

Hi - I followed up with an ind1/order issue.

I included links to the current instructions published by LC. In that case, I think PTCP could just publish BF specific documentation and/or give NetDev some guidance? Or bring it to the PCC and have the appropriate committee (policy? CONSER?) look into it?

It isn't clear in MARC that the 260/264 fields are in some kind of order, which doesn't help. Similarly, I'm not certain ind1 is really that useful to systems, but I think the community would probably prefer that we can round-trip it.

kirkhess • Dec 04 '23 19:12