
Really partial references when harvesting

tjouneau opened this issue 3 years ago.

What steps does it take to reproduce the issue?

  • Create a harvesting client for https://repository.ortolang.fr/api/oai, and select the "producer:atilf" set and the oai_dc metadata prefix.
  • Launch the harvest.
  • Check the results.

What happens? The results can be seen on our test instance: https://tested-dataverse5.univ-lorraine.fr/dataverse/ortolang. In case the link does not work, I have attached a screenshot below for reference.

The harvested records are very incomplete and poorly rendered. Most have only the title right, and the date shown is the harvest date, not the publication date.

Yet the following request returns very clean XML that should not cause any problem: https://repository.ortolang.fr/api/oai/?verb=ListRecords&set=producer:atilf&metadataPrefix=oai_dc. I'm attaching that output here: repository.ortolang.fr.xml.zip
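To inspect the records one by one, the ListRecords output can be summarized per record. Below is a minimal Python sketch, assuming the response has already been downloaded (for example, the attached file) and that it uses the standard OAI-PMH 2.0 and oai_dc namespaces; the function name `summarize_records` is hypothetical. It puts the header `datestamp` next to the `dc:date`, `dc:title` and `dc:contributor` values, which is where the later analysis in this thread locates the date and contributor problems.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0, oai_dc and Dublin Core element namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def summarize_records(xml_text: str) -> list[dict]:
    """Hypothetical helper: for each <record> in a ListRecords response,
    return the header datestamp alongside the dc:date, dc:title and
    dc:contributor values from the oai_dc metadata."""
    root = ET.fromstring(xml_text)
    summaries = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        # The metadata block is absent for deleted records.
        dc = record.find("oai:metadata/oai_dc:dc", NS)

        def values(tag: str) -> list[str]:
            # All values of one Dublin Core element, whitespace-trimmed.
            if dc is None:
                return []
            return [(el.text or "").strip() for el in dc.findall(f"dc:{tag}", NS)]

        summaries.append({
            "identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", default="", namespaces=NS),
            "dc_date": values("date"),
            "title": values("title"),
            "contributor": values("contributor"),
        })
    return summaries
```

Records whose `datestamp` differs from their `dc_date` are the ones where using the header date instead of `dc:date` changes the date that gets displayed.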

Which version of Dataverse are you using? 5.10

Any related open or closed issues to this bug report?

Screenshots: [screenshot of the harvest results, not preserved]

tjouneau, Apr 21 '22 12:04

Hello everyone, we are working with @tjouneau, and recent in-depth analysis on v6.2+ has let us pinpoint why the rendering was poor and the harvesting partial.

  • First, many harvested records run into controlled-vocabulary problems; this can be worked around by enabling the allowHarvestingMissingCVV parameter.

  • Second, some anomalies come from data quality in the Ortolang repository, such as missing titles. Others come from the import itself: dc:contributor elements are not handled, and the date used is not dc:date but the datestamp from the OAI-PMH record header.
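For reference, here is a minimal Python sketch of creating the harvesting client from the reproduction steps through the API, with allowHarvestingMissingCVV enabled. It assumes the Dataverse native API endpoint `POST /api/harvest/clients/{nickname}`, the `X-Dataverse-key` header, and the JSON field names as we understand them from the harvesting-client section of the Dataverse API guide; check them against your Dataverse version. The helper names, the nickname, and the `dataverseAlias` and `archiveUrl` values are hypothetical.

```python
import json
import urllib.request


def build_client_payload(allow_missing_cvv: bool = True) -> dict:
    """Hypothetical helper: JSON body for a harvesting client matching the
    reproduction steps. Field names are our reading of the Dataverse API."""
    return {
        "dataverseAlias": "ortolang",  # hypothetical: collection receiving the records
        "harvestUrl": "https://repository.ortolang.fr/api/oai",
        "archiveUrl": "https://repository.ortolang.fr",  # hypothetical
        "metadataFormat": "oai_dc",
        "set": "producer:atilf",
        # Our understanding: tolerate controlled-vocabulary values Dataverse
        # does not recognise instead of rejecting the whole record.
        "allowHarvestingMissingCVV": allow_missing_cvv,
    }


def build_create_request(server_url: str, nickname: str, api_token: str,
                         payload: dict) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) the POST request."""
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/api/harvest/clients/{nickname}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Dataverse-key": api_token,
                 "Content-Type": "application/json"},
        method="POST",
    )
```

The request is only prepared here. Sending it would be `urllib.request.urlopen(req)` with an API token allowed to manage harvesting clients (a superuser token, as far as we know).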

Broader work on harvesting is planned for https://entrepot.recherche.data.gouv.fr/, and we will contribute all possible improvements back to Dataverse, in particular around the oai_dc and oai_ddi metadata formats (already started with #10772 and #10837).

@tjouneau I suggest you close this ticket, we will open specific tickets if necessary.

luddaniel, Oct 16 '24 12:10

@luddaniel thanks! We just merged your PR:

  • #10772

@tcoupin how do you feel about the suggestion to close this issue and follow along with what @luddaniel is doing?

pdurbin, Nov 05 '24 19:11

Hi @pdurbin,

I'm answering on behalf of @tcoupin: we haven't run into this issue, so we don't have a strong opinion on it. Did you mean @tjouneau?

plecor, Nov 06 '24 07:11

Whoops! Sorry for the confusion! Too many "t" names. 😅

@tjouneau I see you closed this issue. Thanks.

@luddaniel thanks for continuing to fix all the things! ❤️

pdurbin, Nov 06 '24 15:11