Add Harvesting Source to search facets

Open DS-INRAE opened this issue 1 year ago • 3 comments

Overview of the Feature Request: Following the addition of a source name for harvesting clients (#10217), this information should be usable as a search facet to filter results.

What kind of user is the feature intended for? (Example users roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin) API User, Guests

What inspired the request? Needed for our harvested repositories.

What existing behavior do you want changed? Modify the current search facet "Metadata Source" to include the list of Sources from harvesting clients.

Any open or closed issues related to this feature request?

  • #10195 (macro issue for centralization)
  • #10217

DS-INRAE avatar Feb 06 '24 10:02 DS-INRAE

I remember suggesting this back when we added that facet.

@DS-INRA are you thinking this would be a system-wide setting? And we'd keep the default as-is but installations could opt-in to it? Some installations might like the current behavior.

pdurbin avatar Feb 06 '24 14:02 pdurbin

@DS-INRA are you thinking this would be a system-wide setting? And we'd keep the default as-is but installations could opt-in to it? Some installations might like the current behavior.

Good question. I'll post it on the group with mockups to get the other installations' opinions. Another thought: simply not setting a source name on any of the clients might be the easier solution for installations that do not want to separate sources, though I don't know if that would work with the facet mechanism.

DS-INRAE avatar Feb 06 '24 15:02 DS-INRAE

It could be interesting to include a feature that displays the data sources in search facets.

@DS-INRA, what kind of source information are you considering for the harvesting client to display? Would it be the server URL, nickname, or Dataverse?

gwendoux avatar Feb 08 '24 16:02 gwendoux

Hi @DS-INRA 👋🏼, Quick question: What would you expect to see on the Metadata Source? As of now I have this, where you see the name that was given to the Harvesting Client.

As an example here are my Clients:

[Screenshot: list of harvesting clients]

And here is how it looks:

[Screenshot: Metadata Source facet showing the harvesting client names]

A couple of things come to my mind:

  • The Harvesting Client name can't have spaces so it may not look great for clients with more than 1 word.
  • If we use the name of the original Dataverse we can encounter issues where the same name is used on different sources and would be grouped by the same name.
  • Would it still make sense to include the root between all the other sources?

jp-tosca avatar Apr 03 '24 18:04 jp-tosca

Hello, sorry for not detailing this before. I'm lagging behind on the issue descriptions, so the details stayed split between the two issues and incomplete.

What would you expect to see on the Metadata Source ?

What we want to see is the harvested repository's name.

If we use the name of the original Dataverse we can encounter issues where the same name is used on different sources and would be grouped by the same name.

The case where two OAI sets from the same repository get the same "Source/Repository Name" is actually the expected behavior. For example, for a repository with two OAI sets (and therefore two clients), with one set from institution A going into collection A' and one set from institution B going into collection B', we would still want the same "Source/Repository Name".

Would it still make sense to include the root between all the other sources?

Yes, I think so: it is useful if you specifically want datasets from the current repository, and for quick counting purposes for Dataverse collection admins.
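
To illustrate the grouping described above, here is a minimal sketch in Python. It is not Dataverse code: the record layout, the `facet_counts` helper, and the "Local repository" label for the root are all hypothetical, chosen only to show two clients sharing one facet value while local datasets keep their own entry.

```python
from collections import Counter

# Hypothetical records: each dataset carries the source name configured on the
# harvesting client that brought it in, or None if it was deposited locally.
datasets = [
    {"title": "ds1", "source_name": "Repository X"},  # client for OAI set A -> collection A'
    {"title": "ds2", "source_name": "Repository X"},  # client for OAI set B -> collection B'
    {"title": "ds3", "source_name": "Repository Y"},
    {"title": "ds4", "source_name": None},            # local (root) dataset
]

ROOT_LABEL = "Local repository"  # hypothetical facet label for non-harvested datasets


def facet_counts(records, root_label=ROOT_LABEL):
    """Count datasets per facet value, grouping clients that share a source name."""
    return Counter(r["source_name"] or root_label for r in records)


counts = facet_counts(datasets)
```

With this data, both sets harvested from "Repository X" fall under a single facet entry, and the root still gets its own count.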

DS-INRAE avatar Apr 08 '24 15:04 DS-INRAE

@DS-INRA Hi, I suggested an alternative implementation in the PR earlier today: rather than using either the nickname of the Harvesting Client or the descriptive label for the remote repository (still to be added, per #10217), just use the name of the local collection into which the client is harvesting. My comment there: https://github.com/IQSS/dataverse/pull/10464#discussion_r1556077825

The potential advantage of this solution: makes #10217 unnecessary, while still providing a descriptive, user-friendly facet label. A disadvantage: it's not going to cover the scenario you just described - multiple harvesting clients harvesting different sets from the same archive, into different local collections, that the local admin may want to group under the same facet.

I'm generally happy to implement it the way you prefer, just figured I'd ask.

landreev avatar Apr 08 '24 18:04 landreev

Hi @landreev, unfortunately there are two additional limits to this approach, even if it would have been great to avoid adding a new field :)

  • when a harvested repository maps 1:1 to a target collection, the repository name does not necessarily match the collection name
  • in our case, a Dataverse Collection is usually the target of more than one harvesting client/set, coming from different repositories

DS-INRAE avatar Apr 09 '24 08:04 DS-INRAE

@DS-INRA Sure. So, just to confirm, our plan then is to merge the linked PR #10464 as is, with the client nickname used for the facet (for now). Then, when the descriptive label is added, we'll switch to using that?
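
As a rough sketch of that plan (not the code in PR #10464), the facet label could be resolved with a simple fallback. The `facet_label` helper and the `source_name` and `nickname` keys are hypothetical names used only for illustration.

```python
def facet_label(client):
    """Return the facet label for a harvesting client.

    Assumption: `client` is a dict with a required "nickname" and an optional
    "source_name" (the descriptive label proposed in #10217).
    """
    source_name = (client.get("source_name") or "").strip()
    # Prefer the descriptive label once it exists; otherwise use the nickname.
    return source_name or client["nickname"]


# Before the descriptive label exists, the nickname is used.
label_now = facet_label({"nickname": "repo_x_set_a"})
# Once a source name is set, it takes over.
label_later = facet_label({"nickname": "repo_x_set_a", "source_name": "Repository X"})
```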

landreev avatar Apr 09 '24 12:04 landreev

I'm okay with this approach, as discussed with @jp-tosca (thanks for the short summary :) )

DS-INRAE avatar Apr 10 '24 15:04 DS-INRAE