fix text spacing in astronomy metadata fields

Open • sbondka opened this pull request • 8 comments

Which issue(s) this PR closes:

Closes #2622

Additional documentation: For the release note, we propose listing this issue in the milestones of the release.md for the upcoming version.

sbondka • Jan 04 '24

@pdurbin, it seems this PR has been forgotten since its review. Could you add it to the IQSS board when you come back? 🍀 :)

DS-INRAE • Jun 18 '24

Coverage Status

coverage: 20.661% (+0.002%) from 20.659% when pulling 8c72ae3a39be3f072e7e5191915beb1a9709a5e8 on Recherche-Data-Gouv:2622_Fix_Text_Spacing_in_Astronomy_Metadata_Fields into 00020e2e14be599d49bbd7400d37454cf91717b7 on IQSS:develop.

coveralls • Jun 19 '24

I'm not sure this pull request addresses the issue correctly.

From #2622, I understand that the problem comes from how Dataverse displays the terms, not from the metadata values themselves (which correspond to the TSV file). In addition, the Virtual Observatory (VO) Discovery and Provenance Metadata document defines the controlled vocabulary terms exactly as they appear in the TSV file.

However, I also notice that none of the controlled vocabularies in the other TSV files use CamelCase.

@qqmyers @pdurbin, do you think this proposal is correct, or should I open a new pull request that modifies only the bundle resources?

jeromeroucou • Jun 20 '24

Sorry, I didn't initially catch that these terms were from VO. It's an interesting question. Right now the astronomy block doesn't map its terms to any external vocabulary (via a blockURI or termURIs; see the last column of the citation.tsv entries), so any machine integration has to be custom. I agree that changing only the .properties entry would be less likely to break such a custom integration. (That said, it appears that some exporters now use the .properties value for some fields, e.g. subject for DDI and schema.org, and I'm not sure what is sent in the APIs.)

In general, this appears to be more of an astronomy community question than a technical design choice, so we should probably defer to @jggautier (is there anything in the guidance doc?) and to anyone using Dataverse for astronomy (what do other repositories display?).

Knowing that this maps to VO, I'd add one other potential option: leave the spelling as is and make it clearer in the term title or description that the values come from VO, e.g. change the astroType title to "VO Type" and/or use a description that says "The type of data as defined in the International Virtual Observatory Alliance’s (IVOA) VOResource Schema format." This would be an even smaller change than editing the .properties file with respect to any integrations. (I'm guessing that this is a better reference, based on the text in the Guides. I'm guessing that astronomy researchers might expect the terms without spaces if that's how they are defined elsewhere.)

With any solution, I'd definitely encourage adding a blockUri or termUris: that would map the terms to ones that machines would recognize in the OAI-ORE export and in any external exporters that might pick them up from there (there are RO-Crate discussions about whether to use these mappings or to provide another layer of mapping).

qqmyers • Jun 20 '24

I've always thought that https://github.com/IQSS/dataverse/issues/2622 was only about how the values are displayed in the UI and about adding identifiers to the identifier column of the metadata block's TSV file.

"I'm guessing that astronomy researchers might expect the terms without spaces if that's how they are defined elsewhere."

I would be surprised if astronomy researchers expected these terms to not have spaces, and I imagine @posixeleni would be surprised, too. And I'd be surprised if Arnold Rots and others who worked on the document at https://perma.cc/H5ZJ-4KKY also expected that when researchers and curators see their list of "VO Types" in a UI, they're displayed without spaces.

@qqmyers, by "guidance doc", is that the metadata text guidelines at https://docs.google.com/document/d/1tY5t3gjrIgAGoRxVMWQSCh46fnbSmnFDLQ7aLkNLhJ8? There isn't anything about this there.

I think it would be easy to ask our contacts from Harvard's CfA about this, and to try to reach out to Arnold Rots or someone who worked on that Discovery and Provenance Metadata for Persistent Data Objects in the Virtual Observatory document.

We could rewrite https://github.com/IQSS/dataverse/issues/2622 as an investigation or spike issue, or we could open a new investigation or spike issue.

What do you all think?

jggautier • Jun 20 '24

It would be nice to have a standard approach in that guidance doc. We basically have the same issue in #10632, where the DataCite relationTypes are terms like IsCitedBy, and we definitely want to keep that form in metadata exports because those exports match the DataCite schema. If terms like this should be displayed/translated as separate words (and perhaps with only the first word capitalized? DataCite shows "Is cited by" in the Fabrica interface), then @jeromeroucou's suggestion to edit only the properties files would be better than moving this PR forward. (Though adding identifiers and perhaps a blockUri or termUris would be good too :-) )

qqmyers • Jun 20 '24

It makes sense to me to add guidance in the guidance doc about the display of options in UI components like drop down menus.

I think it's safe to say that in UI components like drop-down menus, we should capitalize the first letter of each option and put a space between words so that each option is more human-readable. I could add that to the guidance doc. I can't think of any cases where it would be better for any or all of the options in a UI component like a drop-down menu to be in CamelCase.
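The display rule described above (split a CamelCase term into words and capitalize) can be sketched in a few lines. This is a hypothetical helper, not Dataverse code: the name `humanize_term` is illustrative, and keeping acronym runs together (`VO` in `VOTable`) is an assumption about how such terms should read. The `sentence` style mirrors the "Is cited by" form that DataCite's Fabrica interface shows.

```python
import re

# Pieces of a CamelCase term: an uppercase run not followed by a lowercase
# letter (an acronym such as "VO" in "VOTable"), a capitalized or lowercase
# word, or a run of digits. Existing spaces are simply skipped.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def humanize_term(term: str, style: str = "title") -> str:
    """Turn a vocabulary term like "IsCitedBy" into display text.

    style="title"    -> every word capitalized ("Is Cited By")
    style="sentence" -> only the first word capitalized ("Is cited by");
                        multi-letter acronyms keep their capitals.
    """
    words = _WORD.findall(term)
    if not words:
        return term
    if style == "title":
        return " ".join(w[0].upper() + w[1:] for w in words)
    if style == "sentence":
        first = words[0][0].upper() + words[0][1:]
        rest = [w if (w.isupper() and len(w) > 1) else w.lower() for w in words[1:]]
        return " ".join([first] + rest)
    raise ValueError(f"unknown style: {style!r}")
```

Applying a transformation like this only when building display strings (for example, the .properties entries) would leave the stored vocabulary values untouched, which is the point of the properties-only approach discussed in this thread.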

About editing only the properties file: I've always thought that the CV terms listed in a metadata block's TSV file should match what's in that block's properties file, even though Dataverse uses what's in the properties file. I've tried to keep them in sync to avoid confusing myself and others about what should be displayed. Could changes be made to both the TSV and properties files?
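Keeping the TSV and .properties files in sync could be checked automatically. The following is a hypothetical sketch, not an existing Dataverse tool. It assumes the `#controlledVocabulary` section layout (an empty first column, then DatasetField, Value, identifier, displayOrder), and it assumes display strings are stored under keys of the form `controlledvocabulary.<field>.<value lowercased, spaces replaced by underscores>`. The sample rows are illustrative only.

```python
# Hypothetical TSV/.properties consistency check. ASSUMPTION: property keys
# look like controlledvocabulary.<field>.<value lowercased, spaces -> "_">.


def property_key(field: str, value: str) -> str:
    return f"controlledvocabulary.{field}.{value.lower().replace(' ', '_')}"


def parse_cv_rows(tsv_text: str):
    """Yield (field, value) pairs from the #controlledVocabulary section."""
    in_cv = False
    for line in tsv_text.splitlines():
        cols = line.split("\t")
        if cols[0].startswith("#"):
            # A section header; only the controlled vocabulary one matters.
            in_cv = cols[0] == "#controlledVocabulary"
            continue
        if in_cv and len(cols) >= 3:
            # Data rows begin with an empty column: "", field, value, ...
            yield cols[1], cols[2]


def parse_properties(text: str) -> dict:
    """Simplified key=value parser (no escapes or line continuations)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def find_mismatches(tsv_text: str, properties_text: str) -> list:
    """List CV terms whose .properties display text is missing or differs."""
    props = parse_properties(properties_text)
    problems = []
    for field, value in parse_cv_rows(tsv_text):
        key = property_key(field, value)
        shown = props.get(key)
        if shown is None:
            problems.append(f"missing key {key}")
        elif shown != value:
            problems.append(f"{key}: TSV has {value!r}, properties has {shown!r}")
    return problems


if __name__ == "__main__":
    # Illustrative sample data, not copied from the real astronomy block.
    tsv = (
        "#controlledVocabulary\tDatasetField\tValue\tidentifier\tdisplayOrder\n"
        "\tastroType\tImage\t\t0\n"
        "\tastroType\tEventList\t\t1\n"
    )
    props = (
        "controlledvocabulary.astroType.image=Image\n"
        "controlledvocabulary.astroType.eventlist=Event List\n"
    )
    # Flags only the EventList row, whose display text differs from the TSV.
    print(find_mismatches(tsv, props))
```

A check like this would surface exactly the situation discussed here: a properties-only change makes the display text diverge from the TSV value, so a project would need to decide whether such differences are intentional.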

I think the GitHub issue at https://github.com/IQSS/dataverse/issues/2622 mentioned adding identifiers in the controlled vocabulary because folks thought it would be relatively easy, and so @pdurbin added the "low hanging fruit" label to the GitHub issue. I think adding a blockUri or termUris could be done as part of another effort, with its own GitHub issue.

Or we should edit https://github.com/IQSS/dataverse/issues/2622 to expand its scope.

I think there are differences between this PR about the terms in the astronomy metadata block and what https://github.com/IQSS/dataverse/pull/10632 is about.

jggautier • Jun 24 '24

Wow, there's a lot more complexity here than I realized. I went ahead and removed the "low hanging fruit" label from the issue this PR is trying to close.

I'm happy to defer to others who have thought more deeply about the best solution.

pdurbin • Jul 08 '24

I was asked to propose next steps based on discussion in this pull request.

I think that we should work with experts, like contacts at Harvard's CfA and folks who worked on that VO metadata standard, to figure out how the vocab terms should appear when researchers choose them, and to figure out how identifiers, a blockUri and/or termUris should be added to the metadata block.

@scolapasta suggested closing the PR and reopening it if we decide to build on it. I agreed, so I'm closing this PR.

We'll meet with @cmbz next week to review this.

jggautier • Nov 14 '24