openMINDS_core

reassessment of actors, their affiliations, and relations to research product (versions)

Open • lzehl opened this issue 1 year ago • 22 comments

This issue is based on issues https://github.com/openMetadataInitiative/openMINDS_core/issues/502 and https://github.com/openMetadataInitiative/openMINDS_core/issues/500, and adds further aspects.

Abbreviations: research products (RPs); research product versions (RPVs). Category legalPerson: Person, Organisation, Consortium.

@openMetadataInitiative/openminds-developers please review and discuss (terminology is still open to change; please suggest better terms if you have them)


Current status (as a reminder)

Current status of the linkages between RP/RPV and actors:

  • author (1-N) linked category: legalPerson (for some RP/RPV, e.g. dataset (version))
  • custodian (1/0-N) linked category: legalPerson (for any RP/RPV; required only in dataset (version))
  • developer (1-N) linked category: legalPerson (for some RP/RPV, e.g. software (version))
  • otherContribution (0-N) linked type: Contribution (for any RPV)

Current Contribution:

  • contributor (1) linked category: legalPerson
  • type (1-N) linked type: ContributionType (controlled, e.g. coordination, data processing, etc.)

Current linkage to Affiliation:

  • affiliation (0-N) linked type: Affiliation (for Person or Organisation)

Current Affiliation:

  • endDate (0-1) string (date)
  • memberOf (1) linked type: Consortium or Organisation
  • startDate (0-1) string (date)

Current Organisation:

  • affiliation (0-N) linked type: Affiliation
  • digitalIdentifier (0-N) linked type: GRIDID, RORID, RRID
  • fullName (1) string
  • hasParent (0-N) linked type: Organisation
  • homepage (0-1) string (iri)
  • shortName (0-1) string

Current Consortium:

  • fullName (1) string
  • contactInformation (0-1) linked type: ContactInformation (email)
  • homepage (0-1) string (iri)
  • shortName (0-1) string

SUGGESTIONS:

  1. get rid of role specific actor linkages
  2. make affiliation for a specific RP/RPV explicit
  3. enforce real Organizations (with RORID); decouple department, institute, etc. information
  4. review Consortium (ID missing? affiliation missing?)
  5. review Person (full name?)

RP/RPV

  • replace "author", "developer", "otherContribution" with "contributor"
  • contributor (1-N) embedded type Contributor

New Contributor:

  • affiliation (1-N) linked type: Organization
  • contribution (1-N) embedded type: Contribution
  • legalPerson (1) linked category: legalPerson

New Contribution:

  • role (1) linked type: ContributorType (or stick with ContributionType ?)
  • order (0-1) number (will require conditions to be checked)

New Organization:

  • address (1) embedded type: Location
  • acronym (0-1) string
  • alternateName (0-N) string
  • digitalIdentifier (0-N) linked type: GRIDID, RORID, RRID, ISNI, LEI
  • name (1) string
  • hasParent (0-N) linked type: Organization
  • homepage (0-1) string (iri)
  • memberOf (0-N) embedded type: Membership ?
  • type (1) linked type: LegalEntityType -> (new controlled terms) (Question: what are other types beyond legal entity type? e.g. patient associations?)

New Person:

  • alternateName (0-N) string
  • associatedAccount (0-N) linked type: AccountInformation
  • contactInformation (1-N) linked type: ContactInformation
  • digitalIdentifier (0-N) linked type: ORCID (others??)
  • familyName (0-1) string
  • fullName (1) string
  • givenName (1) string
  • memberOf (0-N) embedded type: Membership ?

New Membership:

  • endDate (0-1) string (date)
  • memberOf (1) linked type: Consortium or Organization
  • startDate (0-1) string (date)

New ContactInformation:

  • address (0-1) embedded type: Location
  • email (1) string

New Location:

  • administrativeDivision (0-1)
  • city (1)
  • country (1)
  • geoCoordinates (0-1) embedded type: GeoCoordinates
  • postalAddress (0-1) embedded type: POBox | StreetAddress
  • postalCode (1)

New GeoCoordinates:

  • elevation (0-1)
  • latitude (1)
  • longitude (1)

New POBox:

  • number (1)

New StreetAddress:

  • street (1)
  • number (0-1)

New ISNI:

  • identifier (regex: ^https?://(www\.)?isni\.org/(isni/)?\d{15}[\dX]$)

New LEI:

  • identifier (regex: ^https://lei\.global/LEI/[A-Z0-9]{18}\d{2}$)
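A minimal Python sketch of how a validation step might apply the two identifier patterns, assuming Python's `re` syntax and with the literal dots escaped (an unescaped `.` matches any character). The ISNI check-character test (ISO 7064 MOD 11-2, the same scheme ORCID uses) goes beyond the regex and is an illustrative addition; LEI check digits are not verified here. All function names are hypothetical.

```python
import re

# Patterns from the proposal, with literal dots escaped.
ISNI_PATTERN = re.compile(r"^https?://(www\.)?isni\.org/(isni/)?\d{15}[\dX]$")
LEI_PATTERN = re.compile(r"^https://lei\.global/LEI/[A-Z0-9]{18}\d{2}$")


def isni_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ISNI digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(iri: str) -> bool:
    """Pattern match plus check-character verification."""
    if not ISNI_PATTERN.match(iri):
        return False
    code = iri[-16:]  # the regex guarantees the IRI ends with the 16-character ISNI
    return isni_check_char(code[:15]) == code[15]


def is_valid_lei(iri: str) -> bool:
    """Pattern match only; the two trailing check digits are not verified."""
    return LEI_PATTERN.match(iri) is not None
```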

lzehl, Sep 26 '24

@openMetadataInitiative/openminds-developers this needs to be discussed for v5.0

lzehl, Apr 01 '25

New Contribution:

  • role (1) linked type: ContributorType (or stick with ContributionType ?)

Roles like curator, developer, designer, writer, or data provider are straightforward. However, more nuanced tasks, such as data collection, optimization, data interpretation, or survey design, aren’t as easily categorized. In these cases, a role alone might not fully capture the contribution, and a detailed description (e.g., "task" or "taskDescription") could offer better clarity.

Affiliation Property (Contributor): Some organizations require researchers/employees to be cited in a specific order when they have multiple affiliations. Including an order property could address this need.

New Organization: homepage (0-1): change type from string (iri) to string (url).

Raphael-Gazzotti, Apr 02 '25

@Raphael-Gazzotti

ContributorType: fully correct, and we may want to go deeper. I checked again and I would like to follow the logic of DataCite (so ContributorType), but then, as you suggest, refine it with the use cases.

Affiliation: you are fully correct. This would mean introducing a separate Affiliation schema with an order, similar to the Contribution schema above.

Organization: we should discuss this with the others. There was a reason why we decided on IRI for all web addresses (but I don't remember why).

Some other things @Raphael-Gazzotti and I discussed in person:

An alternative to the restructuring above is to organize by contribution role with ordered lists of persons (Contribution schema with role and contributors (ordered list of linked types) -> Contributor schema with person and affiliation).

Questions around address: is it sufficient for international addresses? Is the lower administrative division (district, block, etc.) needed besides the higher administrative division (state, province, region)?

Name(s) of organization: for legal registration, the name is normally in the native language of the country where the organization is located. For our purposes, however, the English name would be more useful. We should register at least the native-language name as an alternate name. The acronym, however, typically refers to the native name. Other alternate names are other abbreviations or transliterated names (the native name without special characters, or English for languages not written in Latin script). What do we want to capture?

LegalEntityType: these are new controlled terms, with the main name provided in English and synonyms in other languages (with the language code embedded in the string as "XXX (EN)" or "XXX @EN"? Or as a structured string, i.e., an embedded type String with text "XXX" and language "EN"?)

Organization connections: we have hasParent and memberOf, however there are also other relations between organizations (e.g. between a university and a university hospital).

Organization logos: do we want that? If so, as a linked type (File | WebResource), OR just an IRI to Wikidata/Wikimedia?

Person: we may still want a currentAffiliation directly attached, to display on the KG for persons. Or could this be merged with memberOf / isPartOf (allowing any relation of a person to a consortium/organization to be stated there)?

lzehl, Apr 02 '25

Organisation needs to have a shortName or acronym field, because full names are sometimes very long.

Example: "The College of the Holy and Undivided Trinity of Queen Elizabeth near Dublin", usually known as "Trinity College Dublin" (ROR) in English ("Coláiste na Tríonóide, Bhaile Átha Cliath" in Irish).

apdavison, Apr 07 '25

Contact information should refer to just the current, preferred methods of contact.

apdavison, Apr 07 '25

For scientific purposes, I think Location can be simplified to country (required), city (optional), administrativeDivision (optional).
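A minimal sketch of that simplified Location as a Python dataclass. The snake_case field names and the `label` helper are illustrative assumptions, not openMINDS property definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """Simplified Location: only the country is required."""
    country: str                                   # required
    city: Optional[str] = None                     # optional
    administrative_division: Optional[str] = None  # optional (state, province, region)

    def label(self) -> str:
        """Join the available parts, from most to least specific."""
        parts = [self.city, self.administrative_division, self.country]
        return ", ".join(p for p in parts if p)
```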

apdavison, Apr 07 '25

Alternative approach for the upper part:

RP/RPV

  • replace "author", "developer", "otherContribution" with "contribution"
  • contribution (1-N) embedded type Contribution

New Contribution:

  • contributor (1-N) embedded type: Contributor (ordered list)
  • role (1) linked type: ContributionType

New Contributor:

  • legalPerson (1) linked category: legal person
  • affiliation (1-N) linked type: Organization (ordered list)

lzehl, Apr 07 '25

Yet another alternative approach for the upper part: flatten Contribution/Contributor:

RP/RPV

  • replace "author", "developer", "otherContribution" with "contribution"
  • contribution (1-N) embedded type Contribution

New Contribution:

  • contributor (1) linked category: legal person
  • role (1) linked type: ContributionType
  • affiliation (1-N) linked type: Organization (ordered list)

(as above, the affiliations are understood to be those that applied at the time of the contribution).

apdavison, Apr 07 '25

Picking up the discussion again for this issue and pushing towards a final decision here. Summary of the options (focusing on the essentials):

OPTION A

  • RP/RPV has contributor (1-N) embedded type Contributor
  • Contributor has affiliation (1-N) linked type Organization (ordered list!!!)
  • Contributor has contribution (1-N) embedded type Contribution
  • Contributor has legalPerson (1) linked category legalPerson
  • Contribution has role (1) linked type ContributionType
  • Contribution has order (0-1) number

The main disadvantage of Option A is a possible mismatch between the stated order and the actual number of contributors with a given contribution role (e.g., order 3 for authors when there are only two authors).
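To make the concern concrete, here is a rough Python sketch of the consistency check a system would need under Option A. It assumes plain strings stand in for linked instances (legal persons, organizations, ContributionType terms); the class and function names are hypothetical and not part of openMINDS. A role with no stated orders is treated as unordered; otherwise its orders must be exactly 1..n.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Contribution:
    role: str                     # stand-in for a linked ContributionType
    order: Optional[int] = None   # position within this role, if ordered


@dataclass
class Contributor:
    legal_person: str                                      # stand-in for a linked legal person
    affiliation: list[str] = field(default_factory=list)   # ordered Organizations
    contribution: list[Contribution] = field(default_factory=list)


def order_problems(contributors: list[Contributor]) -> dict[str, list[int]]:
    """Per role, return the stated orders whenever they are not exactly 1..n."""
    orders: dict[str, list[int]] = defaultdict(list)
    counts: dict[str, int] = defaultdict(int)
    for contributor in contributors:
        for c in contributor.contribution:
            counts[c.role] += 1
            if c.order is not None:
                orders[c.role].append(c.order)
    problems = {}
    for role, stated in orders.items():
        if sorted(stated) != list(range(1, counts[role] + 1)):
            problems[role] = sorted(stated)
    return problems
```

With two authors stated as orders 1 and 3, the check reports `{"author": [1, 3]}`; this is exactly the kind of safeguard the applied system would have to provide.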

OPTION B

  • RP/RPV has contribution (1-N) embedded type Contribution
  • Contribution has contributor (1-N) embedded type Contributor (ordered list!!!)
  • Contribution has role (1) linked type ContributionType
  • Contributor has legalPerson (1) linked category: legal person
  • Contributor has affiliation (1-N) linked type: Organization (ordered list!!!)

The main disadvantage of Option B is the repetition of persons, with their affiliations, for each of their contribution roles.
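A rough Python sketch of Option B, again assuming strings stand in for linked instances and using hypothetical names. It shows both the repetition (one Contributor entry per person per role) and the full scan needed to collect all roles of one legal person.

```python
from dataclasses import dataclass, field


@dataclass
class Contributor:
    legal_person: str                                     # stand-in for a linked legal person
    affiliation: list[str] = field(default_factory=list)  # ordered Organizations


@dataclass
class Contribution:
    role: str                                             # stand-in for a ContributionType
    contributor: list[Contributor] = field(default_factory=list)  # ordered


def roles_of(contributions: list[Contribution], legal_person: str) -> list[str]:
    """Collect every role in which a legal person appears (requires a full scan)."""
    return [c.role for c in contributions
            if any(x.legal_person == legal_person for x in c.contributor)]


def entries_of(contributions: list[Contribution], legal_person: str) -> list[Contributor]:
    """All Contributor entries for one legal person; more than one means repeated data entry."""
    return [x for c in contributions for x in c.contributor
            if x.legal_person == legal_person]
```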

OPTION C

  • RP/RPV has contribution (1-N) embedded type Contribution
  • Contribution has contributor (1) linked category: legal person
  • Contribution has role (1) linked type: ContributionType
  • Contribution has affiliation (1-N) linked type: Organization (ordered list!!!)

The main disadvantage of Option C is that it does not allow contributors to be ordered per contribution role (the order of authors might differ from the order of custodians).

@openMetadataInitiative/openminds-developers which do you see as the most feasible? @olinux @annapaola which one is best from the KG perspective?

lzehl, Jun 17 '25

On reviewing the current options, I think they are all too complex. For any research product, we only need one ordered list of people (typically corresponding to the author role; for "custodian", we can just have a list of Persons, we don't need to use Contributor). If this is accepted, we can simplify things:

Option D

  • RP/RPV has contributor (1-N) embedded type Contributor (ordered list)
  • Contributor has affiliation (1-N) linked type Organization (ordered list)
  • Contributor has legalPerson (1) linked category legalPerson
  • Contributor has role (1-N) linked type ContributionType
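A minimal Python sketch of Option D under the same assumptions (strings as stand-ins for linked instances, hypothetical names): any per-role list is obtained by filtering the single ordered list, so every role shares that one relative order, while the roles of a person are read from a single entry.

```python
from dataclasses import dataclass, field


@dataclass
class Contributor:
    legal_person: str                                     # stand-in for a linked legal person
    affiliation: list[str] = field(default_factory=list)  # ordered Organizations
    role: list[str] = field(default_factory=list)         # stand-ins for ContributionTypes


def ordered_with_role(contributors: list[Contributor], role: str) -> list[str]:
    """Filter the single ordered list; the relative order is shared by all roles."""
    return [c.legal_person for c in contributors if role in c.role]
```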

apdavison, Sep 20 '25

I think discussion of how to represent locations should be moved to a separate ticket.

apdavison, Sep 20 '25

I don't like Option D because it defeats the purpose of merging all possible contribution roles into one property instead of listing them separately. Separating custodians out again goes against this logic (also, custodians can change affiliations, so their role has a temporal aspect that gets lost if we only provide a list of persons). The only way to go with Option D is to ignore that there might be different orders for different roles.

What I like about Option B is the following: systems like EBRAINS can set up their own rules for certain contribution types. I'll try to give an example:

Reminder of the slightly modified Option B (modified because Organizations and Consortia do not need affiliations as Contributors, or am I mistaken?):

  • RP/RPV has contribution (1-N) embedded type Contribution
  • Contribution has contributor (1-N) embedded type Contributor (ordered list!!!)
  • Contribution has role (1) linked type ContributionType
  • Contributor has contributor (1) linked category: legal person (Person|Organization|Consortium)
  • Contributor has affiliation (0-N) linked type: Organization (ordered list!!!)

An RP/RPV has four contributions embedded:

  • Contribution1: type: "provider"; contributor: [Contributor1, Contributor2]
  • Contribution2: type: "author"; contributor: [Contributor3, Contributor4, Contributor5]
  • Contribution3: type: "custodian"; contributor: [Contributor5, Contributor3]
  • Contribution4: type: "curator"; contributor: [Contributor6, Contributor7]
  • Contributor1: contributor: Organization1; affiliation: none
  • Contributor2: contributor: Organization2; affiliation: none
  • Contributor3: contributor: Person1; affiliation: [Organization1]
  • Contributor4: contributor: Consortium1; affiliation: none
  • Contributor5: contributor: Person2; affiliation: [Organization1, Organization2]
  • Contributor6: contributor: Person3; affiliation: [Organization3]
  • Contributor7: contributor: Person4; affiliation: [Organization4]

EBRAINS can set up rules like:

  • contributors for contributions of type provider should be organizations (legal entities).
  • contributors for contributions of type custodian should be persons only.
  • contributors for contributions of type author should be persons or consortia.
  • contributors for contributions of type curator should be persons only.
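A hedged sketch of how a system such as EBRAINS might enforce rules like these on the modified Option B structure. The rule table, the `kind` field, and all names are illustrative assumptions; the check compares each contributor's kind of legal person with the kinds allowed for the contribution type and skips types without a rule.

```python
from dataclasses import dataclass, field

# Hypothetical rule table mirroring the example rules:
# contribution type -> kinds of legal person allowed as contributors.
ALLOWED_KINDS = {
    "provider": {"Organization"},
    "custodian": {"Person"},
    "author": {"Person", "Consortium"},
    "curator": {"Person"},
}


@dataclass
class Contributor:
    kind: str                                             # "Person", "Organization" or "Consortium"
    name: str
    affiliation: list[str] = field(default_factory=list)  # ordered; empty for non-persons


@dataclass
class Contribution:
    role: str                                             # contribution type
    contributor: list[Contributor] = field(default_factory=list)  # ordered


def rule_violations(contributions: list[Contribution]) -> list[tuple[str, str]]:
    """Return (contribution type, contributor name) pairs that break a rule."""
    violations = []
    for contribution in contributions:
        allowed = ALLOWED_KINDS.get(contribution.role)
        if allowed is None:
            continue  # this system defines no rule for the type
        for contributor in contribution.contributor:
            if contributor.kind not in allowed:
                violations.append((contribution.role, contributor.name))
    return violations
```

Another system would only swap the rule table; the Contribution/Contributor structure stays the same.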

Another system might set up different rules for the different contribution types. Lists would always be assumed to be ordered, but a system could additionally impose alphabetical ordering for any of the contribution types if wanted.

@openMetadataInitiative/openminds-developers let's discuss this and make a final decision so that we can start implementing.

lzehl, Oct 09 '25

I can live with both Option B (as modified in @lzehl's previous comment) and Option D, but I'd like to push the discussion a little further.

We could call Option B "contribution-type-first" and Option D "legal-entity-first", based on the primary grouping. (I am assuming, for Option B, that a given ContributionType should appear only once).

contribution-type-first means it is easy to get a list of all contributors with a given role, but it takes a bit of work to extract all the roles of a given person (or other legal entity). Main disadvantage: people, with their affiliations, have to be entered multiple times, once for each role, which means that it is possible to enter inconsistent affiliations (I can't imagine a situation where the same person has different affiliations for different roles).

legal-entity-first means it is easy to get all the roles of a given person (or other LE), but it takes a bit of work to extract all the people/LEs with a given role, e.g. the author list. Main disadvantage: only one ordering is possible, so we have to assume the ordering either (i) is the same for all roles, or (ii) applies to only one role (i.e., author/developer), and for other roles the order doesn't matter.
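The trade-off between the two groupings can be illustrated with a small Python sketch (hypothetical names; strings stand in for linked legal persons and roles). Converting contribution-type-first data to legal-entity-first and back shows that only one ordering survives the round trip.

```python
from collections import defaultdict

# Contribution-type-first: role -> ordered list of legal persons.
TypeFirst = dict[str, list[str]]
# Legal-entity-first: ordered list of (legal person, roles).
EntityFirst = list[tuple[str, list[str]]]


def to_entity_first(by_role: TypeFirst) -> EntityFirst:
    """Group roles per legal person; entity order follows first appearance."""
    roles: dict[str, list[str]] = {}
    for role, people in by_role.items():
        for person in people:
            roles.setdefault(person, []).append(role)
    return list(roles.items())


def to_type_first(by_entity: EntityFirst) -> TypeFirst:
    """Group legal persons per role; every role inherits the one shared order."""
    people: dict[str, list[str]] = defaultdict(list)
    for person, roles in by_entity:
        for role in roles:
            people[role].append(person)
    return dict(people)
```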

I prefer "legal-entity-first" because (i) that is how roles are usually presented in journal articles these days, as far as I can see; (ii) it minimises duplication and the risk of inconsistency; (iii) I don't see a need for ordering of any role other than author (or the equivalent for software, models).


On a related but somewhat separate topic, I think we should keep "custodian" as a separate property from "contributor/contribution", and we should not include "custodian" as a contribution type.

A custodian is not a contributor. One way to demonstrate this is that the choice of custodian can change over time, or custodians can change their affiliation; I would contend that we don't care who the previous custodians were, nor what their previous affiliations were, we just want to know "who do I contact now for information about this research product". This is completely different to actual contributors, for which we want to know everyone who ever contributed to the RP, and what was their affiliation at the time of the contribution.

apdavison, Oct 09 '25

@apdavison I'm happy to discuss this further because we really need to find a good solution here that actually improves the model (in structure and flexibility) :)

For me, a custodian is the most important of the contributors, because the custodian takes on the responsibility of maintaining the dataset, answering questions, and responding to issues (although it is termed Data Manager in DataCite, the definitions match perfectly in my opinion and I would regard them as synonyms). Their affiliation is tied to the legal entity where the data were produced/collected (they represent the classical corresponding author or main contact in journal papers). In short: I really don't want to single out one role to be treated differently from the others, because it opens the door to more exceptions in the future.

To avoid the problem you identified with Option B, of having to state a Contributor multiple times with all affiliations (which I agree is error-prone), we can simply make Contributors linked rather than embedded. Then you can reuse a Person with the same affiliation (Contributor) multiple times. The danger there: unwanted changes by a curator who does not realize the extent of the connections a Contributor then has. Another solution is actually Option A (the one we originally came up with together with Tom). However, Option A is error-prone regarding the stated ordering (e.g., order 3 for authors when there are only two authors). So both cases (modified Option B and Option A) would need some checks/safeguards from the applied system.

Experience from the past: while the order of different roles might not matter to us (I'm actually rather in favour of plain alphabetical ordering with properly defined contribution roles), it does matter to others, and we don't know for which role. Which roles should or should not be ordered even changes between research products. Going with D (including contributor as role) would for me only work if we decide there is no ordering anymore.

Maybe we need to see if there is even another option we could go. Any ideas? I need to think a bit about this.

lzehl, Oct 11 '25

If you feel that allowing ordering for any role is important, then let's go with Option B.

I would argue for keeping Contributors embedded, because I think the risk of accidental, unwanted changes is higher if we link them (the danger you pointed out). Specifically, it would be normal when creating a RP that a single Contributor instance would be used for a given person in both "author" and "custodian" roles. If the custodian then moves to a different institution but keeps the custodian role, then it would be natural to update the linked instance, not realising that this also changes the affiliation for the "author" role, which should not change.

This comes back to my argument for having "custodian" as a separate property. To repeat myself, "custodian" is a contribution not like the others, since for all other roles we care about the affiliation at the time of the contribution, while for custodian the "contribution" is ongoing, not limited in time, and we care about the affiliation now, at the time we wish to contact them.

So I would modify B further, as follows:

  • RP/RPV has custodian (1-N) linked category legal person (with required ContactInformation)
  • RP/RPV has contribution (1-N) embedded type Contribution
  • Contribution has contributor (1-N) embedded type Contributor (ordered list!!!)
  • Contribution has role (1) linked type ContributionType
  • Contributor has contributor (1) linked category: legal person (Person|Organization|Consortium)
  • Contributor has affiliation (0-N) linked type: Organization (ordered list!!!)

apdavison, Oct 13 '25

In discussion with @apdavison, we think that this option (B_mod) would work:

  • RP/RPV has contribution (1-N) embedded type Contribution
  • Contribution has contributor (1-N) embedded type Contributor (ordered list!!!)
  • Contribution has role (1) linked type ContributionType
  • Contributor has contributor (1) linked category: legal person (Person|Organization|Consortium)
  • Contributor has affiliation (0-N) linked type: Organization (ordered list!!!)

Potential mismatches in stated affiliations for persons in their different role could be automatically checked by a system (another point for our graph validation list).
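One way such an automatic check could look, sketched in Python under the same assumptions as before (strings for linked instances, hypothetical names): collect every affiliation list stated for each legal person across all contributions and flag persons with more than one distinct list. Because affiliations are ordered, a difference in order alone is also flagged.

```python
from dataclasses import dataclass, field


@dataclass
class Contributor:
    contributor: str                                      # stand-in for a linked legal person
    affiliation: list[str] = field(default_factory=list)  # ordered Organizations


@dataclass
class Contribution:
    role: str
    contributor: list[Contributor] = field(default_factory=list)  # ordered


def inconsistent_affiliations(
    contributions: list[Contribution],
) -> dict[str, set[tuple[str, ...]]]:
    """Map each legal person stated with more than one affiliation list to those lists."""
    seen: dict[str, set[tuple[str, ...]]] = {}
    for contribution in contributions:
        for c in contribution.contributor:
            seen.setdefault(c.contributor, set()).add(tuple(c.affiliation))
    return {person: lists for person, lists in seen.items() if len(lists) > 1}
```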

However, another option (E) would be to decouple the affiliation information of contributors from the ordering of contributors by role. This could look like this:

  • RP/RPV has contribution (1-N) embedded type Contribution
  • RP/RPV has contributorAffiliation (1-N) embedded type Affiliation
  • Contribution has contributor (1-N) linked category: legal person (Person|Organization|Consortium)
  • Contribution has role (1) linked type ContributionType
  • Affiliation has affiliation (0-N) linked type: Organization (ordered list!!!)
  • Affiliation has contributor (1) linked type: Person

It is semantically not quite as intuitive as the B_mod option, but it would reduce the curation check to matching the persons linked through contributions against the persons linked through affiliations (and it is more elegant in terms of repetition).
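A minimal sketch of that reduced curation check for Option E, assuming a small hypothetical `LegalPerson` record whose `kind` tells persons apart from organizations and consortia (only persons carry an Affiliation). It reports persons who contribute without an Affiliation entry, and Affiliation entries for persons who do not contribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalPerson:
    kind: str   # "Person", "Organization" or "Consortium"
    name: str


@dataclass
class Contribution:
    role: str
    contributor: list[LegalPerson]   # ordered


@dataclass
class Affiliation:
    contributor: LegalPerson         # expected to be a Person
    affiliation: list[str]           # ordered Organizations


def affiliation_mismatches(
    contributions: list[Contribution], affiliations: list[Affiliation]
) -> tuple[set[str], set[str]]:
    """Return (contributing persons without Affiliation, affiliated non-contributors)."""
    contributing = {p.name for c in contributions for p in c.contributor if p.kind == "Person"}
    affiliated = {a.contributor.name for a in affiliations}
    return contributing - affiliated, affiliated - contributing
```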

lzehl, Nov 06 '25

Trying to optimize the semantics while keeping the idea of the last option, here is the layout for E_mod1:

RPV
|__contribution
|  |__Contribution (embedded type) (1-N)
|     |__contributor -> legal person [Person|Organization|Consortium] (1-N)
|     |__type -> ContributionType (1)
|__contributorAffiliation
   |__Affiliation (embedded type) (0-N)
      |__person -> Person (1)
      |__organization -> Organization (1-N)

And E_mod2:

RPV
|__contributorList
   |__ContributorList (embedded type) (1)
      |__contribution
      |  |__Contribution (embedded type) (1-N)
      |     |__contributor -> legal person [Person|Organization|Consortium] (1-N)
      |     |__type -> ContributionType (1)
      |__affiliation
         |__Affiliation (embedded type) (0-N)
            |__person -> Person (1)
            |__organization -> Organization (1-N)

lzehl, Nov 06 '25

@apdavison @openMetadataInitiative/openminds-developers which E option do you prefer?

lzehl, Nov 07 '25

E_mod1

apdavison, Nov 07 '25

I also prefer E_mod1.

The schema name "ContributorList" isn't very descriptive of its actual content. Either way, I don't really see any reason to have this additional layer. In both options, mismatches between the persons linked as contributors and linked as person for affiliation can happen.

UlrikeS91, Nov 07 '25

@olinux feedback: E_mod1

additional comment for the record from @olinux

  • the most elegant solution would be to link Affiliation and Contribution under the same property "contribution" to avoid the additional "contributorAffiliation" property; however, it would not be very intuitive for users, so I would also prefer E_mod1

lzehl, Nov 07 '25

@Raphael-Gazzotti could you update #561 to the following?

  • E_mod1 in RP & RPV; make contribution required
  • remove custodian, otherContribution in RP & RPV
  • remove creators (author, developer, etc) from specific products
  • keep coordinator on Project; don't add E_mod1

lzehl, Nov 12 '25