support structured funding info from EML 2.2.0

Open mbjones opened this issue 5 years ago • 13 comments

Describe the feature you'd like

Structured funding information was added to EML 2.2.0: https://eml.ecoinformatics.org/whats-new-in-eml-2-2-0.html#structured-funding-information We should support editing that more detailed information.

  • [x] Display structured funding info
  • [ ] Edit structured funding info

Is your feature request related to a problem? Please describe.

It's difficult to machine-parse our current natural-language funding information.

Additional context

  • If funding information is present in a 2.1.1 or earlier document, we should attempt to forward-convert it to the new EML 2.2.0 award element structure where possible, using common patterns for award identifiers.
  • Funder identifiers might be able to be looked up from the FundRef system by CrossRef, or maybe through ROR, or others.
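
The forward conversion described above could start from a pattern match on the legacy text. This is a minimal sketch, not MetacatUI code: the function name `fundingTextToAward`, the regular expression, and the accepted digit count (6 to 7) are all assumptions for illustration. Only the funder name and funder DOI come from this thread. An EML `award` also needs a `title`, which cannot be recovered from the number alone.

```javascript
// Sketch only: the names and the pattern below are illustrative assumptions.
const NSF_FUNDER = {
  funderName: "National Science Foundation",
  funderIdentifier: "https://doi.org/10.13039/100000001",
};

// Matches text that is nothing but a funder label plus a 6-7 digit number,
// e.g. "NSF 1234567" or "National Science Foundation 198765".
const NSF_PATTERN =
  /^\s*(?:NSF|National Science Foundation)\s*(?:award)?\s*#?\s*(\d{6,7})\s*$/i;

// Returns a partial award object, or null when the text looks like free prose
// and should stay in the <funding> element untouched.
function fundingTextToAward(text) {
  const match = NSF_PATTERN.exec(text);
  if (!match) return null;
  return { ...NSF_FUNDER, awardNumber: match[1] };
}
```

Anything that does not match is left alone, so a natural-language funding paragraph is never rewritten by mistake.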

mbjones avatar May 22 '20 00:05 mbjones

We run into an issue now with editing EML 2.2.0 documents when we use the editor. The awards field will be empty if we are using the new structured funding info and you can't save until you add something into that field because the field is required. So we will need to do an extra step in R to remove the funding information added every time we use the editor.

laijasmine avatar Aug 20 '20 23:08 laijasmine

@amoeba is this a side effect of simply serializing into EML 2.2.0 format? The new award field is optional and so shouldn't be present in any autoconverted document unless specific steps are taken to transform funding field info into the award subfields. Thoughts?

mbjones avatar Aug 21 '20 00:08 mbjones

Sounds like we missed this in the switch to EML 2.2.0: I think we just need to change MetacatUI's logic for checking whether funding information has been set. Currently, dataset/project/funding is probably required but we should really check for dataset/project/funding and/or dataset/project/award, considering having at least one as "having funding".
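
As a rough illustration of that relaxed check, the sketch below treats a project as "having funding" when either field is populated. The function name `hasFundingInfo` and the plain-object shape (`funding` as an array of strings, `award` as an array of objects) are assumptions for the example, not MetacatUI's actual model API.

```javascript
// Sketch only: the object shape and function name are hypothetical.
function hasFundingInfo(project) {
  const { funding = [], award = [] } = project || {};
  const nonEmpty = (value) =>
    typeof value === "string" && value.trim().length > 0;

  // Legacy free-text funding counts if any entry is non-blank.
  const hasFunding = funding.some(nonEmpty);
  // A structured award counts if it at least names a funder.
  const hasAward = award.some((a) => a && nonEmpty(a.funderName));

  return hasFunding || hasAward;
}
```

With this check, a document that only carries `dataset/project/award` passes validation without a dummy `funding` entry.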

amoeba avatar Aug 21 '20 01:08 amoeba

After looking, I ran into a question: Do we have a use case for deployments wanting to accept either funding or award, or is our plan to make the funding portion of the editor automatically migrate funding into an award element on parse and only allow editing of funding information using the award element?

amoeba avatar Aug 21 '20 01:08 amoeba

The way we redefined funding, it becomes more of a natural language description and can coexist with the structured funding info in award. I could see having a paragraph in funding along with one or more award entries. That said, if all funding contains is something like National Science Foundation 198765, then I think it would be better to convert that into the award structure.

mbjones avatar Aug 21 '20 02:08 mbjones

Gotcha. I think allowing editing of the old funding field will probably be useful to someone.

I mocked up two different approaches to allowing simultaneous entry of project/funding and project/award over on https://www.figma.com/file/MztNBDfq7LZ1ZvTLarnftC/Funding-VIew?node-id=0%3A1

One where the autocomplete is kind of a separate thing from the form entry:

Screen Shot 2020-08-20 at 9 19 10 PM

And one where it's integrated into the award number form field:

Screen Shot 2020-08-20 at 9 19 14 PM

I kinda like the second one a bit more as the separation concept is a bit weird and the second one matches better with our current UI.

amoeba avatar Aug 21 '20 05:08 amoeba

I'm adding the "bug" label since this presents issues when we process datasets, as @laijasmine described above. It essentially means that every time we edit a dataset in the editor that we have added structured funding to, we have to go back in R and remove the dummy information we are forced to add.

jeanetteclark avatar Sep 14 '20 21:09 jeanetteclark

Thanks @jeanetteclark for the bump. I think we can put an immediate fix in before we get full structured funding support by relaxing the validation as I commented in https://github.com/NCEAS/metacatui/issues/1403#issuecomment-677980583. @laurenwalker do you see any issue with popping a quick fix in so @jeanetteclark and @laijasmine's issue above can get resolved quickly? I could get a fix in today I think.

amoeba avatar Sep 14 '20 21:09 amoeba

Hey @amoeba @laurenwalker just wanted to check in here on the quick fix - this continues to plague our interns.

jeanetteclark avatar Sep 30 '20 21:09 jeanetteclark

Workaround landed in c33fff7a0078c3f080fb2f036ffbe90a2961ed13 (issue #1533). When we add in full support for structured funding information, we'll want to (though we don't strictly need to) revisit the validation logic patch to simplify it.

amoeba avatar Oct 07 '20 00:10 amoeba

@laurenwalker please make sure this is in your release roadmap for ADC for the near future -- it is one of the things that makes a lot of extra work for the datateam on every dataset.

mbjones avatar Mar 18 '22 17:03 mbjones

Until a full award editor is added to the EML editor, we can make things more compatible with EML 2.2.0 by creating an EML award with an award number. The award number can be intelligently parsed from the user input or NSF Award API lookup selection.

From Jeanette:

things might get a little easier if we:

  • serialize into awards
  • parse input more intelligently (if it's an NSF award number, we should just have the 6 digit award number, nothing else)
  • make sure the NSF lookup is working and populate the award section with the relevant info if it is
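
For the "serialize into awards" step, a minimal sketch of turning an award object into an EML `award` element might look like the following. The helper names `escapeXml` and `awardToXml` are hypothetical, and the child element order simply follows the award examples posted in this thread, with `awardUrl` last.

```javascript
// Sketch only: helper names are hypothetical, not part of MetacatUI.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Child elements in the order they are written out.
const AWARD_FIELDS = [
  "funderName",
  "funderIdentifier",
  "awardNumber",
  "title",
  "awardUrl",
];

// Builds an <award> element as a string, skipping fields that are not set.
function awardToXml(award) {
  const children = AWARD_FIELDS
    .filter((name) => award[name])
    .map((name) => `  <${name}>${escapeXml(award[name])}</${name}>`);
  return ["<award>", ...children, "</award>"].join("\n");
}
```

A real implementation would more likely build DOM nodes than strings, but the field order and escaping concerns are the same.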

Here is what the data team does in R right now for each dataset to update the funding section. This could be replicated in MetacatUI when someone selects an NSF award from the lookup:


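 # Note: extract_name() is not defined in this snippet; it is assumed to be a
 # helper that returns a data frame with firstName and lastName columns.
 function(awards, eml_version = "2.2"){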

  stopifnot(is.character(awards))
  stopifnot(eml_version %in% c("2.1", "2.1.1", "2.2", "2.2.0"))

  award_nums <- awards

  result <- lapply(award_nums, function(x){
    url <- paste0("https://api.nsf.gov/services/v1/awards.json?id=", x ,"&printFields=coPDPI,pdPIName,title")

    t <- tryCatch(jsonlite::fromJSON(url),
                  error = function(j) {
                    # Stop with a clearer message if the request itself fails
                    stop("The NSF API is most likely down. Check back later. ",
                         conditionMessage(j), call. = FALSE)
                  })

    if ("serviceNotification" %in% names(t$response)) {
      warning(paste(t$response$serviceNotification$notificationMessage, "\n",
                    t$response$serviceNotification$notificationType, "for award", x ,
                    "\n this award will not be included in the project section."), call. = FALSE)
      t <- NULL
    }
    else if (length(t$response$award) == 0){
      warning(paste("Empty result for award", x, "\n this award will not be included in the project section."), call. = FALSE)
      t <- NULL
    }
    else t
  })

  i <- lapply(result, function(x) {!is.null(x)})
  result <- result[unlist(i)]
  award_nums <- award_nums[unlist(i)]

  if (length(award_nums) == 0){
    stop(call. = F,
         "No valid award numbers were found.")
  }

  co_pis <- lapply(result, function(x){
    extract_name(x$response$award$coPDPI)
  })

  co_pis <- unlist(co_pis, recursive = F)
  co_pis <- do.call("rbind", co_pis)
  if (!is.null(co_pis)){
    co_pis$role <- "coPrincipalInvestigator"
  }

  pis <- lapply(result, function(x){
    extract_name(x$response$award$pdPIName)
  })

  pis <- unlist(pis, recursive = F)
  pis <- dplyr::mutate(do.call("rbind", pis), role = "principalInvestigator")

  # Combine PIs and co-PIs, dropping duplicate people
  people <- dplyr::distinct(dplyr::bind_rows(co_pis, pis))

  p_list <- list()
  for (i in seq_len(nrow(people))){
    p_list[[i]] <- EML::eml$personnel(individualName = list(givenName = people$firstName[i],
                                                            surName = people$lastName[i]),
                                      role = people$role[i])
  }

  titles <- lapply(result, function(x){
    unlist(x$response$award$title)
  })

  if (eml_version %in% c("2.1", "2.1.1")){
    # EML 2.1.x: award numbers go into the free-text funding element
    award_nums <- paste("NSF", award_nums)
    proj <- EML::eml$project(title = titles, personnel = p_list, funding = award_nums)
  }
  else if (eml_version %in% c("2.2", "2.2.0")){
    # EML 2.2.0: build one structured award entry per award number
    awards <- list()

    for (i in seq_along(award_nums)){
      awards[[i]] <- list(title = titles[i],
                          funderName = "National Science Foundation",
                          funderIdentifier = "https://doi.org/10.13039/100000001",
                          awardNumber = award_nums[i],
                          awardUrl = paste0("https://www.nsf.gov/awardsearch/showAward?AWD_ID=", award_nums[i]))
    }

    proj <- list(title = titles, personnel = p_list, award = awards)
  }

  proj
}
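
A JavaScript counterpart to the EML 2.2.0 branch of the R function might look like the sketch below. It assumes the NSF API response has already been fetched and parsed, and that it has the same `response.award[].title` and `response.serviceNotification` shape the R code reads. The function name `buildAwardFromNsfResponse` is hypothetical.

```javascript
// Sketch only: mirrors the R logic above for a single award number.
// `apiResponse` is the parsed JSON body of an NSF award lookup.
function buildAwardFromNsfResponse(awardNumber, apiResponse) {
  const response = (apiResponse && apiResponse.response) || {};
  const records = response.award || [];

  // Like the R code, skip awards the API reports a problem for or cannot find.
  if (response.serviceNotification || records.length === 0) return null;

  return {
    title: records[0].title,
    funderName: "National Science Foundation",
    funderIdentifier: "https://doi.org/10.13039/100000001",
    awardNumber: String(awardNumber),
    awardUrl: `https://www.nsf.gov/awardsearch/showAward?AWD_ID=${awardNumber}`,
  };
}
```

Returning `null` rather than throwing lets the editor fall back to whatever the user typed when the lookup fails.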

laurenwalker avatar Mar 18 '22 18:03 laurenwalker

We are using the structured award on WFSI (wfsi-data.org). We added a field on the Metadata Overview page that pulls from a controlled list of "Funding Awards". In some use cases, multiple awards apply to the dataset, and we have to add this metadata manually instead of using our UI enhancement. In this case, we opted to have one project with multiple awards because Metacat does not index subprojects (the project/relatedProject element) in Solr.

Single Award

<project>
  <title>Funding Award RC19-1064: 3D fuel characterization for evaluating physics-based fire behavior, fire effects and smoke models on US U. S. Department of  Defense (DoD) military lands.</title>
  <personnel id="1020212546916141">
    <individualName><givenName>Roger</givenName><surName>Ottmar</surName></individualName>
    <organizationName>USDA FS PNW Research Station</organizationName> 
    <electronicMailAddress>[email protected]</electronicMailAddress>
    <userId directory="https://orcid.org">https://orcid.org/0000-0002-4385-4052</userId>
    <role>Principal Investigator</role>
  </personnel>
  <award>
    <funderName>U. S. Department of  Defense (DoD), Strategic Environmental Research and Development Program (SERDP)</funderName>
    <funderIdentifier>http://dx.doi.org/10.13039/100013316</funderIdentifier>
    <awardNumber>RC19-1064</awardNumber>
    <title>3D fuel characterization for evaluating physics-based fire behavior, fire effects and smoke models on US U. S. Department of  Defense (DoD) military lands.</title>
  </award>
</project>

Multiple Awards

This is not optimal, as we added something in the abstract that says which award the PIs are connected to.
project/relatedProject may be useful, but the top-level project would have to be something that connects the two awards.
It is really unclear what a project is vs. an award.

<project>
    <title>Funding Awards RC20-1346 and RC19-1119 for DoD Wildland Fire Science Initiative (WFSI)</title>
    <personnel id="1396442241546151">
        <individualName>
            <givenName>Andrew</givenName>
            <surName>Hudak</surName>
        </individualName>
        <organizationName>USDA FS RMRS</organizationName>
        <electronicMailAddress>[email protected]</electronicMailAddress>
        <userId directory="https://orcid.org">https://orcid.org/0000-0001-7480-1458</userId>
        <role>Principal Investigator</role>
    </personnel>
    <personnel id="1396442241546150">
        <individualName>
            <givenName>Chad</givenName>
            <surName>Hoffman</surName>
        </individualName>
        <organizationName>Colorado State University</organizationName>
        <electronicMailAddress>[email protected]</electronicMailAddress>
        <userId directory="https://orcid.org">https://orcid.org/0000-0001-8715-937X</userId>
        <role>Principal Investigator</role>
    </personnel>
    <abstract>
        <section><title>RC20-1346 PI</title><para>Andrew Hudak</para></section>
        <section><title>RC19-1119 PI</title><para>Chad Hoffman</para></section>
    </abstract>
    <award>
        <funderName>U. S. Department of Defense (DoD), Strategic Environmental Research and Development Program (SERDP)</funderName>
        <funderIdentifier>http://dx.doi.org/10.13039/100013316</funderIdentifier>
        <awardNumber>RC20-1346</awardNumber>
        <title>Object-based aggregation of fuel structures, physics-based fire behavior and self-organizing smoke plumes for improved fuel, fire, and smoke management on military lands.</title>
    </award>
    <award>
        <funderName>U. S. Department of Defense (DoD), Strategic Environmental Research and Development Program (SERDP)</funderName>
        <funderIdentifier>http://dx.doi.org/10.13039/100013316</funderIdentifier>
        <awardNumber>RC19-1119</awardNumber>
        <title>Characterizing multiscale feedback between forest structure, fire behavior and effects: integrating measurements and mechanistic modeling for improved understanding of patterns and processes.</title>
    </award>
</project>

vchendrix avatar Feb 02 '24 20:02 vchendrix