Automatically create Location entry in Thoth when a Dissemination workflow succeeds

Open rhigman opened this issue 1 year ago • 3 comments

On successful dissemination, add a new Location entry to the relevant Publication in Thoth, recording the URL(s) of the newly-created directory entry and/or copy of the content.

To be added to the existing Internet Archive/Figshare workflows (Crossref is not relevant as DOIs are already present in Thoth at time of submission). For Internet Archive, the Platform type INTERNET_ARCHIVE is now available; for institutional repositories such as Figshare, OTHER will need to be used.
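As a rough illustration of what each workflow would record, here is a minimal sketch. The workflow keys, the `NewLocation` container and `build_location` are hypothetical names, not part of Thoth or the dissemination service, and the field names follow the terms used in this thread rather than a checked copy of the Thoth schema. The rule that a Location needs at least one URL is likewise an assumption of the sketch.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical keys naming each dissemination workflow; the values are the
# Platform types named in this issue.
PLATFORM_FOR_WORKFLOW = {
    "internet_archive": "INTERNET_ARCHIVE",
    "figshare": "OTHER",  # no dedicated type for institutional repositories
}


@dataclass
class NewLocation:
    """Illustrative container for one Location; not the actual Thoth schema."""
    publicationId: str
    locationPlatform: str
    landingPage: Optional[str] = None
    fullTextUrl: Optional[str] = None


def build_location(workflow, publication_id, landing_page=None, full_text_url=None):
    """Return a dict describing the Location to record for one Publication."""
    if workflow not in PLATFORM_FOR_WORKFLOW:
        raise ValueError(f"No Platform type known for workflow {workflow!r}")
    if landing_page is None and full_text_url is None:
        # Assumption of this sketch: a Location with no URL is not worth recording.
        raise ValueError("A Location needs at least one URL")
    return asdict(NewLocation(
        publicationId=publication_id,
        locationPlatform=PLATFORM_FOR_WORKFLOW[workflow],
        landingPage=landing_page,
        fullTextUrl=full_text_url,
    ))
```

The resulting dict would still need to be sent to Thoth by whatever client the dissemination service already uses; that step is deliberately left out here.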

There would then be potential for replacing the current logic for checking whether or not a Work already exists in the target platform, instead looking at whether a Location with the relevant Platform exists.
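A Location-based existence check might look something like the sketch below. It assumes the Publication's Locations have already been fetched from Thoth as a list of dicts with `locationPlatform`, `landingPage` and `fullTextUrl` keys; `has_location` and its `url_contains` parameter are hypothetical. The optional URL match is there because OTHER would be shared by every institutional repository, so the Platform alone may not identify the target.

```python
from typing import Iterable, Mapping, Optional


def has_location(
    locations: Iterable[Mapping[str, Optional[str]]],
    platform: str,
    url_contains: Optional[str] = None,
) -> bool:
    """Return True if any Location matches the Platform (and URL fragment, if given)."""
    for location in locations:
        if location.get("locationPlatform") != platform:
            continue
        if url_contains is None:
            return True
        # Compare against both URLs, treating missing values as empty strings.
        urls = (location.get("landingPage") or "", location.get("fullTextUrl") or "")
        if any(url_contains in url for url in urls):
            return True
    return False
```

For example, `has_location(locations, "INTERNET_ARCHIVE")` would stand in for the current Internet Archive check, while a Figshare check could add `url_contains="repository.lboro.ac.uk"`.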

This can then be extended to new dissemination platforms when they are implemented. Ease of implementation may depend on individual platforms' workflows; Internet Archive returns the relevant URL to the dissemination script immediately on creation, but e.g. FTP-based workflows are unlikely to be as neat.

rhigman avatar Feb 05 '24 13:02 rhigman

As part of this work, add something like PUBLISHER_WEBSITE to the set of location platform types.

rhigman avatar Feb 06 '24 16:02 rhigman

> As part of this work, add something like PUBLISHER_WEBSITE to the set of location platform types.

Tracked separately now under #561

ja573 avatar Feb 15 '24 12:02 ja573

  • [x] For each platform, determine the appropriate landingPage and fullTextUrl to record in Thoth
    • [x] Internet Archive: landingPage = archive.org/details/[workId], fullTextUrl = archive.org/details/[workId]/[filename].pdf
    • [x] Figshare: note that the same links exist in figshare.com (API) and repository.lboro.ac.uk (UI) versions, alongside Handles, etc; should this be treated as a "Figshare" upload or a "Loughborough repository" upload (repositories may migrate platforms)?
    • [x] CUL: tbd
    • [x] Zenodo (or do under #542)
    • [ ] OAPEN: works don't acquire these URLs until some hours/days after dissemination. Currently handled manually. Any alternative? Split out as separate task?
    • [x] (Crossref: not relevant here)
  • [x] Extend disseminator to (retrieve and) pass back landingPage and fullTextUrl when they have been assigned via successful archiving
    • [x] this will need to be on a per-publication basis as we sometimes disseminate more than one format, so publicationId will also need to be passed back, or at least publicationType
  • [x] Add script which takes publicationId, locationPlatform, landingPage and fullTextUrl and writes location to Thoth
    • [x] locationPlatform could be supplied directly or derived from inputs to disseminator
    • [x] publicationId could be passed back directly as above, or obtained from Thoth via e.g. workId + publicationType
  • [x] Extend GitHub Actions to take output from each disseminator run and pass it to new script
    • [x] for dissemination of multiple formats, should the script be called multiple times, or should it handle multiple locations itself?
  • [x] Determine whether any new locationPlatforms need to be added to Thoth
    • [x] e.g. FIGSHARE - or as above, should it be e.g. LBORO_REPO?
    • [ ] Any way of marking/"locking" these locations as created by Thoth Dissemination Service/part of Thoth Archiving Network?
    • [ ] Is it still appropriate to permit only one location per locationPlatform for all of these? (e.g. users might independently upload copies to additional Figshare repositories, etc - not necessarily sensible but shouldn't go unrecorded)
  • [x] Catchup run: ensure that works disseminated prior to implementation of this feature all have appropriate locations created
    • [ ] Could a similar mechanism be used (on a regular, automatic basis) to handle OAPEN, as above?
  • [x] Add an appropriate set of Thoth credentials as repository secret (or organisation secret - would require permissions)
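
The checklist above could be sketched roughly as follows. This is only an illustration under stated assumptions: it supposes the disseminator emits a JSON list with one entry per publication (the actual output format is not specified in this thread), it assumes an `https://` scheme for the Internet Archive URLs, and both function names are hypothetical. It takes the "script handles multiple locations itself" option for multi-format dissemination.

```python
import json


def internet_archive_urls(work_id, filename):
    """Build the Internet Archive URL pair in the pattern listed above."""
    landing_page = f"https://archive.org/details/{work_id}"  # https is assumed
    full_text_url = f"{landing_page}/{filename}.pdf"
    return landing_page, full_text_url


def locations_from_output(disseminator_output, location_platform):
    """Turn per-publication disseminator output into one Location dict each.

    `disseminator_output` is assumed to be a JSON list of objects carrying
    publicationId and, where assigned, landingPage and fullTextUrl.
    """
    results = json.loads(disseminator_output)
    locations = []
    for result in results:
        locations.append({
            "publicationId": result["publicationId"],
            "locationPlatform": location_platform,
            "landingPage": result.get("landingPage"),
            "fullTextUrl": result.get("fullTextUrl"),
        })
    return locations
```

Each returned dict corresponds to one write to Thoth, so a GitHub Actions step could pass the disseminator's output to the script once per run regardless of how many formats were disseminated.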

rhigman avatar Apr 19 '24 10:04 rhigman