
fix: zenodo adapter for new file paths

Open: coroa opened this issue 7 months ago • 7 comments

Fixes #1682.

Overview

  • [x] Update zenodo cassettes
  • [x] Fix zenodo adapter
  • [ ] Write tests (not working yet)

Description

Hi there,

As already investigated in #1682, the Zenodo API has been updated and no longer exposes the old bucket structure, in which all uploaded files were accessible at URLs like

https://zenodo.org/api/files/<bucket_id>/<file_key>

Instead, file URLs now take a more intuitive form:

https://zenodo.org/api/record/<record_id>/files/<file_key>/content

where file_key is the name of the file, like capitals.csv. Without the /content suffix, the same URL instead returns a short metadata JSON with information such as the MIME type, size, last-modified date, and further details.
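To make the difference between the layouts concrete, here is a small sketch that builds each URL form from the patterns quoted above. The helper names and the record/bucket ids used with them are hypothetical and only for illustration; this is not code from frictionless-py.

```python
from urllib.parse import quote

ZENODO = "https://zenodo.org"


def old_bucket_url(bucket_id: str, file_key: str) -> str:
    # Legacy layout (no longer served): files addressed through their bucket
    return f"{ZENODO}/api/files/{bucket_id}/{quote(file_key)}"


def api_metadata_url(record_id: str, file_key: str) -> str:
    # New layout without suffix: returns a short metadata JSON
    # (MIME type, size, last-modified date, ...)
    return f"{ZENODO}/api/record/{record_id}/files/{quote(file_key)}"


def api_content_url(record_id: str, file_key: str) -> str:
    # New layout with the /content suffix: returns the file itself
    return api_metadata_url(record_id, file_key) + "/content"


print(api_content_url("1234567", "capitals.csv"))
```

The only thing separating "metadata" from "data" in the new layout is the trailing /content segment, which is exactly what trips up suffix-based detection.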

Unfortunately, that suffixed form does not work naturally with the current structure of frictionless-py, which often tries to infer the resource type (or, more generally, the source type) from the ending of the path. I briefly investigated adding a new scheme to the zenodo plugin so that frictionless would support links like zenodo://<record_id>/<file_key>, but this proved relatively difficult. It also would not have helped with building a datapackage from a Zenodo entry that has an explicit datapackage.json descriptor, since source resolution does not reuse the scheme plugins.
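A simplified illustration of why the /content suffix breaks path-ending inference. `infer_format` below is a hypothetical stand-in that just takes the extension of the URL path; it is not frictionless's actual detection code, but it behaves the same way on these inputs.

```python
import posixpath
from urllib.parse import urlparse


def infer_format(url: str) -> str:
    # Simplified suffix-based detection: extension of the URL path,
    # lowercased and without the leading dot ("" if there is none)
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1]
    return ext[1:].lower()


content = "https://zenodo.org/api/record/1234567/files/capitals.csv/content"
metadata = "https://zenodo.org/api/record/1234567/files/capitals.csv"

print(repr(infer_format(content)))   # last segment "content" has no extension
print(repr(infer_format(metadata)))  # looks like CSV, but the endpoint returns JSON
```

So the content URL yields no format at all, while the suffix-less URL looks like a CSV file even though it serves JSON metadata.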

The simpler solution taken here is to retrieve files through the HTML endpoint instead, whose URLs have the following structure and work naturally:

https://zenodo.org/record/<record_id>/files/<file_key>

This is implemented and working.
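Roughly, the idea is that every file in a record is pointed at the HTML endpoint, so the URL still ends with the file name and its extension. The sketch below is a hypothetical illustration of that mapping (the helper names and the minimal descriptor fields are assumptions), not the actual adapter code in this PR.

```python
import posixpath
from urllib.parse import quote


def html_file_url(record_id: str, file_key: str) -> str:
    # HTML endpoint: the URL ends with the file name, so the extension survives
    return f"https://zenodo.org/record/{record_id}/files/{quote(file_key)}"


def resource_descriptors(record_id: str, file_keys: list[str]) -> list[dict]:
    # Hypothetical helper: one minimal resource descriptor per uploaded file
    resources = []
    for key in file_keys:
        stem, ext = posixpath.splitext(key)
        resources.append(
            {
                "name": stem.lower(),
                "path": html_file_url(record_id, key),
                "format": ext[1:].lower(),
            }
        )
    return resources


for res in resource_descriptors("1234567", ["capitals.csv", "datapackage.json"]):
    print(res["name"], res["format"], res["path"])
```

Because the path keeps the original file name, no special-casing of the /content suffix is needed anywhere downstream.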

I then noticed that the tests had been completely disabled for the last two years :(. After removing the cassettes to regenerate them, all the read tests work fine again, but the write tests need more work. Writing itself seems to work, but the so-called deposition ids that are generated (and asserted against) are user-dependent, so with my personal token all the write tests fail.

I am reluctant to update all of them to my deposition ids now, since that would tie the testing to my personal token. That does seem to be the wrong solution!

Could someone advise how they intended those tests to work? @roll ? @shashigharti ?

Thank you

coroa, Jun 01 '25 11:06