Changing file metadata doesn't force a major version update, which is required to cause update of file DOIs
If files are added to or deleted from a version, we detect it and force the user to do a major (whole-number) version update. However, if file metadata such as the name, path, or tags is changed, the user can still select a minor version update. One consequence when file PIDs are enabled is that, since file PIDs are not updated for minor versions, the file PIDs don't get updated in that case. (I'm not sure if there are other differences.)
As this is pretty minor, and checking for all filemetadata changes could be expensive (unless we track changes with a last modified date or something), I don't currently have any plan to fix it - adding the issue so we're aware/others can decide whether to prioritize.
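The "last modified date" idea mentioned above could look roughly like the sketch below. It is only an illustration: `FileMetadataRecord`, its `last_modified` field, and `files_touched_since` are hypothetical names, not existing Dataverse classes or columns, and the sketch assumes such a timestamp would be maintained on every file metadata edit.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class FileMetadataRecord:
    """Hypothetical stand-in for one file's metadata in a draft version."""
    file_id: int
    label: str
    last_modified: datetime  # assumed field; would need to be tracked on edit


def files_touched_since(records: List[FileMetadataRecord],
                        previous_release_time: datetime) -> List[int]:
    """Return IDs of files whose metadata was edited after the last release.

    This avoids a field-by-field comparison of every file's metadata:
    a single timestamp comparison per file is enough.
    """
    return [r.file_id for r in records if r.last_modified > previous_release_time]
```

With a timestamp like this, the check at publish time is a single comparison per file rather than a full metadata diff.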
- What did you expect to happen? Editing file metadata could force a major version update, since a file name or path change can look like a data change. Alternatively, publication could be changed to update file DOIs for minor versions when changes were made, which also makes sense: the file contents can't change, so editing file metadata is still a metadata change. As long as the PIDs are updated, whether the version change is major or minor doesn't matter so much.
Which version of Dataverse are you using? 6.2/develop
For the discussion of how to mitigate this problem, this paper might give some indication of where to go: https://doi.org/10.5334/dsj-2021-012
Note that we are specifically discussing changes in the dataset, not the metadata records in a catalogue that describe an individual dataset. Updating the metadata record does not create a new version in our model, it only changes the catalogue entry. Sometimes the metadata record of a dataset can be changed due to the correction of the metadata, metadata elements added, changing the location of the service endpoints or any other reason. If these changes do not change the bitstream of a dataset manifestation, a change in the metadata record does not constitute a new version.
I suppose the same would be true for file metadata.
Just so I'm clear, if a file description gets updated, that description should be sent to DataCite. However, the file shouldn't get a new PID, right?
@qqmyers Gustavo would like more clarification: Can you clarify what you mean by updating the file DOIs? Does this mean updating the metadata that gets sent to DataCite for those files?
Related to this, when a dataset's metadata is changed and a minor version of the dataset is published, those metadata changes do get sent to DataCite, right?
Yes - I mean the file DOI metadata being updated. I think we probably just skipped updating file PID metadata at DataCite for minor versions because "files don't change", but their names, paths, etc. can. So to ensure that DataCite is up to date for files, we either need to update the ones that have metadata changes in minor versions, or force a major version change if you update file metadata. The effect of what currently happens is that DataCite would be advertising the old file name/path in that file's metadata, any dataset-related info wouldn't update, etc. I haven't looked in a while to see what's in the file metadata that is picked up from the dataset metadata, but things like the list of creators, funder, etc. show up along with the version.
Dataset metadata does get updated at DataCite at every minor version. It does NOT get updated every time you edit a draft version so DataCite only knows the metadata you entered when creating the dataset until you publish. (Which is fairly minor since the DataCite metadata isn't public until you publish.)
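To make the two options above concrete, here is a minimal sketch of the publish-time decision. It is not Dataverse code: `FileMeta`, `PublishPlan`, `changed_file_ids`, `plan_publish`, and the two policy names are all hypothetical, and versions are modeled simply as dictionaries from file ID to metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-version file metadata (name, path, description, tags)."""
    label: str
    directory: str = ""
    description: str = ""
    tags: Tuple[str, ...] = ()


@dataclass
class PublishPlan:
    version_type: str               # "major" or "minor"
    file_pids_to_update: List[int]  # file IDs whose PID metadata should be re-sent


def changed_file_ids(previous: Dict[int, FileMeta],
                     current: Dict[int, FileMeta]) -> List[int]:
    """IDs of files present in both versions whose metadata differs."""
    shared = previous.keys() & current.keys()
    return sorted(fid for fid in shared if previous[fid] != current[fid])


def plan_publish(previous: Dict[int, FileMeta],
                 current: Dict[int, FileMeta],
                 requested: str = "minor",
                 policy: str = "update_pids") -> PublishPlan:
    """Decide the version type and which file PIDs to refresh.

    policy="update_pids": allow a minor version, but refresh changed files' PIDs.
    policy="force_major": any file metadata change forces a major version.
    """
    if policy not in ("update_pids", "force_major"):
        raise ValueError(f"unknown policy: {policy}")
    changed = changed_file_ids(previous, current)
    files_added_or_removed = previous.keys() != current.keys()
    must_be_major = files_added_or_removed or (policy == "force_major" and bool(changed))
    if requested == "major" or must_be_major:
        # Major versions refresh PID metadata for every file in the version.
        return PublishPlan("major", sorted(current))
    # Minor version: only files with edited metadata need a refresh.
    return PublishPlan("minor", changed)
```

Note that under the `update_pids` policy this sketch only refreshes files whose own metadata changed. If the file PID records also carry dataset-level information (creators, funder, version), a dataset metadata change would presumably require refreshing all file PIDs as well, which the sketch does not cover.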
File metadata and PIDs are still implemented very weakly in Dataverse, and a lot of improvements need to be made before they become FAIR. Therefore, we have disabled PIDs for files until those improvements are implemented. So this issue should be solved not by adjusting versioning, but by more extensive changes.
As requested per https://groups.google.com/g/dataverse-community/c/S97DUnyc8Jw: I am still very much of the opinion that we should follow https://doi.org/10.5334/dsj-2021-012. However, I have not done a new literature search to check whether these recommendations have changed.
I agree with @vaidasmo and @poikilotherm that, in addressing this issue, we should investigate how possible solutions align with best-practice recommendations.
For versioning, see the resource mentioned by @poikilotherm above (https://doi.org/10.5334/dsj-2021-012) and also a more recent discussion/approach using Archival Information Packages (AIP) Versioning and Delta AIPs with Oxford Common File Layout (OCFL).
As for file-level metadata and PIDs, see a recent discussion in our Google group.
2025-07-28
- Hi @vaidasmo, could you elaborate on the more extensive changes you'd like to see that you suggested in your comment: https://github.com/IQSS/dataverse/issues/10639#issuecomment-3060845682? @sbarbosadataverse