
Extend Storage Quotas to individual datasets and user accounts

Open cmbz opened this issue 7 months ago • 1 comment

(edited by @landreev 05-27-2025; question marks indicate issues that may need additional discussion before we have a clear idea of what to implement and how)

Overview of the Feature Request

  • Last year we added storage quotas that can be enabled on collections. However, these cannot be used to track and control storage use in the datasets that installations like IQSS allow users to create in the root-level collection.
  • We have a local curation request (see https://github.com/IQSS/dataverse.harvard.edu/issues/365) to address this by adding quotas that can be configured on a per-user-account basis.
  • While it should be easy to make an account read-only once a certain amount of data has been deposited through it, this seems very easy to bypass by granting write access on a dataset to other accounts (?).
  • It may, however, be possible to address this by making quotas configurable on both a per-user AND a per-dataset basis (?)
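
The combined per-user AND per-dataset idea can be sketched as a single check that an upload must pass on both levels. This is a hypothetical illustration, not existing Dataverse code: the class `CombinedQuotaCheck`, the `Usage` record, and the convention that `null` means "no quota configured" are all assumptions. Because the dataset limit caps the dataset's total no matter which account uploads, giving another account write access no longer lets a depositor grow the dataset past its quota.

```java
// Hypothetical sketch (not Dataverse code): an upload is accepted only if it
// fits within BOTH the dataset's quota and the uploading user's quota.
final class CombinedQuotaCheck {

    /** Hypothetical usage record: configured limit and bytes already used. */
    record Usage(long limitBytes, long usedBytes) {}

    /** A null Usage is assumed to mean "no quota configured at this level". */
    static boolean fits(Usage usage, long additionalBytes) {
        // Compare against remaining space rather than computing
        // usedBytes + additionalBytes, which avoids long overflow.
        return usage == null || additionalBytes <= usage.limitBytes() - usage.usedBytes();
    }

    static boolean canUpload(Usage datasetUsage, Usage userUsage, long fileSizeBytes) {
        return fits(datasetUsage, fileSizeBytes) && fits(userUsage, fileSizeBytes);
    }
}
```

For example, a collaborator with an untouched personal quota still cannot push a dataset with a 1000-byte limit and 900 bytes used past its limit: `canUpload(new Usage(1000, 900), new Usage(5000, 0), 101)` returns `false`.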

Extending the currently implemented collection quotas so that they can be configured on individual datasets should be trivial. The same code that keeps track of collection storage use already tracks it for datasets as well, and the actual quota check methods already work on either collections or datasets. That internal functionality just needs to be exposed through the APIs that set quotas and report storage use.
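To illustrate why a check that already works on "either collections or datasets" extends naturally, here is a minimal sketch that treats both as the same kind of container in an ownership tree. Everything here is hypothetical: the `ContainerQuota` and `Container` names, the assumption that each container stores aggregated usage for itself and everything beneath it, and the rule that every level with a quota must have room are illustrative choices, not a description of how Dataverse actually resolves quotas.

```java
// Hypothetical model (not Dataverse code): collections and datasets are both
// "containers" in an ownership tree, each with aggregated usage and an
// optional quota.
final class ContainerQuota {

    static final class Container {
        final String name;
        final Container owner;   // null for the root collection
        final Long quotaBytes;   // null when no quota is configured at this level
        long usedBytes;          // aggregated usage of this container and its contents

        Container(String name, Container owner, Long quotaBytes) {
            this.name = name;
            this.owner = owner;
            this.quotaBytes = quotaBytes;
        }
    }

    /**
     * Walks from the target (dataset or collection) up to the root and returns
     * the first container whose quota the upload would exceed, or null if it fits.
     */
    static Container firstExceeded(Container target, long fileSizeBytes) {
        for (Container c = target; c != null; c = c.owner) {
            if (c.quotaBytes != null && fileSizeBytes > c.quotaBytes - c.usedBytes) {
                return c;
            }
        }
        return null;
    }

    /** Adds an accepted upload's size to the target and every ancestor. */
    static void recordUpload(Container target, long fileSizeBytes) {
        for (Container c = target; c != null; c = c.owner) {
            c.usedBytes += fileSizeBytes;
        }
    }
}
```

Under this model, a dataset created directly in a root collection that has no quota is only limited if the dataset itself carries one, which is the gap that per-dataset quotas would close.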

Tracking storage use by user account should not be difficult either, provided we simply count the sizes of all the files created by the given authenticateduser (?).
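The "count the sizes of all the files created by the given user" approach amounts to a group-by-and-sum over file records. The sketch below assumes, hypothetically, that each file record exposes the id of the account that created it and its size in bytes; `UserStorageUse` and `FileRecord` are illustrative names, not Dataverse classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Dataverse code): per-user storage use computed by
// summing the sizes of files grouped by the account that created them.
final class UserStorageUse {

    /** Hypothetical minimal file record: creator account id and size in bytes. */
    record FileRecord(long creatorId, long sizeBytes) {}

    /** Returns a map from creator id to the total bytes of the files that account created. */
    static Map<Long, Long> totalsByCreator(List<FileRecord> files) {
        Map<Long, Long> totals = new HashMap<>();
        for (FileRecord f : files) {
            // Add this file's size to the running total for its creator.
            totals.merge(f.creatorId(), f.sizeBytes(), Long::sum);
        }
        return totals;
    }
}
```

Note that this attributes each file to whoever uploaded it, not to the owner of the dataset it lives in, which is exactly why the bypass concern above arises if per-user quotas are used on their own.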

Any open or closed issues related to this feature request?

  • https://github.com/IQSS/dataverse.harvard.edu/issues/364
  • https://github.com/IQSS/dataverse.harvard.edu/issues/365
  • https://github.com/IQSS/dataverse/issues/7829
  • https://github.com/IQSS/dataverse/issues/11275

cmbz, May 27 '25 13:05

Now that we have enabled quotas on all top-level collections at HDV, we'll need to prioritize this issue so that we can set up quotas for the datasets created directly in the top-level, root collection, and for the users who own them. This will take some design and development effort. In the meantime, though, I am going to open a sub-issue for one intermediate, partial solution that can be produced very easily.

landreev, Nov 18 '25 15:11