renku-python

Metadata Refactoring

Open: mohammad-alisafaee opened this issue 2 years ago • 0 comments

We have different database indexes for datasets, plans, and activities, and they are not consistent. For example, plans is a map from id to all Plan objects (including removed ones), but datasets is a map from name to active Dataset objects (excluding removed datasets). The Gateway APIs for these classes also differ from one another.

We should have a consistent set of indexes for each of these classes (where it makes sense):

  • datasets, plans, and activities should be maps from id to objects and include removed objects as well
  • datasets-by-name and plans-by-name should be maps from name to active (i.e., non-removed) objects
  • Possibly add datasets-removed, plans-removed, and activities-removed indexes that map id to all deleted objects (not just the tail object). In that case, we can remove deleted objects from the first indexes.
  • datasets-tags includes tags only for active datasets. We may want to include tags for removed datasets as well (needs discussion).
  • All Gateways should have consistent APIs
  • Discuss unifying DatasetGateway and DatasetsProvenance
  • .renku/metadata.yml can be deleted, since we dropped support for versions < v1.0.0
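A minimal sketch of the proposed index layout, assuming plain dicts in place of the persistent BTree indexes. `Entity` and `Gateway` are hypothetical names for illustration only, not renku-python classes. The by-id index keeps every object, removed ones included, while the by-name index holds only active objects. Per the proposal, the by-name index applies to datasets and plans; an activities gateway would use only the by-id index.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for a Dataset or Plan."""
    id: str
    name: str
    removed: bool = False


class Gateway:
    """Hypothetical gateway exposing the same API for every entity type.

    Plain dicts stand in for the persistent BTree indexes.
    """

    def __init__(self) -> None:
        # e.g. "datasets": id -> object, removed objects included
        self.by_id: Dict[str, Entity] = {}
        # e.g. "datasets-by-name": name -> active object only
        self.by_name: Dict[str, Entity] = {}

    def add(self, entity: Entity) -> None:
        self.by_id[entity.id] = entity
        if not entity.removed:
            # A newer version with the same name replaces the older one here,
            # but the older one stays reachable through by_id.
            self.by_name[entity.name] = entity

    def remove(self, name: str) -> None:
        # Raises KeyError if no active object has this name.
        entity = self.by_name.pop(name)
        entity.removed = True  # kept in by_id, dropped from by_name

    def get_by_id(self, entity_id: str) -> Optional[Entity]:
        return self.by_id.get(entity_id)

    def get_by_name(self, name: str) -> Optional[Entity]:
        return self.by_name.get(name)
```

With this layout, looking up a removed dataset by id still works, while name lookups only ever see active objects.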

Additional context

  • If these changes land before the v10 metadata is deployed, we can skip v10 and deploy v11 directly.
  • We can use BTrees.check.check to validate indexes (in case users modified them). This function doesn't work with subclasses out of the box, so we either have to delete RenkuOOBTree and inherit directly from a BTree, or make the check work with subclasses.
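BTrees.check.check validates the internal structure of a single tree; it cannot tell whether two indexes agree with each other (for example, whether datasets-by-name only points at active objects that are also in datasets). If user edits are a concern, a cross-index check could complement it. A minimal sketch, assuming plain dicts in place of the persistent trees; `Entity` and `check_name_index` are hypothetical names, not part of renku-python.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class Entity:
    """Hypothetical stand-in for a Dataset or Plan."""
    id: str
    name: str
    removed: bool = False


def check_name_index(
    by_id: Mapping[str, Entity], by_name: Mapping[str, Entity]
) -> List[str]:
    """Return inconsistencies between a *-by-name index and its by-id index."""
    problems = []
    for name, entity in by_name.items():
        # The key must match the object's own name.
        if entity.name != name:
            problems.append(f"{name!r}: indexed under wrong name {entity.name!r}")
        # The by-name index should only contain active objects.
        if entity.removed:
            problems.append(f"{name!r}: removed object in active-only index")
        # Every active object must also be present in the by-id index.
        if by_id.get(entity.id) is not entity:
            problems.append(f"{name!r}: object missing from id index")
    return problems
```

An empty result means the two indexes agree under these rules.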

Notes

  • Plan and Dataset objects can have a derivation chain. When a plan or dataset is removed, only the tail object is marked as removed; the other objects in the chain are left unmodified. Filtering out removed objects must take this into account.
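One way to account for this when filtering is to treat an object as removed whenever the tail of its derivation chain is removed. A minimal sketch, assuming each version has at most one successor (a linear chain) and that versions link back through a `derived_from` id; `Version`, `tail_of`, and `active_versions` are hypothetical names, not renku-python APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Version:
    """Hypothetical stand-in for a Plan or Dataset in a derivation chain."""
    id: str
    derived_from: Optional[str] = None  # id of the previous version, if any
    removed: bool = False


def tail_of(version: Version, successors: Dict[str, Version]) -> Version:
    """Follow derivation links forward until reaching the newest version."""
    while version.id in successors:
        version = successors[version.id]
    return version


def active_versions(versions: List[Version]) -> List[Version]:
    """Keep only versions whose chain tail is not marked as removed."""
    # Assumes linear chains: each id is derived from at most once.
    successors = {v.derived_from: v for v in versions if v.derived_from is not None}
    return [v for v in versions if not tail_of(v, successors).removed]
```

Here, removing the tail of a chain hides every earlier version in that chain as well, even though only the tail carries the removed flag.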

mohammad-alisafaee, Feb 22 '23 11:02