renku-python Metadata Refactoring

Metadata Refactoring

Open mohammad-alisafaee opened this issue 2 years ago • 0 comments

We have different database indexes for datasets, plans, and activities, and they aren't consistent. For example, plans is a map from id to all Plan objects (including removed ones), but datasets is a map from name to active Dataset objects (excluding removed datasets). There are also differences in the Gateway APIs for each of these classes.

We should have a consistent set of indexes for each of these classes (whenever it makes sense):

datasets, plans, and activities should be maps from id to objects and include removed objects as well
datasets-by-name and plans-by-name are maps from name to active objects (i.e. non-removed)
Maybe add datasets-removed, plans-removed, and activities-removed indexes that map id to all deleted objects (not just the tail object). In that case, we can remove deleted objects from the first set of indexes.
datasets-tags includes tags only for active datasets. We should include removed datasets as well (needs to be discussed).
All Gateway classes should have consistent APIs
Discuss unifying DatasetGateway and DatasetsProvenance
.renku/metadata.yml can be deleted since we dropped support for <v1.0.0
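
The index layout proposed in the list above could look roughly like the sketch below. It is only an illustration using plain dictionaries: Entity, EntityIndexes, and their methods are hypothetical names, not existing renku-python classes, and the real indexes would be persistent BTree-based structures rather than dicts.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for a Dataset or Plan object."""

    id: str
    name: str
    removed: bool = False


class EntityIndexes:
    """Hypothetical pair of indexes kept consistent for one class."""

    def __init__(self) -> None:
        # id -> object, includes removed objects
        self.by_id: Dict[str, Entity] = {}
        # name -> object, active (non-removed) objects only
        self.by_name: Dict[str, Entity] = {}

    def add(self, entity: Entity) -> None:
        # Every object is reachable by id, removed or not.
        self.by_id[entity.id] = entity
        if entity.removed:
            # A removed object must not be visible by name.
            self.by_name.pop(entity.name, None)
        else:
            self.by_name[entity.name] = entity

    def get_by_id(self, id: str) -> Optional[Entity]:
        return self.by_id.get(id)

    def get_by_name(self, name: str) -> Optional[Entity]:
        return self.by_name.get(name)


datasets = EntityIndexes()
datasets.add(Entity(id="/datasets/1", name="weather"))
datasets.add(Entity(id="/datasets/2", name="weather", removed=True))
```

With the same small interface for datasets and plans, the Gateway classes could expose identical lookup methods. Activities have no by-name index in the proposal, so they would need only the by-id part.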

Additional context

If we have these changes before deploying v10 metadata, we can skip v10 and deploy v11 directly.
We can use BTrees.check.check to validate indexes (in case users have modified them). This function doesn't work with subclasses out of the box, so we either have to delete RenkuOOBTree and inherit directly from a BTree, or make the check work with subclasses.

Notes

Plan and Dataset objects can have a derivation chain; when removing a plan/dataset, we set the tail object as removed and don't modify others. We should consider this when filtering removed objects.
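
A minimal sketch of what this means for filtering, assuming each chain is linear and each version records its predecessor in a derived_from field. Version, chain_from_tail, and removed_ids are hypothetical names used only for illustration:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Version:
    """Hypothetical stand-in for one Plan/Dataset version in a chain."""

    id: str
    derived_from: Optional[str] = None  # id of the previous version, if any
    removed: bool = False  # only ever set on the tail of a chain


def chain_from_tail(tail: Version, by_id: Dict[str, Version]) -> List[Version]:
    """Walk from the tail back to the first version of the chain."""
    chain = []
    current: Optional[Version] = tail
    while current is not None:
        chain.append(current)
        parent_id = current.derived_from
        current = by_id.get(parent_id) if parent_id else None
    return chain


def removed_ids(by_id: Dict[str, Version]) -> Set[str]:
    """Return ids of all versions whose chain ends in a removed tail."""
    parents = {v.derived_from for v in by_id.values() if v.derived_from}
    # A tail is a version that no other version derives from.
    tails = [v for v in by_id.values() if v.id not in parents]
    result: Set[str] = set()
    for tail in tails:
        if tail.removed:
            result.update(v.id for v in chain_from_tail(tail, by_id))
    return result


versions = [
    Version("a1"),
    Version("a2", derived_from="a1"),
    Version("a3", derived_from="a2", removed=True),  # removed tail
    Version("b1"),
    Version("b2", derived_from="b1"),  # active tail
]
index = {v.id: v for v in versions}
print(sorted(removed_ids(index)))
```

Checking only the removed flag on each object would treat a1 and a2 as active, even though their chain has been removed. This is why the whole chain has to be considered.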

Feb 22 '23 11:02 mohammad-alisafaee
