renku-python
Metadata Refactoring
We have different database indexes for datasets, plans, and activities, and they aren't consistent. For example, `plans` is a map from `id` to all `Plan` objects (including removed ones), but `datasets` is a map from `name` to active `Dataset` objects (excluding removed datasets). There are also differences in the Gateway APIs for each of these classes.
We should have a consistent set of indexes for each of these classes (whenever it makes sense):
- `datasets`, `plans`, and `activities` should be maps from `id` to objects and should include removed objects as well.
- `datasets-by-name` and `plans-by-name` are maps from `name` to active (i.e., non-removed) objects.
- Maybe have `datasets-removed`, `plans-removed`, and `activities-removed` map `id` to all deleted objects (not just the tail object). In this case, we can remove them from the first indexes.
- `datasets-tags` includes tags only for active datasets. We should include removed datasets as well (needs to be discussed).
- All Gateways should have consistent APIs.
- Discuss unifying `DatasetGateway` and `DatasetsProvenance`.
- `.renku/metadata.yml` can be deleted since we dropped support for `<v1.0.0`.
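A minimal sketch of the proposed layout for one entity type, assuming plain dicts in place of the database's BTree indexes. `Entity`, `Indexes`, and the method names are hypothetical and do not reflect the current renku Gateway APIs; the point is that every entity type gets the same id index (removed objects included), the same active-by-name index, and the same methods.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for a Dataset, Plan, or Activity."""
    id: str
    name: Optional[str] = None  # activities have no name in this sketch
    removed: bool = False


class Indexes:
    """Hypothetical sketch of the proposed indexes for one entity type.

    Plain dicts stand in for the BTree indexes stored in the database.
    """

    def __init__(self) -> None:
        # e.g. `plans`: id -> object, removed objects included
        self.by_id: Dict[str, Entity] = {}
        # e.g. `plans-by-name`: name -> active (non-removed) object only
        self.by_name: Dict[str, Entity] = {}

    def add(self, entity: Entity) -> None:
        self.by_id[entity.id] = entity
        if entity.name is not None and not entity.removed:
            self.by_name[entity.name] = entity

    def remove(self, entity_id: str) -> None:
        entity = self.by_id[entity_id]
        entity.removed = True
        # Keep the object in `by_id`; only drop it from the active-by-name index.
        if entity.name is not None and self.by_name.get(entity.name) is entity:
            del self.by_name[entity.name]

    def get_by_id(self, entity_id: str) -> Optional[Entity]:
        return self.by_id.get(entity_id)

    def get_by_name(self, name: str) -> Optional[Entity]:
        return self.by_name.get(name)
```

With the alternative `*-removed` indexes, `remove` would instead move the object out of `by_id` into a separate `removed` map.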
Additional context
- If these changes land before we deploy `v10` metadata, we can skip `v10` and deploy `v11` directly.
- We can use `BTrees.check.check` to validate indexes (in case users modified them). This function doesn't work with subclasses out of the box, so we either have to delete `RenkuOOBTree` and inherit directly from a `BTree`, or make `check` work with subclasses.
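Why a subclass can break such a check, and what "make it work with subclasses" could mean, can be sketched with stand-in classes. This assumes the validator classifies nodes by their exact class; the registry and function names below are hypothetical and are not the actual `BTrees.check` internals.

```python
class BTree:
    """Hypothetical stand-in for a BTree class the checker knows about."""


class RenkuOOBTree(BTree):
    """Stand-in for renku's subclass."""


# Hypothetical registry: the checker only knows the exact base classes.
_KIND_BY_CLASS = {BTree: "btree"}


def kind_exact(obj: object) -> str:
    # Exact-type lookup: raises KeyError for any subclass.
    return _KIND_BY_CLASS[type(obj)]


def kind_via_mro(obj: object) -> str:
    # Walk the MRO so a subclass resolves to its nearest known base class.
    for cls in type(obj).__mro__:
        if cls in _KIND_BY_CLASS:
            return _KIND_BY_CLASS[cls]
    raise TypeError(f"unsupported type: {type(obj).__name__}")
```

Deleting `RenkuOOBTree` sidesteps the lookup problem entirely; resolving through the MRO keeps the subclass but requires changing (or wrapping) the checker.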
Notes
- Plan and Dataset objects can have a derivation chain; when removing a plan/dataset, we set the tail object as removed and don't modify others. We should consider this when filtering removed objects.
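Because only the tail is flagged, filtering on the `removed` flag alone would leave earlier versions of a removed plan or dataset looking active. A minimal sketch of collecting every version of a removed chain (what a `*-removed` index holding all deleted objects would need), assuming each version records the id it was derived from; `Version`, `derived_from`, and `removed_ids` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Version:
    """Hypothetical stand-in for one Plan/Dataset version in a derivation chain."""
    id: str
    derived_from: Optional[str] = None  # id of the previous version, if any
    removed: bool = False  # only set on the tail when the plan/dataset is removed


def removed_ids(versions: Dict[str, Version]) -> Set[str]:
    """Return the ids of every version whose chain ends in a removed tail."""
    result: Set[str] = set()
    for version in versions.values():
        if not version.removed:
            continue
        # Walk back from the removed tail, marking each ancestor as removed too.
        # The membership check stops at already-visited versions.
        current: Optional[Version] = version
        while current is not None and current.id not in result:
            result.add(current.id)
            if current.derived_from is None:
                current = None
            else:
                current = versions.get(current.derived_from)
    return result
```

For a chain `d1 -> d2 -> d3` with only `d3` flagged, plus an unrelated `e1`, this returns `{"d1", "d2", "d3"}` and leaves `e1` active.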