kedro-viz
Refactor experiment tracking and metadata panel backend
Noting down ideas from #964 here; will explain better and break into tickets in due course...
Write lots of tests. Improve the way we do GraphQL tests in general. Various ideas here:
- https://strawberry.rocks/docs/operations/testing
- https://www.ontestautomation.com/writing-tests-for-graphql-apis-in-python-using-requests/
- https://github.com/graphql-python/gql/tree/master/tests/starwars
- https://github.com/strawberry-graphql/strawberry/blob/main/CHANGELOG.md#01060---2022-04-14
- e2e query tests and unit tests:
  - `router` is very small and doesn't need tests.
  - `types` doesn't need testing.
  - `schema` methods (query, mutation, subscription) largely delegate to `data_access_manager.get...` (already covered by unit tests) and then call `format` functions on the results if required. These are covered by the e2e-style query tests that we already have.
  - `serializers` should be unit tested; these tests are currently missing.
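
A serializer unit test does not need GraphQL at all: build the input object, call the serializer, compare the dict. The sketch below assumes a serializer shaped like a plain function; `RunRecord` and `format_run` are hypothetical stand-ins, not the real kedro-viz models or serializers.

```python
import unittest
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class RunRecord:
    """Hypothetical stand-in for a stored run; not the real kedro-viz model."""

    run_id: str
    created_at: datetime
    notes: Optional[str] = None


def format_run(run: RunRecord) -> Dict[str, Any]:
    """Hypothetical serializer: turn a run record into a JSON-ready dict."""
    return {
        "id": run.run_id,
        "createdAt": run.created_at.isoformat(),
        "notes": run.notes or "",
    }


class FormatRunTest(unittest.TestCase):
    """Unit tests that exercise the serializer directly, without a GraphQL query."""

    def setUp(self) -> None:
        self.created_at = datetime(2022, 5, 1, 10, 0, tzinfo=timezone.utc)

    def test_all_fields_serialized(self) -> None:
        run = RunRecord("run-1", self.created_at, notes="baseline")
        self.assertEqual(
            format_run(run),
            {"id": "run-1", "createdAt": "2022-05-01T10:00:00+00:00", "notes": "baseline"},
        )

    def test_missing_notes_become_empty_string(self) -> None:
        run = RunRecord("run-2", self.created_at)
        self.assertEqual(format_run(run)["notes"], "")
```

Tests like this run with `python -m unittest` and fail close to the bug, while the existing e2e query tests keep checking the wiring through the schema.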
We should also improve/create tests for the models layer:
- `flowchart`: the tests for the metadata side panel don't seem quite right, but maybe others too. E.g. should we reuse the "real" datasets from the GraphQL `conftest.py`?
- `experiment_tracking`: we should cover the bit marked with `# pragma: no cover`.
For adding plots to experiment tracking:
- ~Add query by group as `TrackingDatasetGroup` and produce existing behaviour for metrics and JSON.~ Done in #978.
- Try to align `TrackingDatasetModel` and `TrackingDataset`. Consider a model for a run, which would simplify (maybe even remove) `format_run_tracking_data`. How do we query by `run_id` correctly?
- What would it take to get rid of `self.runs[run_id] = {}` and just not return anything when that version of a dataset doesn't exist?
- It might be worth doing a new model for each `TrackingDatasetRun` in strawberry, but not as a dataclass model. Maybe do this as a GraphQL interface with different implementations, serializers for plots, etc.
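
The interface idea could look roughly like the stdlib-only sketch below. It uses an abstract base class where strawberry would use its `@strawberry.interface` decorator, and plain classes rather than dataclasses. The names `MetricsRun`, `PlotRun`, `serialize` and `load_metrics_runs` are hypothetical. It also shows one possible answer to the `self.runs[run_id] = {}` question: runs with no saved dataset version are simply left out.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Optional


class TrackingDatasetRun(ABC):
    """Hypothetical common interface: one tracking dataset's data for one run."""

    def __init__(self, run_id: str, dataset_name: str) -> None:
        self.run_id = run_id
        self.dataset_name = dataset_name

    @abstractmethod
    def serialize(self) -> Dict[str, Any]:
        """Each implementation decides how its own data is serialized."""


class MetricsRun(TrackingDatasetRun):
    def __init__(self, run_id: str, dataset_name: str, metrics: Dict[str, float]) -> None:
        super().__init__(run_id, dataset_name)
        self.metrics = metrics

    def serialize(self) -> Dict[str, Any]:
        return {"runId": self.run_id, "dataset": self.dataset_name,
                "type": "metrics", "data": dict(self.metrics)}


class PlotRun(TrackingDatasetRun):
    def __init__(self, run_id: str, dataset_name: str, figure: Dict[str, Any]) -> None:
        super().__init__(run_id, dataset_name)
        self.figure = figure

    def serialize(self) -> Dict[str, Any]:
        # Plot-specific serialization lives with the plot implementation.
        return {"runId": self.run_id, "dataset": self.dataset_name,
                "type": "plot", "data": self.figure}


def load_metrics_runs(
    run_ids: List[str],
    dataset_name: str,
    load_version: Callable[[str], Optional[Dict[str, float]]],
) -> List[MetricsRun]:
    """Build one MetricsRun per run_id, skipping runs with no saved version."""
    runs = []
    for run_id in run_ids:
        data = load_version(run_id)
        if data is None:
            # No version for this run: omit it rather than storing an empty {}.
            continue
        runs.append(MetricsRun(run_id, dataset_name, data))
    return runs
```

Because absent versions never produce an object, the frontend only receives runs that actually have data, and each dataset type (metrics, JSON, plots) owns its serialization.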
Important other refactoring:
- Reuse the `DataNode` and `DataNodeMetadata` models. There's too much duplication between these and tracking datasets.
- ~Move flowchart import of optional dependencies to same scheme used in experiment tracking~ Done in #984.
- Better system for `check_db_session`, e.g. a decorator argument that returns an empty iterable (could be done automatically from the type hint)? A null class?
- Consider whether `is_tracking_dataset` should use `isinstance` instead, but be careful with imports.
- Think about serialisers. Is `format_runs` needed? Should formatting go into a constructor or a class method? Are they needed at all?
- Consider the structure of the GraphQL models and response, e.g. why isn't `TrackedDataSets` a field in `Run`?
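
One reading of the "decorator ... from the type hint" idea: a decorator that checks for a session and, when there is none, returns an empty value chosen from the method's return annotation. This is a sketch only; `requires_db_session`, `RunsRepository` and the `db_session` attribute are hypothetical, and the existing `check_db_session` in kedro-viz is not reproduced here.

```python
import collections.abc
import functools
import typing
from typing import Any, Callable, Dict, Iterable, List, Optional

# Hypothetical rule: map a return annotation's origin type to an empty-value factory.
_EMPTY_FACTORIES: Dict[Any, Callable[[], Any]] = {
    list: list,
    dict: dict,
    set: set,
    tuple: tuple,
    collections.abc.Iterable: list,
}


def _empty_value_for(return_hint: Any) -> Any:
    """Return a fresh empty container for container hints, otherwise None."""
    origin = typing.get_origin(return_hint) or return_hint
    factory = _EMPTY_FACTORIES.get(origin)
    return factory() if factory is not None else None


def requires_db_session(method: Callable) -> Callable:
    """Hypothetical decorator: short-circuit with an empty result when there is no session."""
    # Resolve the annotation once, at decoration time.
    return_hint = typing.get_type_hints(method).get("return")

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.db_session is None:
            return _empty_value_for(return_hint)
        return method(self, *args, **kwargs)

    return wrapper


class RunsRepository:
    """Hypothetical repository; db_session is a list of rows standing in for a real session."""

    def __init__(self, db_session: Optional[List[Dict[str, str]]] = None) -> None:
        self.db_session = db_session

    @requires_db_session
    def get_run_ids(self) -> List[str]:
        return [row["id"] for row in self.db_session]

    @requires_db_session
    def get_notes_by_id(self) -> Dict[str, str]:
        return {row["id"]: row.get("notes", "") for row in self.db_session}

    @requires_db_session
    def iter_run_ids(self) -> Iterable[str]:
        return (row["id"] for row in self.db_session)
```

A fresh container is built on every call so callers never share a mutable default. The "null class" alternative would instead give the repository a session object whose queries always return nothing, which removes the decorator but spreads the empty-result logic into that class.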