kedro-viz Refactor experiment tracking and metadata panel backend

Refactor experiment tracking and metadata panel backend

Open antonymilne opened this issue 2 years ago • 0 comments

Noting down ideas from #964 here; will explain better and break into tickets in due course...

Write lots of tests. Improve the way we do GraphQL tests in general. Various ideas here:

https://strawberry.rocks/docs/operations/testing; https://www.ontestautomation.com/writing-tests-for-graphql-apis-in-python-using-requests/; https://github.com/graphql-python/gql/tree/master/tests/starwars; https://github.com/strawberry-graphql/strawberry/blob/main/CHANGELOG.md#01060---2022-04-14
e2e query tests and unit tests:
- router is very small and doesn't need tests
- types doesn't need testing
- schema methods (query, mutation, subscription) largely delegate to data_access_manager.get... (already covered by unit tests) and then call format functions on the results if required. Tests are covered by e2e-style query tests that we already have
- serializers should be unit tested - currently missing

We should also improve/create tests for the models layer:

flowchart: those for metadata side panel don't seem quite right but maybe others too. e.g. should we reuse the "real" datasets from graphql conftest.py
experiment_tracking: we should cover the bit marked with # pragma: no cover

For adding plots to experiment tracking:

~Add query by group as TrackingDatasetGroup and produce existing behaviour for metrics and JSON.~ done in #978
Try to align TrackingDatasetModel and TrackingDataset. Consider model for run which would simplify (maybe even remove) format_run_tracking_data. How to query by run_id correctly?
Would what it take to get rid of self.runs[run_id] = {} and just not return anything when that version of a dataset doesn't exist?
Might be worth doing a new model for each TrackingDatasetRun in strawberry but not a dataclass model. Maybe do this as GraphQL interface with different implementations, serializers for plots, etc.

Important other refactoring:

Reuse DataNode and DataNoteMetadata models. There's too much duplication between these and tracking datasets.
~Move flowchart import of optional dependencies to same scheme used in experiment tracking~ done in #984
Better system for check_db_session, e.g. decorator argument that returns empty iterable (could be done automatically from type hint)? Null class?
Consider whether is_tracking_dataset should use isinstance instead, but be careful with imports
Think about serialisers. Is format_runs needed? Should formatting go into constructor or class method? Are they needed at all?
Consider structure of GraphQL models and response. e.g. why isn't TrackedDataSets a field in Run?

Jul 19 '22 16:07 antonymilne