kedro-viz icon indicating copy to clipboard operation
kedro-viz copied to clipboard

Refactor experiment tracking and metadata panel backend

Open antonymilne opened this issue 2 years ago • 0 comments

Noting down ideas from #964 here; will explain better and break into tickets in due course...

Write lots of tests. Improve the way we do GraphQL tests in general. Various ideas here:

  • https://strawberry.rocks/docs/operations/testing; https://www.ontestautomation.com/writing-tests-for-graphql-apis-in-python-using-requests/; https://github.com/graphql-python/gql/tree/master/tests/starwars; https://github.com/strawberry-graphql/strawberry/blob/main/CHANGELOG.md#01060---2022-04-14
  • e2e query tests and unit tests:
    • router is very small and doesn't need tests
    • types doesn't need testing
    • schema methods (query, mutation, subscription) largely delegate to data_access_manager.get... (already covered by unit tests) and then call format functions on the results if required. Tests are covered by e2e-style query tests that we already have
    • serializers should be unit tested - currently missing

We should also improve/create tests for the models layer:

  • flowchart: those for metadata side panel don't seem quite right but maybe others too. e.g. should we reuse the "real" datasets from graphql conftest.py
  • experiment_tracking: we should cover the bit marked with # pragma: no cover

For adding plots to experiment tracking:

  • ~Add query by group as TrackingDatasetGroup and produce existing behaviour for metrics and JSON.~ done in #978
  • Try to align TrackingDatasetModel and TrackingDataset. Consider model for run which would simplify (maybe even remove) format_run_tracking_data. How to query by run_id correctly?
  • Would what it take to get rid of self.runs[run_id] = {} and just not return anything when that version of a dataset doesn't exist?
  • Might be worth doing a new model for each TrackingDatasetRun in strawberry but not a dataclass model. Maybe do this as GraphQL interface with different implementations, serializers for plots, etc.

Important other refactoring:

  • Reuse DataNode and DataNoteMetadata models. There's too much duplication between these and tracking datasets.
  • ~Move flowchart import of optional dependencies to same scheme used in experiment tracking~ done in #984
  • Better system for check_db_session, e.g. decorator argument that returns empty iterable (could be done automatically from type hint)? Null class?
  • Consider whether is_tracking_dataset should use isinstance instead, but be careful with imports
  • Think about serialisers. Is format_runs needed? Should formatting go into constructor or class method? Are they needed at all?
  • Consider structure of GraphQL models and response. e.g. why isn't TrackedDataSets a field in Run?

antonymilne avatar Jul 19 '22 16:07 antonymilne