
Decide and implement how to include datasets generated through dataset factories in telemetry counts

Open DimedS opened this issue 1 year ago • 3 comments

Description

Currently, kedro-telemetry does not account for datasets generated through dataset factories. The existing code snippet used for counting datasets is as follows:

project_statistics_properties["number_of_datasets"] = sum(
    1 for c in catalog.list() if not c.startswith("parameters") and not c.startswith("params:")
)

This method overlooks datasets created via dataset factories. For further discussion, see here.
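To make the gap concrete, here is a minimal, Kedro-free sketch. It assumes that a factory pattern such as `{name}_csv` only turns into a dataset when a name used in a pipeline matches it, so a count based on explicitly listed names alone misses those entries. The helper names (`is_parameter`, `pattern_to_regex`, `count_datasets`) and the sample data are hypothetical, and the regex matching is a simplified stand-in for whatever matching Kedro performs internally.

```python
import re


def is_parameter(name: str) -> bool:
    # Same filter as the snippet above: skip parameter entries.
    return name.startswith("parameters") or name.startswith("params:")


def pattern_to_regex(pattern: str) -> re.Pattern:
    # Simplified stand-in for pattern matching: every "{placeholder}"
    # matches one or more characters; the rest is matched literally.
    literal_parts = re.split(r"\{[^{}]*\}", pattern)
    return re.compile("(.+)".join(re.escape(part) for part in literal_parts))


def count_datasets(explicit_names, factory_patterns, pipeline_dataset_names):
    """Return (explicit_count, factory_count) as two separate numbers."""
    explicit = {name for name in explicit_names if not is_parameter(name)}
    regexes = [pattern_to_regex(pattern) for pattern in factory_patterns]
    from_factories = {
        name
        for name in pipeline_dataset_names
        if not is_parameter(name)
        and name not in explicit
        and any(regex.fullmatch(name) for regex in regexes)
    }
    return len(explicit), len(from_factories)


# Hypothetical project: one explicit dataset, one factory pattern.
explicit_names = ["companies", "parameters", "params:alpha"]
factory_patterns = ["{name}_csv"]
pipeline_dataset_names = [
    "companies", "reviews_csv", "shuttles_csv", "params:alpha", "model_input",
]

counts = count_datasets(explicit_names, factory_patterns, pipeline_dataset_names)
```

In this made-up project, the existing approach would report only the explicit entry, while two further datasets exist only through the factory pattern.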

DimedS avatar Feb 23 '24 10:02 DimedS

Opened a separate issue for packaged Kedro projects https://github.com/kedro-org/kedro-plugins/issues/567

astrojuanlu avatar Feb 24 '24 09:02 astrojuanlu

Whoever picks up the ticket should decide which solution works better and implement it. It was discussed that it's unclear how we use this information, and that it's not urgent until we introduce the opt-out flow.

Two alternatives:

  • Push telemetry to after_pipeline_run
  • Resolve the DataCatalog manually
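A rough sketch of the first alternative, under stated assumptions: the count is taken in `after_pipeline_run`, where the pipeline that ran is available, rather than when the catalog is created. The class and helper names are hypothetical, `FakeCatalog` and `FakePipeline` are stand-ins so the example runs without Kedro, and the `datasets()` method on the pipeline object is an assumption about its interface. In a real plugin these methods would be decorated with `hook_impl` from `kedro.framework.hooks`, and this is not how kedro-telemetry is currently implemented.

```python
def count_non_parameters(names) -> int:
    # Same filter as the existing counting snippet.
    return sum(
        1
        for name in names
        if not name.startswith("parameters") and not name.startswith("params:")
    )


class FakeCatalog:
    """Stand-in for a catalog that only lists explicitly defined entries."""

    def __init__(self, names):
        self._names = list(names)

    def list(self):
        return list(self._names)


class FakePipeline:
    """Stand-in for a pipeline that exposes the dataset names it uses."""

    def __init__(self, names):
        self._names = set(names)

    def datasets(self):
        return set(self._names)


class DeferredDatasetCount:
    """Hypothetical hook class: records one count per hook point."""

    def __init__(self):
        self.count_at_catalog_creation = None
        self.count_after_run = None

    def after_catalog_created(self, catalog):
        # Only explicit entries are visible here in this sketch.
        self.count_at_catalog_creation = count_non_parameters(catalog.list())

    def after_pipeline_run(self, pipeline, catalog):
        # The pipeline is known here, so names that would be served by a
        # dataset factory are visible too.
        self.count_after_run = count_non_parameters(pipeline.datasets())


hooks = DeferredDatasetCount()
catalog = FakeCatalog(["companies", "parameters", "params:alpha"])
pipeline = FakePipeline(["companies", "reviews_csv", "params:alpha"])

hooks.after_catalog_created(catalog)
hooks.after_pipeline_run(pipeline, catalog)
```

One trade-off of deferring: the later count covers every dataset name the pipeline touches, which may also include names that would fall back to in-memory datasets, so it is not strictly a count of factory-generated datasets.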

noklam avatar Mar 25 '24 15:03 noklam

Push telemetry to after_pipeline_run

Isn't it enough to do it at after_catalog_created?

astrojuanlu avatar Mar 25 '24 15:03 astrojuanlu