Support Metric Entity

Open HuanjieGuo opened this issue 1 year ago • 8 comments

Is your feature request related to a problem? Please describe.

Our business logic requires us to search for a Metric and then find which columns that metric depends on.

Describe the solution you'd like

  • Support Metric entity
  • Connect Metric and Column
  • Need the SQL that generates this metric. This information can be stored in the Metric entity as a field or in the JSON of the lineage.

Describe alternatives you've considered

Additional context

HuanjieGuo avatar Apr 02 '24 01:04 HuanjieGuo

On our end we also see a need for this entity. We have different kinds of Metrics: Rules (repeatable filters/definitions that are used in other Metrics such as KPIs, exposures, and breakdowns), KPIs, Exposures (used for defining an exposure point in A/B testing), and Breakdowns (dimension definitions). More or less it's quite simple: a Metric on our end can point to another Metric (Rule -> KPI/Exposure/Breakdown), while itself being composed of one or many tables. Metadata-wise I think it's quite similar to a table: ID, Name, Owner, etc.

Also, we'd like to use it in lineage to create something that would look like: Table -> Metric -> Pipeline -> Table -> Dashboard (or in some cases allow Metric -> Metric as well).
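
A lineage chain like the one described above can be thought of as a directed graph, where finding what a metric depends on means walking the edges backwards. The following is a minimal Python sketch of that idea, not OpenMetadata's lineage API: the node names, the `EDGES` list, and the helper functions are all hypothetical and only illustrate the Table -> Metric -> Metric -> Pipeline -> Table -> Dashboard shape.

```python
from collections import defaultdict

# Hypothetical lineage edges (upstream, downstream); names are illustrative only.
EDGES = [
    ("table:orders", "metric:revenue_rule"),
    ("metric:revenue_rule", "metric:revenue_kpi"),  # Metric -> Metric
    ("metric:revenue_kpi", "pipeline:daily_rollup"),
    ("pipeline:daily_rollup", "table:kpi_daily"),
    ("table:kpi_daily", "dashboard:exec_overview"),
]


def build_upstream_index(edges):
    """Map each node to the set of nodes that feed directly into it."""
    index = defaultdict(set)
    for upstream, downstream in edges:
        index[downstream].add(upstream)
    return index


def upstream_of(node, index):
    """Return every node reachable by walking lineage edges backwards from `node`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in index.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


UPSTREAM = build_upstream_index(EDGES)
```

With this sketch, `upstream_of("metric:revenue_kpi", UPSTREAM)` returns the rule metric and the source table it is built from. Column-level dependencies would need column-level edges, which are left out here.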

tomaspe avatar Jul 17 '24 06:07 tomaspe

@tomaspe How are you capturing the KPI today? Do you need to track it against the metric's reported value?

  1. A Metric can have SQL or another formula to calculate it. Do we want to capture the SQL associated with it?
  2. A Metric can have reported values. Do we want to provide the values as a time series?
  3. What other metadata do you want to capture?

harshach avatar Aug 19 '24 18:08 harshach

@harshach, I think of a metric as something that can live by itself. However, I also see it living as part of a semantic data model.

For Metrics,

  • Recording the SQL/formula is essential.
  • It is also essential to link metrics to glossaries during ingestion, as glossaries can give the proper hierarchical structure within which a metric might reside.
  • Other metadata, such as description and display name.

On a side note, we are missing the "semantic model" as an independent representation. Looker, for example, allows us to define "dimensions" and "measures." However, when we sync this data to the current "DashboardDataModel," we don't consistently categorize it in OpenMetadata as dimensions/measures (this is being done in other tools like DataHub).

There is a rise in "Semantic Modeling Tools" (AtScale, Cube.js, DBT Semantic Layer) that are separate from "Dashboard" requirements, and I would like to ingest the semantic models as a separate entity in itself, which provides dimensions, measures, explores, and views (or similar concepts). Could we also externalize "DashboardDataModels" from within the "Dashboard" context?

Once we do that, in my view, what goes for Metrics should also apply to "dimensions".

oscarGomes86 avatar Aug 21 '24 03:08 oscarGomes86

Based on the discussion so far, here is what we are planning to do for the Metric entity:

  1. Metric will not belong to any particular service. So far, data assets such as Table, Dashboard, and Pipeline come from their corresponding service: a table from a database service such as Snowflake, a pipeline from a pipeline service such as Airflow, a dashboard from Looker or Power BI, etc.
  2. Metric will be a generic data asset; it will sit under the umbrella of Governance, similar to Glossaries.
  3. Metrics will not be hierarchical; they will be a flat list of Metrics.

Metric Definition

Schema

{
    "$id": "https://open-metadata.org/schema/entity/data/metric.json",
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Metric",
    "description": "This schema defines the Metric entities.",
    "$comment": "@om-entity-type",
    "type": "object",
    "javaType": "org.openmetadata.schema.entity.data.Metric",
    "javaInterfaces": ["org.openmetadata.schema.EntityInterface"],
    "definitions": {
      "termReference": {
        "type": "object",
        "properties": {
          "name": {
            "description": "Name that identifies the source of an external glossary term. Example `HealthCare.gov`.",
            "type": "string"
          },
          "endpoint": {
            "description": "Endpoint (URI) of the source of an external glossary term.",
            "type": "string",
            "format": "uri"
          }
        },
        "additionalProperties": false
      },
      "status": {
        "type": "string",
        "enum": ["Draft", "Approved", "Deprecated", "Rejected"]
      }
    },
    "properties": {
      "id": {
        "description": "Unique identifier of a metric instance.",
        "$ref": "../../type/basic.json#/definitions/uuid"
      },
      "name": {
        "description": "Preferred name for the metric.",
        "$ref": "../../type/basic.json#/definitions/entityName"
      },
      "displayName": {
        "description": "Display Name that identifies this Metric.",
        "type": "string"
      },
      "description": {
        "description": "Description of the Metric.",
        "$ref": "../../type/basic.json#/definitions/markdown"
      },
      "fullyQualifiedName": {
        "description": "A unique name that identifies a Metric. In case of Metric this will be name as there is no hierarchy.",
        "$ref": "../../type/basic.json#/definitions/fullyQualifiedEntityName"
      },
      "formula": {
        "description": "Formula to calculate this metric. Supports Latex.",
        "type": "string"
      },
      "sql": {
        "description": "An optional sql query to define this metric.",
        "type": "string"
      },
      "version": {
        "description": "Metadata version of the entity.",
        "$ref": "../../type/entityHistory.json#/definitions/entityVersion"
      },
      "updatedAt": {
        "description": "Last update time corresponding to the new version of the entity in Unix epoch time milliseconds.",
        "$ref": "../../type/basic.json#/definitions/timestamp"
      },
      "updatedBy": {
        "description": "User who made the update.",
        "type": "string"
      },
      "href": {
        "description": "Link to the resource corresponding to this entity.",
        "$ref": "../../type/basic.json#/definitions/href"
      },
      "owners": {
        "description": "Owners of this Metric.",
        "$ref": "../../type/entityReferenceList.json"
      },
      "usageCount": {
        "description": "Count of how many times this and its children glossary terms are used as labels.",
        "type": "integer"
      },
      "tags": {
        "description": "Tags associated with this glossary term. These tags capture the relationship of a glossary term with a tag automatically. As an example, a glossary term 'User.PhoneNumber' might have an associated tag 'PII.Sensitive'. When 'User.Address' is used to label a column in a table, the 'PII.Sensitive' label is also applied automatically due to the associated tag relationship.",
        "type": "array",
        "items": {
          "$ref": "../../type/tagLabel.json"
        },
        "default": null
      },
      "changeDescription": {
        "description": "Change that led to this version of the entity.",
        "$ref": "../../type/entityHistory.json#/definitions/changeDescription"
      },
      "domain" : {
        "description": "Domain this Metric belongs to.",
        "$ref": "../../type/entityReference.json"
      },
      "dataProducts" : {
        "description": "List of data products this entity is part of.",
        "$ref" : "../../type/entityReferenceList.json"
      },
      "votes" : {
        "description": "Votes on the entity.",
        "$ref": "../../type/votes.json"
      }
    },
    "required": ["id", "name", "formula"],
    "additionalProperties": false
  }

  1. Showing the relation or source for a given metric: we will use lineage to show how a metric is sourced. This can be a lineage relation from Looker or Snowflake tables, or from DBT itself.
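
To make the proposed schema concrete, here is a small Python sketch of what a minimal Metric document might look like, together with a hand-rolled check of two of the schema's constraints (`required` and `additionalProperties: false`). The property names and the required list are taken from the schema above; the metric itself (`weekly_active_users`), its formula, its SQL, and the `check_metric` helper are hypothetical. This is not a full JSON Schema validator and does not resolve the `$ref` types.

```python
import uuid

# Property names copied from the proposed Metric schema.
ALLOWED = {
    "id", "name", "displayName", "description", "fullyQualifiedName",
    "formula", "sql", "version", "updatedAt", "updatedBy", "href",
    "owners", "usageCount", "tags", "changeDescription", "domain",
    "dataProducts", "votes",
}
# "required": ["id", "name", "formula"] in the schema.
REQUIRED = ("id", "name", "formula")


def check_metric(doc):
    """Return a list of problems; an empty list means both checks passed."""
    problems = [f"missing required property: {key}" for key in REQUIRED if key not in doc]
    # The schema sets additionalProperties to false, so unknown keys are rejected.
    problems += [f"unexpected property: {key}" for key in sorted(set(doc) - ALLOWED)]
    return problems


# Hypothetical example metric; values are illustrative only.
metric = {
    "id": str(uuid.uuid4()),
    "name": "weekly_active_users",
    # Metrics are not hierarchical, so the fully qualified name equals the name.
    "fullyQualifiedName": "weekly_active_users",
    "formula": r"\text{count}(\text{distinct user\_id})",  # LaTeX, per the schema
    "sql": "SELECT COUNT(DISTINCT user_id) FROM events",   # hypothetical table
}
```

Calling `check_metric(metric)` returns an empty list for this document, while a document without `formula` or with an unknown key such as `parent` would be reported.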

@tomaspe @oscarGomes86 please review above and provide any feedback.

harshach avatar Aug 26 '24 23:08 harshach

I agree with 'Metrics' as a separate "generic data asset" within Governance. The "Metrics" schema looks good.

I do think there is a need for a separate "semantic model" which should be a separate service in itself that syncs from LookML, Cube.js, AtScale, DBT, Superset. This should have "dimensions and measures" synced from these services. This information is essential for end-to-end lineage from a data feed right up to where these dimensions/measures are shown within BI Dashboards/Reports.

oscarGomes86 avatar Aug 27 '24 03:08 oscarGomes86

@oscarGomes86 That makes sense. For the discussion around Metrics, does that schema answer what you are looking for? We can look at dimensions and measures as a separate effort.

harshach avatar Aug 27 '24 03:08 harshach

@harshach , for me the schema looks good.

oscarGomes86 avatar Aug 27 '24 05:08 oscarGomes86

@harshach Looks good to me too.

tomaspe avatar Aug 27 '24 05:08 tomaspe

@Sachin-chaurasiya here are the mocks: [image]

harshach avatar Sep 02 '24 17:09 harshach

Thanks @harshach

Sachin-chaurasiya avatar Sep 02 '24 17:09 Sachin-chaurasiya