Support Metric Entity
Is your feature request related to a problem? Please describe.
Our business logic requires us to search for a Metric and then find which columns that metric depends on.
Describe the solution you'd like
- Support Metric entity
- Connect Metric and Column
- Need the SQL that generates this metric; this information can be stored in the Metric entity as a field or in the JSON of the lineage
Describe alternatives you've considered
Additional context
On our end we also see a need for this entity. We have different kinds of Metrics: Rules (repeatable filters/definitions that are used in other Metrics such as KPIs, Exposures, and Breakdowns), KPIs, Exposures (used for defining an exposure point in A/B testing), and Breakdowns (dimension definitions). More or less, it's quite simple. A Metric on our end can point to another Metric (Rule -> KPI/Exposure/Breakdown), while itself being composed of one or many tables. Metadata-wise, I think it's quite similar to a table: ID, Name, Owner, etc.
Also, we'd like to use it in lineage to create something that would look like: Table -> Metric -> Pipeline -> Table -> Dashboard (or in some cases allow Metric -> Metric as well).
@tomaspe how are you capturing the KPI today? Do you need to track it against the metric's reported value?
- A Metric can have SQL or another formula to calculate it. Do we want to capture the SQL associated with it?
- A Metric can have reported values. Do we want to provide the values over a time series?
- What other metadata do you want to capture?
@harshach, I think of a metric as something that can live by itself. However, I also see it living as part of a semantic data model.
For Metrics:
- Recording the SQL/formula is essential.
- It is also essential to link metrics to glossaries during ingestion, as glossaries can give the proper hierarchical structure within which a metric might reside.
- Other metadata, such as description and display name.
On a side note, we are missing the "semantic model" as an independent representation. Looker, for example, allows us to define "dimensions" and "measures." However, when we sync this data to the current "DashboardDataModel," we don't consistently categorize it in OpenMetadata as dimensions/measures (this is being done in other tools like DataHub).
There is a rise in "Semantic Modeling Tools" (AtScale, Cube.js, DBT Semantic Layer) that are separate from "Dashboard" requirements, and I would like to ingest the semantic models as a separate entity, which provides dimensions, measures, explores, and views (or similar concepts). Could we also externalize "DashboardDataModels" from within the "Dashboard" context?
Once we do that, in my view, what goes for Metrics should also apply to "dimensions"
Based on the discussion so far, here is what we are planning to do for the Metric entity:
- A Metric will not belong to any particular service. So far, data assets such as Table, Dashboard, and Pipeline come from their corresponding services: a table from a database service such as Snowflake, a pipeline from a pipeline service such as Airflow, a dashboard from Looker or Power BI, etc.
- A Metric will be a generic data asset; it will be under the umbrella of Governance, similar to Glossaries
- Metrics will not be hierarchical; they will be a flat list of Metrics
Metric Definition
Schema
{
"$id": "https://open-metadata.org/schema/entity/data/metric.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Metric",
"description": "This schema defines the Metric entities.",
"$comment": "@om-entity-type",
"type": "object",
"javaType": "org.openmetadata.schema.entity.data.Metric",
"javaInterfaces": ["org.openmetadata.schema.EntityInterface"],
"definitions": {
"termReference": {
"type": "object",
"properties": {
"name": {
"description": "Name that identifies the source of an external glossary term. Example `HealthCare.gov`.",
"type": "string"
},
"endpoint": {
"description": "URI of the endpoint for the source of an external glossary term.",
"type": "string",
"format": "uri"
}
},
"additionalProperties": false
},
"status": {
"type": "string",
"enum": ["Draft", "Approved", "Deprecated", "Rejected"]
}
},
"properties": {
"id": {
"description": "Unique identifier of a metric instance.",
"$ref": "../../type/basic.json#/definitions/uuid"
},
"name": {
"description": "Preferred name for the metric.",
"$ref": "../../type/basic.json#/definitions/entityName"
},
"displayName": {
"description": "Display Name that identifies this Metric.",
"type": "string"
},
"description": {
"description": "Description of the Metric.",
"$ref": "../../type/basic.json#/definitions/markdown"
},
"fullyQualifiedName": {
"description": "A unique name that identifies a Metric. In case of Metric this will be name as there is no hierarchy.",
"$ref": "../../type/basic.json#/definitions/fullyQualifiedEntityName"
},
"formula": {
"description": "Formula to calculate this metric. Supports Latex.",
"type": "string"
},
"sql": {
"description": "An optional sql query to define this metric.",
"type": "string"
},
"version": {
"description": "Metadata version of the entity.",
"$ref": "../../type/entityHistory.json#/definitions/entityVersion"
},
"updatedAt": {
"description": "Last update time corresponding to the new version of the entity in Unix epoch time milliseconds.",
"$ref": "../../type/basic.json#/definitions/timestamp"
},
"updatedBy": {
"description": "User who made the update.",
"type": "string"
},
"href": {
"description": "Link to the resource corresponding to this entity.",
"$ref": "../../type/basic.json#/definitions/href"
},
"owners": {
"description": "Owners of this Metric.",
"$ref": "../../type/entityReferenceList.json"
},
"usageCount": {
"description": "Count of how many times this and its children glossary terms are used as labels.",
"type": "integer"
},
"tags": {
"description": "Tags associated with this glossary term. These tags capture the relationship of a glossary term with a tag automatically. As an example, a glossary term 'User.PhoneNumber' might have an associated tag 'PII.Sensitive'. When 'User.Address' is used to label a column in a table, the 'PII.Sensitive' label is also applied automatically due to the associated tag relationship.",
"type": "array",
"items": {
"$ref": "../../type/tagLabel.json"
},
"default": null
},
"changeDescription": {
"description": "Change that led to this version of the entity.",
"$ref": "../../type/entityHistory.json#/definitions/changeDescription"
},
"domain": {
"description": "Domain this Metric belongs to.",
"$ref": "../../type/entityReference.json"
},
"dataProducts": {
"description": "List of data products this entity is part of.",
"$ref": "../../type/entityReferenceList.json"
},
"votes": {
"description": "Votes on the entity.",
"$ref": "../../type/votes.json"
}
},
"required": ["id", "name", "formula"],
"additionalProperties": false
}
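To make the proposal easier to review, here is a minimal sketch of what a Metric instance could look like, plus a hand-rolled check of the schema's `required` list and `additionalProperties: false`. The example values (`weekly_active_users`, the `sessions` table, the SQL text) and the helper `check_metric` are hypothetical and only for illustration; the check does not resolve `$ref` types or validate formats the way a real JSON Schema validator would.

```python
import json
import uuid

# Top-level property names copied from the proposed Metric schema above.
ALLOWED_PROPERTIES = {
    "id", "name", "displayName", "description", "fullyQualifiedName",
    "formula", "sql", "version", "updatedAt", "updatedBy", "href",
    "owners", "usageCount", "tags", "changeDescription", "domain",
    "dataProducts", "votes",
}
# Mirrors the schema's "required" list.
REQUIRED_PROPERTIES = ("id", "name", "formula")


def check_metric(metric: dict) -> list:
    """Return a list of problems; an empty list means the basic checks pass.

    Only required keys and unknown keys (additionalProperties: false) are
    checked. Referenced types and string formats are not validated.
    """
    problems = [f"missing required property: {key}"
                for key in REQUIRED_PROPERTIES if key not in metric]
    problems += [f"unexpected property: {key}"
                 for key in sorted(metric) if key not in ALLOWED_PROPERTIES]
    return problems


# Hypothetical example values, not taken from the discussion.
example_metric = {
    "id": str(uuid.uuid4()),
    "name": "weekly_active_users",
    "displayName": "Weekly Active Users",
    "description": "Distinct users with at least one session in the last 7 days.",
    "fullyQualifiedName": "weekly_active_users",  # same as name: no hierarchy
    "formula": r"\mathrm{count}(\mathrm{distinct}\ user\_id)",  # LaTeX string
    "sql": "SELECT COUNT(DISTINCT user_id) FROM sessions "
           "WHERE started_at >= CURRENT_DATE - INTERVAL '7' DAY",
}

if __name__ == "__main__":
    print(json.dumps(example_metric, indent=2))
    print(check_metric(example_metric))
```

Note that with the schema as written, `formula` is mandatory while `sql` is optional, so a metric defined only by SQL would fail this check.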
- Showing the relation or source for a given metric: we will use lineage to show where a metric is sourced from. This can be a lineage relation from Looker or Snowflake tables, or from DBT itself.
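As a rough illustration of the lineage idea, the sketch below models lineage as plain (upstream, downstream) edges and walks them to find which tables a metric ultimately depends on, including the Metric -> Metric case raised earlier in the thread. The node names, the `type:name` naming convention, and both helper functions are assumptions made for this example; this is not the OpenMetadata lineage API.

```python
from collections import defaultdict

# Hypothetical lineage edges, written as (upstream, downstream) pairs.
# The "type:name" node convention is invented for this sketch only.
EDGES = [
    ("table:sessions", "metric:active_user_rule"),
    ("table:users", "metric:active_user_rule"),
    ("metric:active_user_rule", "metric:weekly_active_users"),
    ("metric:weekly_active_users", "dashboard:growth_overview"),
]


def upstream_of(node, edges):
    """Return every node reachable by walking lineage edges upstream from node."""
    parents = defaultdict(set)
    for upstream, downstream in edges:
        parents[downstream].add(upstream)

    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # guards against cycles and repeat visits
                seen.add(parent)
                stack.append(parent)
    return seen


def source_tables(metric, edges):
    """Return the sorted table nodes that a metric depends on, directly or not."""
    return sorted(n for n in upstream_of(metric, edges) if n.startswith("table:"))


if __name__ == "__main__":
    print(source_tables("metric:weekly_active_users", EDGES))
```

The same traversal would extend to column-level lineage if the edges were recorded between columns and metrics rather than between tables and metrics.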
@tomaspe @oscarGomes86 please review the above and provide any feedback.
I agree with 'Metrics' as a separate "generic data asset" within Governance. The "Metrics" schema looks good.
I do think there is a need for a separate "semantic model" which should be a separate service in itself that syncs from LookML, Cube.js, AtScale, DBT, Superset. This should have "dimensions and measures" synced from these services. This information is essential for end-to-end lineage from a data feed right up to where these dimensions/measures are shown within BI Dashboards/Reports.
@oscarGomes86 that makes sense. For the discussion around Metrics, does that schema answer what you are looking for? We can look at dimensions and measures as a separate effort.
@harshach, for me the schema looks good.
@harshach Looks good to me too.
@Sachin-chaurasiya here are the mocks
Thanks @harshach