OpenMetadata
Tableau Connector : Unify Data Models
Is your feature request related to a problem? Please describe.
When ingesting data models from Tableau, multiple data models are displayed for the same data source.
This explodes the total number of data sources, even though the underlying data source is unique, and makes discovery and lineage more complicated.
Describe the solution you'd like
Today OMD relies on the nodes segment of Tableau metadata to create the data model.
embeddedDatasourcesConnection(first: {first}, offset: {offset}) {{
  nodes {{
    id
    name
    fields {{
      id
      name
      upstreamColumns {{
        id
        name
        remoteType
      }}
But perhaps a better way would be to create the data model based on the root data model, since these share the same ID across the models
@harshach I wanted to chime in on this conversation. At my organisation, we're ingesting a large Tableau instance and we've also noticed this behavior, where there are multiple versions of the same datasource. What we found is that a workbook (dashboard) can have its own embedded datasource, which is a workbook-specific version of an upstream datasource that it connects to (usually one that exists on Tableau Server). The reason seems to be that a workbook can connect to a datasource, then rename fields, add calculated fields, and make various other changes to have its own version of the connected datasource.
Ideally, what we would like (and @jsampaiog, please jump in if you disagree) is for published datasources to be ingested into OpenMetadata as well as the embedded datasources (it would be nice to have a new icon to differentiate the data models).
I think it's important to keep both the published and embedded datasources, because that way we can see what transformations have occurred at the workbook level and compare it to the published server model.
Here's a screenshot of what it might look like:
Hi @chillerno1, thanks for chiming in. Indeed, the behavior you describe would be the best target scenario! But we also brainstormed internally and, in order to avoid complicating the OpenMetadata data model, if we were forced to choose between "Published datasources" and "Embedded datasources", we would stick with the first.
Thanks @jsampaiog, I agree with that!
Chiming in too. I have pretty much the same scenario as the one @jsampaiog described.
A published data source is then used in multiple places. We are planning to use this for more scenarios, so the number of data models can simply explode. What @jsampaiog suggested in the original issue seems the way to go:
But perhaps a better way would be to create the data model based on the root data model, since these share the same ID across the models
@pmbrull are you still planning to include this in 1.4.0 release? I see that was removed :(
Hi, we had to reprioritize certain topics and ran out of time to handle this, so 1.4.1 - 1.5 would be the new ETA.
My 2 cents on the conversation above is to keep things simple. Aiming to keep the Published DataModel would, IMO, be the way to go to reduce complexity.
@pmbrull thanks for the context on the timelines.
I believe that "Published DataModel" should do the job even for a "Dashboard" with embedded data models. We just need to be sure that we don't introduce a regression where data models are missing entirely.
Thanks @OnkarVO7 for this thread.
We also have a similar problem: duplicate models rather than one combined model for all downstream workbooks.
Currently OMD uses this query:
query {
  embeddedDatasourcesConnection(filter: {name: "Tech Data Model"}) {
    nodes {
      id
      name
      workbook {
        id
        name
      }
    }
    totalCount
  }
}
We checked with the Tableau team (we spent a lot of time with the Tableau support team to get the right information) and they proposed using the query below:
query {
  publishedDatasources(filter: {name: "Tech Data Model"}) {
    id
    name
    hasExtracts
    downstreamWorkbooks {
      id
      luid
      name
    }
  }
}
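To illustrate why this query helps, here is a minimal Python sketch that turns its response into one entry per published data source, with the workbooks hanging off it. The function name and the sample payload are hypothetical; only the field names (`publishedDatasources`, `downstreamWorkbooks`, `id`, `luid`, `name`) come from the query above, and the `data` wrapper assumes a standard GraphQL response envelope. This is not the OpenMetadata implementation.

```python
def workbooks_by_published_datasource(response):
    """Map each published data source id to its name and downstream workbook names.

    `response` is assumed to be the decoded JSON body returned for the
    publishedDatasources query (standard GraphQL envelope: {"data": {...}}).
    """
    result = {}
    datasources = (response.get("data") or {}).get("publishedDatasources") or []
    for ds in datasources:
        # One entry per published data source, however many workbooks use it.
        entry = result.setdefault(ds["id"], {"name": ds["name"], "workbooks": []})
        for wb in ds.get("downstreamWorkbooks") or []:
            if wb["name"] not in entry["workbooks"]:
                entry["workbooks"].append(wb["name"])
    return result


# Hypothetical sample payload, shaped like the query above.
sample = {
    "data": {
        "publishedDatasources": [
            {
                "id": "pub-1",
                "name": "Tech Data Model",
                "hasExtracts": True,
                "downstreamWorkbooks": [
                    {"id": "wb-1", "luid": "l-1", "name": "Sales Dashboard"},
                    {"id": "wb-2", "luid": "l-2", "name": "Ops Dashboard"},
                ],
            }
        ]
    }
}

summary = workbooks_by_published_datasource(sample)
```

With the embedded query, the same source would show up once per workbook; here it shows up once, with both workbooks attached.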
Hi,
Sorry for commenting in this thread, we are facing the same situation: the sources are duplicated for each workbook (dashboard in OM) that we ingested.
The dashboard datamodel exists only once on Tableau:
If the object existed only once, we could trace lineage to the workbooks and assign the owner once, instead of treating the copies as independent objects.
In addition, we have it separated into different services, where each service is a Tableau folder, since this allows us to assign an owner by folder. Perhaps, if the ingestion treated the Tableau folder the way it treats the database service, we could maintain that hierarchy and also filter by folder:
DB Ingestion -> Schema -> Table
Tableau Ingestion -> Folder -> Workbook & data models
thanks, Carlos
Here is a recap of the conversation that I had with @OnkarVO7.
- The query must be changed to add this section:
upstreamDatasources {
  id
  luid
  name
  description
  hasExtracts
  tags { id }
  fields { id name isHidden }
  upstreamTables {
    id
    luid
    name
    fullName
    schema
    referencedByQueries { id name query }
    columns { id name }
    database { id name }
  }
}
- The ingestion needs logic to use the new upstreamDatasources field. If upstreamDatasources is not empty (that's the case for publishedDatasources), we need to publish a new data model node and link it downstream of the underlying data source and upstream of the related data model in Tableau.
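The ingestion logic in this recap could be sketched roughly as follows, in Python. This is one reading of the recap, not the actual connector code: `LineagePlan` and `plan_data_models` are hypothetical names, the input is assumed to be the list of embedded data source nodes with the `upstreamDatasources` section added, and the edge direction (table -> published model -> embedded model) is an interpretation of "downstream" and "upstream" above.

```python
from dataclasses import dataclass, field


@dataclass
class LineagePlan:
    """Hypothetical container for the sketch; not an OpenMetadata class."""
    published_models: dict = field(default_factory=dict)  # published id -> name
    standalone_embedded: list = field(default_factory=list)  # embedded ids with no upstream
    edges: set = field(default_factory=set)  # (from_id, to_id) lineage pairs


def plan_data_models(embedded_nodes):
    """Create one published data model per unique upstream id and link it."""
    plan = LineagePlan()
    for node in embedded_nodes:
        upstream = node.get("upstreamDatasources") or []
        if not upstream:
            # No published source behind it: keep the embedded model so that
            # data models do not go missing.
            plan.standalone_embedded.append(node["id"])
            continue
        for ds in upstream:
            # The same published id seen from several workbooks is registered once.
            plan.published_models.setdefault(ds["id"], ds["name"])
            for table in ds.get("upstreamTables") or []:
                plan.edges.add((table["id"], ds["id"]))  # table -> published model
            plan.edges.add((ds["id"], node["id"]))  # published -> embedded model
    return plan


# Hypothetical nodes: two workbooks embed the same published source, one does not.
shared = {"id": "pub-1", "name": "Sales", "upstreamTables": [{"id": "tbl-1"}]}
nodes = [
    {"id": "emb-1", "name": "Sales (Workbook A)", "upstreamDatasources": [shared]},
    {"id": "emb-2", "name": "Sales (Workbook B)", "upstreamDatasources": [shared]},
    {"id": "emb-3", "name": "Local extract", "upstreamDatasources": []},
]
plan = plan_data_models(nodes)
```

Keying the published models by `id` is what collapses the per-workbook copies into a single node, and the `standalone_embedded` list covers the regression concern raised earlier in the thread about data models disappearing.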