server-client-python
server-client-python copied to clipboard
Getting on-server datasource ID from workbook connections
Describe the bug Not sure if this is a feature request or if I'm missing something obvious.
I'm trying to refresh all the datasources connected to a workbook. Our datasources are Athena connections (with extract enabled) which have been published to the server, and the workbook is connected to these on-server datasources. I want to use the API to get the datasource items from the workbook's connections. This can be done from the GUI in Tableau server - there are links in the "Data Sources" tab which take you from workbook page to datasource page.
Unfortunately, there doesn't seem to be any way to do this with TSC. I can get the workbook's connections, but these are sqlproxy connections, which don't seem to have any link to the actual datasource.
Versions Details of your environment, including:
- Tableau Server version: v2020.1
- Python version: 3.7 or 3.8
- TSC library version: 0.15.0
To Reproduce Steps to reproduce the behavior. Please include a code snippet where possible.
# Create server, sign in...
# Get the workbook we're interested in.
wb = server.workbooks.get_by_id(workbook_id)
# This is the actual on-server datasource the workbook is connected to.
ds = server.datasources.get_by_id(datasource_id)
print(ds.datasource_type) # 'athena'
# Populate the workbook's connections, get the connection.
server.workbooks.populate_connections(wb)
c = wb.connections[0]
# Get the datasource the connection points at.
ds2 = server.datasources.get_by_id(c.datasource_id)
print(ds2.datasource_type) # 'sqlproxy'
# This is not the same as the datasource I want to refresh. Refreshing fails.
print(ds.id == ds2.id) # False
I've been able to do this by parsing the datasource names from the workbook XML, and then matching these with the datasources from the server. Quick explanation:
Tableau workbooks (.twb files) are XML documents, which contain references to the datasources on the server:
<workbook>
...
<datasources>
<datasource>
<repository-location id='datasource-name-here' .../>
</datasource>
...
</datasources>
...
</workbook>
If there are two datasources with the same name on-server (e.g. in different projects), these will have a 14 character numerical ID as a suffix:
datasource-name-here
datasource-name-here_01759868783642
I parsed these with xml.etree.ElementTree. To link the datasources from the server to these XML names, I used the following function to format the on-server datasource names (TSC.DatasourceItem.name, which I obtained by looping through all the datasources). The regex pattern was obtained by observing the behaviour of our datasource names - I may have missed some different behaviour for other characters:
def format_source_name(name: str) -> str:
"""Format a datasource name to match the name from the XML."""
return re.sub(r"[^A-Za-z0-9_\-\.]", "", name).replace(".", "_")
If these matched with the XML name (except for the ID suffix), I then downloaded the datasource and parsed the XML to ensure that the ID matched:
<datasource>
<repository-location id='datasource-name-here_01759868783642' .../>
...
</datasource>
So, in order to identify the datasources connected to a workbook:
- Download the workbook's XML (this might be a zip file (.TWBX), so it may be necessary to extract this to obtain the .TWB XML file)
- Parse the datasource names (and possibly IDs) from the XML file.
- Loop through the on-server datasources and identify matches by formatting the datasource names using
format_source_name. - Where the formatted name matches the name from the workbook XML file (ignoring the ID), download the datasource's XML (this might be a zip file (.TDSX), so it may be necessary to extract this to obtain the .TDS XML file)
- Ensure that the name from the
repository-locationtag (including the ID) matches the workbook.
This is obviously non-optimal, a better solution would be very appreciated!
Hi there, I encountered exactly the same issue, any new progress? Thank you!!!
The best way I know to do this with TSC is to use Tableau's Metadata API and throw the GraphQL query string into the metadata endpoint in TSC. Before the Metadata API, I used to parse the XML as well, but this is where the Metadata API shines!
graph_query = '''{
workbooks {
name
luid
upstreamDatasources {
luid
name
__typename
}
}
}'''
with server.auth.sign_in(auth):
wbs = server.metadata.query(graph_query)['data']['workbooks']
for wb in wbs:
for ds in wb['upstreamDatasources']:
print(f"{ds['__typename']} datasource: {ds['name']} is referenced by workbook: {wb['name']}.")