server-client-python icon indicating copy to clipboard operation
server-client-python copied to clipboard

Getting on-server datasource ID from workbook connections

Open thesketh opened this issue 4 years ago • 4 comments

Describe the bug Not sure if this is a feature request or if I'm missing something obvious.

I'm trying to refresh all the datasources connected to a workbook. Our datasources are Athena connections (with extract enabled) which have been published to the server, and the workbook is connected to these on-server datasources. I want to use the API to get the datasource items from the workbook's connections. This can be done from the GUI in Tableau server - there are links in the "Data Sources" tab which take you from workbook page to datasource page.

Unfortunately, there doesn't seem to be any way to do this with TSC. I can get the workbook's connections, but these are sqlproxy connections, which don't seem to have any link to the actual datasource.

Versions Details of your environment, including:

  • Tableau Server version: v2020.1
  • Python version: 3.7 or 3.8
  • TSC library version: 0.15.0

To Reproduce Steps to reproduce the behavior. Please include a code snippet where possible.

# Create server, sign in...
# Get the workbook we're interested in.
wb = server.workbooks.get_by_id(workbook_id)
# This is the actual on-server datasource the workbook is connected to.
ds = server.datasources.get_by_id(datasource_id)
print(ds.datasource_type)  # 'athena'

# Populate the workbook's connections, get the connection.
server.workbooks.populate_connections(wb)
c = wb.connections[0]

# Get the datasource the connection points at.
ds2 = server.datasources.get_by_id(c.datasource_id)
print(ds2.datasource_type)  # 'sqlproxy'

# This is not the same as the datasource I want to refresh. Refreshing fails.
print(ds.id == ds2.id)  # False

thesketh avatar Mar 24 '21 10:03 thesketh

I've been able to do this by parsing the datasource names from the workbook XML, and then matching these with the datasources from the server. Quick explanation:


Tableau workbooks (.twb files) are XML documents, which contain references to the datasources on the server:
<workbook>
  ...
  <datasources>
    <datasource>
      <repository-location id='datasource-name-here' .../>
    </datasource>
    ...
  </datasources>
  ...
</workbook>

If there are two datasources with the same name on-server (e.g. in different projects), these will have a 14 character numerical ID as a suffix:

datasource-name-here
datasource-name-here_01759868783642

I parsed these with xml.etree.ElementTree. To link the datasources from the server to these XML names, I used the following function to format the on-server datasource names (TSC.DatasourceItem.name, which I obtained by looping through all the datasources). The regex pattern was obtained by observing the behaviour of our datasource names - I may have missed some different behaviour for other characters:

def format_source_name(name: str) -> str:
    """Format a datasource name to match the name from the XML."""
    return re.sub(r"[^A-Za-z0-9_\-\.]", "", name).replace(".", "_")

If these matched with the XML name (except for the ID suffix), I then downloaded the datasource and parsed the XML to ensure that the ID matched:

<datasource>
  <repository-location id='datasource-name-here_01759868783642' .../>
  ... 
</datasource>

So, in order to identify the datasources connected to a workbook:

  1. Download the workbook's XML (this might be a zip file (.TWBX), so it may be necessary to extract this to obtain the .TWB XML file)
  2. Parse the datasource names (and possibly IDs) from the XML file.
  3. Loop through the on-server datasources and identify matches by formatting the datasource names using format_source_name.
  4. Where the formatted name matches the name from the workbook XML file (ignoring the ID), download the datasource's XML (this might be a zip file (.TDSX), so it may be necessary to extract this to obtain the .TDS XML file)
  5. Ensure that the name from the repository-location tag (including the ID) matches the workbook.

thesketh avatar Mar 29 '21 12:03 thesketh

This is obviously non-optimal, a better solution would be very appreciated!

thesketh avatar Mar 29 '21 12:03 thesketh

Hi there, I encountered exactly the same issue, any new progress? Thank you!!!

siyingwang-shc avatar Jul 15 '21 03:07 siyingwang-shc

The best way I know to do this with TSC is to use Tableau's Metadata API and throw the GraphQL query string into the metadata endpoint in TSC. Before the Metadata API, I used to parse the XML as well, but this is where the Metadata API shines!

graph_query = '''{
  workbooks {
    name
    luid
    upstreamDatasources {
      luid
      name
      __typename
    }
  }
}'''

with server.auth.sign_in(auth):
    wbs = server.metadata.query(graph_query)['data']['workbooks']

    for wb in wbs:
        for ds in wb['upstreamDatasources']:
            print(f"{ds['__typename']} datasource: {ds['name']} is referenced by workbook: {wb['name']}.")

jharris126 avatar Sep 06 '21 18:09 jharris126