server-client-python
Support for concurrent sessions
Hi, I'm using the client to refresh multiple extracts via Airflow. The problem is that at some point multiple executions use the client and each attempts to refresh an extract; when one of them signs in to the server, the session token held by the concurrently running execution stops working, and that process fails.
Is there any way to have concurrent server connections up and running?
To replicate this issue you can open a Python console and then:
import tableauserverclient as tsc
tableau_auth = tsc.PersonalAccessTokenAuth(token_name, access_token)
server = tsc.Server(tableau_host, use_server_version=True)
server.auth.sign_in_with_personal_access_token(tableau_auth)
resp = server.datasources.get()
Then run the code again on a new console, go back to the first console and try to run the following line only:
resp = server.datasources.get()
You'll get the following error:
401002: Unauthorized Access Invalid authentication credentials were provided.
Thanks!
Hi @gtcourreges, this is an expected behavior when using personal access tokens. It only allows one session per access token, so you would need to create and use one personal access token per process.
Another option would be to authenticate using username/password, which will not invalidate previous sessions upon signing in.
Hi @shinchris , thanks for the reply.
Using username/password is not an option due to security concerns. I've tried to generate multiple personal access tokens, but when I enter a name and click 'Create new token', nothing happens. I guess I'll have to contact Tableau support then.
Thanks for the help!
It only allows one session per access token, so you would need to create and use one personal access token per process.
Why was it designed this way, though? This creates an enormous amount of work when trying to run a web server that calls Tableau APIs. Even using multiple tokens does not solve this, because there is no easy way to distribute them among server worker processes (and, in my specific case, the threads spawned by those workers).
At this point I am considering writing a microservice just to proxy requests and avoid concurrent token usage. Maybe I am missing something here, but I have also never encountered any other API with this kind of limitation, especially since the personal access tokens (essentially API keys) are exchanged with standard timed tokens. Why are these tokens then linked to sessions at all?
@saulbein: It just allows for tight session control for Admins, which is a very popular request. With the current design, if a script misbehaves, it is possible to look at the logs, tie that session back to its originating PAT, and go ahead and revoke it to stop the issue.
I'm trying to understand what your limitation is though. A PAT can be exchanged for only one access token on a given site at a time, but that access token can be used to perform any number of concurrent operations on that site. And if you want to have different scripts running, you can assign a new PAT to each script, which is kind of the goal of the design, to have fine tuning of your automation and what it is doing. If you were to use the same PAT across different scripts you won't be able to track what each one of them is doing.
It is a pretty common pattern to have a pool of available connections for workers to grab and use, instead of creating and dropping connections on the fly all the time. Threads should be able to share the timed tokens, or am I missing something?
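The borrow/return pattern described above can be sketched with a thread-safe queue; this is a minimal illustration, not TSC code, and the PAT names and secrets are placeholders:

```python
import queue
import threading

# Hypothetical PAT pool; names and secrets are placeholders, not real tokens.
pat_pool: "queue.Queue[tuple[str, str]]" = queue.Queue()
for pair in [("airflow-pat-1", "secret-1"), ("airflow-pat-2", "secret-2")]:
    pat_pool.put(pair)

used = []  # record which PAT each worker ended up with

def worker() -> None:
    # Block until a PAT is free, use it, then return it to the pool.
    name, secret = pat_pool.get()
    try:
        # ... sign in with this PAT and do the actual work here ...
        used.append(name)
    finally:
        pat_pool.put((name, secret))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because `queue.Queue` is FIFO, returned PATs go to the back of the line, so the pool naturally rotates through all tokens while guaranteeing that no two workers hold the same PAT at once.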
@gtcourreges you should be able to create multiple PATs, it's a standard use case. By default I think the limit is 10, after that you would probably have to revoke one of the existing ones to replace it with a new one, but this should be configurable by your server admin if you are in OnPrem and want to allow users to create more than 10.
The problem is that there could be any number of connections going on and there is no way to keep track of the PAT usage.
@fpagliar perhaps you could create a sample that shows the 'pool of tokens' method?
@gtcourreges you should be able to create multiple PATs, it's a standard use case. By default I think the limit is 10, after that you would probably have to revoke one of the existing ones to replace it with a new one, but this should be configurable by your server admin if you are in OnPrem and want to allow users to create more than 10.
@fpagliar can you share a reference for configuring on prem server to allow more than 10 PATs?
@fpagliar the problem is with multiple instances of the same code executing concurrently within a web service. My use case at the time involved refreshing, downloading and processing a report, with the possibility of this running against multiple reports at the same time (hence needing multiple tokens for multiple workers).
@jacalata the way I worked around this was by implementing a token pool in a database, so that each worker takes a token when needed, locks it for the duration of processing, and releases it when done. For safety, it makes sense to lock tokens only for a specific duration, so that crashes don't lock a token permanently.
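A single-process sketch of that database-backed lease, using SQLite for brevity (the table layout and lease length are assumptions; in a multi-process deployment you would want a proper row lock, as in the PESSIMISTIC_WRITE approach mentioned later in this thread):

```python
import sqlite3
import time
from typing import Optional, Tuple

LEASE_SECONDS = 300  # assumed lease length; a crashed worker frees its PAT after this

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pats (name TEXT PRIMARY KEY, secret TEXT, leased_until REAL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO pats (name, secret) VALUES (?, ?)",
    [("pat-1", "secret-1"), ("pat-2", "secret-2")],
)

def acquire_pat() -> Optional[Tuple[str, str]]:
    """Lease an unleased (or expired) PAT; return None if all are busy."""
    now = time.time()
    with conn:  # select + update committed as one transaction
        row = conn.execute(
            "SELECT name, secret FROM pats WHERE leased_until < ? LIMIT 1", (now,)
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE pats SET leased_until = ? WHERE name = ?",
            (now + LEASE_SECONDS, row[0]),
        )
    return row

def release_pat(name: str) -> None:
    """Return a PAT to the pool by expiring its lease."""
    with conn:
        conn.execute("UPDATE pats SET leased_until = 0 WHERE name = ?", (name,))
```

Because `leased_until` is a timestamp rather than a boolean, a worker that crashes mid-job simply lets its lease expire instead of holding the token forever.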
You could manage multiple PATs using Python dataclasses. This effectively creates a "pool of PATs" that locks PATs while in use and returns a currently available (i.e. not locked) PAT. If you combine this dataclass design with a singleton Tableau client class, you can share the PAT pool across many client instances (e.g. when a Tableau handler is instantiated concurrently by an orchestration tool such as Airflow).
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TableauToken:
    pat_name: str
    pat_value: str
    locked: bool = False

@dataclass
class TableauTokenPool:
    pool: OrderedDict[str, TableauToken] = field(default_factory=OrderedDict)

    @classmethod
    def from_dict(cls, tokens: List[Dict]):
        """Create a new `TableauTokenPool` from a list of dictionaries."""
        token_pool = cls()
        for token_dict in tokens:
            token = TableauToken(**token_dict)
            token_pool.pool[token.pat_name] = token
        return token_pool

    def set_pat_lock(self, pat_name: str, locked: bool):
        """Set the lock status of the named PAT in the pool."""
        self.pool[pat_name].locked = locked

    def get_available_pat(self):
        """Return an available (unlocked) TableauToken, cycling it to the end of the pool."""
        for pat_name in self.pool:
            if not self.pool[pat_name].locked:
                self.pool.move_to_end(pat_name)
                return self.pool[pat_name]
        # Return None if no PAT is currently available
        return None
Just create multiple Tableau Server PATs (across one or more users), store them in a list of dictionaries, and then initialize this dataclass by passing in the list of PATs.
pat_pool = TableauTokenPool.from_dict([
    {'pat_name': 'test-pat-name-1', 'pat_value': 'test-pat-value-1'},
    {'pat_name': 'test-pat-name-2', 'pat_value': 'test-pat-value-2'},
    {'pat_name': 'test-pat-name-3', 'pat_value': 'test-pat-value-3'},
])
Using an ordered dictionary for the pool lets you shift the order of the PATs, which naturally cycles through them as the pool is used. This ensures every PAT gets used, which helps if you have any Tableau Server policies that deactivate PATs after some number of days of inactivity.
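The cycling behavior can be seen with a plain `OrderedDict`: taking the first unlocked entry and moving it to the end rotates through every key. This is a stripped-down illustration of the mechanism, with a `locked` flag standing in for the dataclass:

```python
from collections import OrderedDict
from typing import Optional

# name -> locked flag; this stands in for the TableauToken dataclass above
pool = OrderedDict([("pat-1", False), ("pat-2", False), ("pat-3", False)])

def next_pat(pool: OrderedDict) -> Optional[str]:
    """Return the first unlocked PAT name and rotate it to the end."""
    for name, locked in pool.items():
        if not locked:
            pool.move_to_end(name)  # safe: we return immediately after mutating
            return name
    return None
```

Successive calls yield `pat-1`, `pat-2`, `pat-3`, then wrap around to `pat-1` again, while any locked entry is simply skipped.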
In our use case it was an asynchronous process running in a clustered environment, on multiple threads that would invalidate each other's sessions. I ended up storing the encrypted token in a database; all processes reuse it and refresh it at intervals after acquiring a PESSIMISTIC_WRITE lock, so that only one process performs the refresh. It was painful and not my first choice of solution, but I was limited to the tools of my Java service, which used JPA, Hibernate and Postgres, ran on a cluster of 9 nodes, and kicked off the asynchronous process via JMS, which also runs in a cluster.
The alternative was a wrapper service, running outside the cluster, that handled the reuse/refresh. Ideally it would be great to see Tableau itself provide something like this out of the box.
Our use case runs hundreds of asynchronous jobs throughout the day. There are many times when different processes would need to authenticate using a PAT. This is getting increasingly difficult to manage, all because server admins wanted a token associated with a log message. We call many other API endpoints in our implementation, and Tableau API is the only one with this limitation.
Is the team looking into any alternative approaches that can help with this problem?
This issue just became urgent since Tableau is enforcing MFA on password authenticated accounts.
PATs are useless for Airflow if they only support a single connection at a time.
So we need to create separate PATs for all possible processes and all possible threads of those processes?
Edit: Or we have to create a pool, and then build a system that tracks globally, across all our services that touch this, which tokens are currently in use.
Not supporting concurrent access with an access token is simply bad API design.
It forces users to work around it by essentially building their own client-side stateful session management for anything non-trivial, or alternatively to use high-risk credentials instead of tokens. Having an API design that encourages users to send admin passwords over the wire is terrible security design.
You'll note that the Airflow connector docs for Tableau basically state "don't bother with PATs, they're useless for workflows".
End this madness now and make PATs actually useful.
After struggling with using PATs concurrently across multiple pods to refresh workbooks, I implemented a naive workaround. I thought it would be helpful to share it here.
It is basically a tableauserverclient wrapper leveraging a pool of PATs. Each client method authenticates using a random PAT from the pool and has a @retry decorator that calls the function again if the concurrent-PAT-access error is raised.
Pretty nasty, but it works...
import random
import time
from functools import wraps
from typing import Any, Callable

import tableauserverclient as tsc
from tableauserverclient import JobItem, Server, ServerResponseError, WorkbookItem
from tableauserverclient.exponential_backoff import ExponentialBackoffTimer
from tableauserverclient.server.endpoint.exceptions import (
    InternalServerError,
    JobCancelledException,
    JobFailedException,
)


def retry(func: Callable) -> Callable:
    """Decorator to run the function again on retryable Tableau client errors
    and to discard irrelevant ones.

    Args:
        func: Decorated function to be wrapped.

    Returns:
        Wrapped function.
    """
    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        while True:
            try:
                return func(*args, **kwargs)
            except ServerResponseError as e:
                # Concurrent PAT usage: retry (a new random PAT will be picked)
                if e.code == "401002":
                    continue
                # Refresh job already queued: nothing to do
                if e.code == "409093":
                    break
                raise Exception(e.code, e.summary, e.detail)
            # Unresponsive Tableau API
            except InternalServerError:
                time.sleep(60)
                continue
    return wrapper


class TableauBruteClient:
    """Wrapper around the Tableau Server client lib to address the lack of support
    for concurrent sessions using the same PAT.

    All methods of this class get an authenticated Tableau API client (Server) using
    the `_get_server()` method. They are decorated with `@retry`, which runs the
    function again if the Server throws a retryable exception.
    """

    def __init__(self, pat_pool: dict[str, str], server_url: str):
        self._auth_pool = [tsc.PersonalAccessTokenAuth(name, pat) for name, pat in pat_pool.items()]
        self.server_url = server_url

    def _get_server(self) -> Server:
        server = tsc.Server(self.server_url, use_server_version=True)
        server.auth.sign_in(random.choice(self._auth_pool))
        return server

    @retry
    def get_workbooks(self) -> list[WorkbookItem]:
        server = self._get_server()
        return list(tsc.Pager(server.workbooks))

    @retry
    def get_workbook(self, workbook_id: str) -> WorkbookItem:
        server = self._get_server()
        return server.workbooks.get_by_id(workbook_id)

    @retry
    def trigger_refresh(self, workbook: WorkbookItem) -> JobItem:
        server = self._get_server()
        return server.workbooks.refresh(workbook)

    @retry
    def get_job(self, job_id: str) -> JobItem:
        server = self._get_server()
        return server.jobs.get_by_id(job_id)

    def wait_for_refresh(self, job_id: str) -> JobItem:
        job = self.get_job(job_id)
        backoff_timer = ExponentialBackoffTimer()
        while job.completed_at is None:
            backoff_timer.sleep()
            job = self.get_job(job_id)
        if job.finish_code == JobItem.FinishCode.Success:
            return job
        elif job.finish_code == JobItem.FinishCode.Failed:
            raise JobFailedException(job)
        elif job.finish_code == JobItem.FinishCode.Cancelled:
            raise JobCancelledException(job)
        else:
            raise AssertionError("Unexpected finish_code in job", job)

    def refresh_workbook(self, workbook_id: str) -> JobItem:
        workbook = self.get_workbook(workbook_id)
        job_item = self.trigger_refresh(workbook)
        return self.wait_for_refresh(job_item.id)
With the release of v0.28, auth with JWTs has been added. From my testing with Tableau Cloud, I found that this method does support concurrent connections. I have been using it to publish multiple workbooks concurrently, and it is much simpler than using a pool of PATs.
However, JWT auth doesn't support site-admin methods such as updating users, while PATs do. The supported methods are listed here.
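For reference, the JWT used in this flow is an HS256-signed connected-app token. The sketch below builds one with only the standard library; the client ID, secret ID, secret value and username are placeholders, and the exact claims and scopes should be checked against the Tableau connected apps documentation before relying on this:

```python
import base64
import hashlib
import hmac
import json
import time
import uuid

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def build_connected_app_jwt(client_id: str, secret_id: str,
                            secret_value: str, username: str) -> str:
    """Build an HS256 JWT shaped like a Tableau connected-app (direct trust) token.
    Claim names here follow my reading of the docs and are assumptions."""
    header = {"alg": "HS256", "typ": "JWT", "kid": secret_id, "iss": client_id}
    payload = {
        "iss": client_id,
        "sub": username,
        "aud": "tableau",
        "jti": str(uuid.uuid4()),
        "exp": int(time.time()) + 300,  # connected-app JWTs are short-lived
        "scp": ["tableau:content:read"],  # scope names are assumptions
    }
    signing_input = f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(payload).encode())}"
    sig = hmac.new(secret_value.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

# The resulting token would then be passed to TSC, e.g.:
#   server.auth.sign_in(tsc.JWTAuth(jwt=token))   # requires a live server to run
```

In practice a library such as PyJWT does the encoding and signing for you; the point of the sketch is only to show what the token contains and why each request can carry its own independently signed credential, avoiding the one-session-per-PAT limit.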
We can't address this in TSC, but we can add a note to the docs on what the limitations/workarounds are.