azure-kusto-python
azure-kusto-python copied to clipboard
Reusing the same requests.Session object when creating the KustoClient instance
Is your feature request related to a problem? Please describe.
This is related to #587 as that user and I work together. Our company requires all egress traffic to go through an HTTP forward proxy that requires authentication. For that authentication, mTLS with the proxy is strongly preferred by the company. For us, in Python, this is done by setting the cert, verify, and proxies keywords on a requests.Session ( or similar ) instance, as we have our own CA, can't use the system default cacert, etc.
We have found that in various places in azure-kusto-python ( specifically azure-kusto-data ), the Session is not being propagated to other clients, helpers, etc., and new instances of Session are created instead. The proxies value is usually carried forward, but we would like to have the option to use the same Session instance.
#588 helped with this, but after pulling in azure-kusto-data==5.0.4, we found another issue. Here, a generic Session instance is created after calling super().__init__() on the superclass.
In the superclass, an instance of AadHelper is created, but the original Session instance is not passed to the AadHelper instance because the initializer doesn't allow for it at present.
That instance of AadHelper is then, in our use case, creating an instance of ApplicationCertificateTokenProvider, which again does not provide an option to inject the original Session object.
Eventually, the ApplicationCertificateTokenProvider class is calling a class from the msal library ( as seen in pdb ) to create the msal_client.
The ConfidentialClientApplication class from msal is a subclass of ClientApplication which does allow the caller to supply an http client ( Session ), but we are not exposing that as the classes ( AadHelper and ApplicationCertificateTokenProvider ) are not making this an option at present. Finally, a call is being made to login.microsoftonline.com for login purposes but, as the Session is new and doesn't have our custom attributes, the call fails.
Describe the solution you'd like
We would like to be able to optionally inject the Session instance into KustoClient and the other classes ( AadHelper and ApplicationCertificateTokenProvider ).
This would make it so that:
- We can customize the Session as needed before all other classes init.
- The Session will follow the other classes.
- Not providing the session would have the same effect as before - a default
Sessioninstance will be created. Users can still callset_proxy(url)on theKustoClientinstance to use a proxy with basic auth or no auth, as theproxy_dictis typically forwarded through the other classes mentioned.
Describe alternatives you've considered N/A
Additional context
I made these changes in a local environment and it worked as expected. I also ran the existing E2E tests ( got a free database as seen in Contributing.md ) and all existing tests passed, as well as a new test that I wrote. I'm happy to put in a PR for this change if desired.
We do want to support it eventually, but right now it's not prioritized.
If you want to contribute - go ahead please!
Otherwise, as a workaround:
- Control the session object directly through
client._session - For auth, you can use
with_azure_token_credentialto provide a custom credential object configured how you'd like.
@AsafMah , thanks so much for your response.
Re: your first point, we have done / tried that, and that works for what was solved with #588, but in this particular case, a call is being made to login.microsoftonline.com before the KustoClient finishes initializing via a class in msal. So setting it after the KustoClient has been initialized won't work for this particular call.
Re: your second point, thanks! I'll make a mention of that to the user who created 587. I'm actually on a different team that doesn't actually use this library and am just trying to help out.
I'll likely put in a PR for this if time allows. Thanks!
@AsafMah, please see #591. Thank you!
There shouldn't be any network calls happening before running execute or query, if there are - that's a bug for us to fix.
Even in your PR I don't see a reason why passing the session object earlier would affect anything, as opposed to doing it after creating the client.
Can you give a sample code of it trying to do a web request before you can set it up?
Hi @AsafMah , thanks again for your response. My first post in this Issue walks through the code flow, as seen by following it via pdb. I can try again to make it more clear what is happening.
-
The
KustoClientclass callssuper().__init__()on the_KustoClientBasebase class it is inheriting from before theSessionobject is created. -
The
_KustoClientBaseclass is creating an instance theAadHelperclass. -
There is no way to pass the current version of the http client (
Session) to the_AadHelperclass via its constructor. -
In the flow our code uses,
_AadHelperis creating an instance ofApplicationCertificateTokenProvider, again with no way to pass a custom http client along. -
ApplicationCertificateTokenProvideris, via its indirect base classTokenProviderBase, calling the_init_impl()method, which createsself.msal_client. -
The
msallibrary client, calledConfidentialClientApplicationinherits from theClientApplicationclass.ClientApplicationtakes anhttp_clientas a parameter. If that parameter is not passed, it creates a new one. -
Back in
ApplicationCertificateTokenProvider's_init_impl()method, we are not passing anhttp_clientwhen we initself.msal_client. -
When
my_kusto_client.execute(...)is called, if there are no cached tokens, it tries to call tologin.microsoftonline.com. For our use case, this will not work as is. Due to a firewall, the call needs to go through a proxy that requires clients to use mTLS to authenticate. This cannot happen if a new http client is constructed, due to the fact that we have a custom CA, client certs, etc., and the default system cacert will see them as self signed. There is also no easy way for us, as a company, to add these custom certs to the existing system cacerts.
What does this mean?
It means that the http client, as constructed by the end user, is not being passed to the other classes. For our use case and with the current azure-kusto-python architecture, it needs to be passed to _AadHelper, then ApplicationCertificateTokenProvider, and then into the msal_client.
What needs to happen for this to work?
One of two things:
-
We create the custom
requests.Sessioninstance and pass it into theKustoClientand then modify internal code so that the http client "follows" and can be used by the client from themsallibrary. OR: -
At the very least, the
_AadHelper,ApplicationCertificateTokenProviderclasses need to have their__init__()methods modified to take an instance ofrequests.Session, and the_KustoClientBaseclass can then pass it as needed. It will go all the way to themsal_clientthat gets created and also passed in there. The end user can then do, as you've suggested,client._session.cert = ("foo", "bar"), etc.
What have you tried?
- pip installed
azure-kusto-datav5.0.4 and used it as is in the manner below ( somewhat simplified, of course ):
conn = KustoConnectionStringBuilder(self.cluster, self.client_id, self.pem_string, self.thumbprint, authority_id=self.tenant_id)
self.client = KustoClient(conn)
self.client.set_proxy(self.proxy_url)
self.client._session.proxies = {"http": self.proxy_url, "https": self.proxy_url}
self.client._session.cert = ("test.cert", "test.key")
self.client._session.verify = "/path/to/cacert.crt"
self.client.execute(".......")
When used as above, with no modifications to any source code from azure-kusto-data, this fails, as it tries to call login.microsoftonline.com and fails because of the reasons outlined above.
However, this same exact code above, when the modifications are made to the source code of azure-kusto-data, works as intended, and we get back the data we expect.
Of course, with the changes made in my PR, this also works:
session = requests.Session()
session.cert = ("foo", "bar")
session.verify = "/path/to/cacert.crt"
session.proxies = {"http": self.proxy_url, "https": self.proxy_url}
client = KustoClient(conn, session=session)
client.execute("....")
Why does the current version fail?
Because self.msal_client is created with a brand new http client and, when a call is made for authentication / tokens, it is using the brand new client with none of our options set on it.
It still works if you use self.set_proxy() - why?
Because the proxy_dict is passed along, even to the msal_client. So if someone sets their proxy url to http://foo:[email protected]:2345, the auth is already passed in via the basic auth headers. But our use case requires more customization.
I hope that helps to explain it. At the very least, whether it comes from my PR or not, in order for use cases like this to work, the http client needs to follow all the way to when the msal_client is created. My opinion is that allowing the client / end user to create the http client with any needed customizations and then inject it into KustoClient is a cleaner way to do it, but ultimately, what needs to be done for this use case to work is that whatever client is created, gets passed along to other helper classes and, ultimately, to the msal client class.
As always, thanks for your time. :)
You have two mistakes:
_init_implis not called in the ctor, it's only called lazily on execute, via the _init_once method. Which means, you always have time to modify the_aad_helperobject before it initializes.- Your code updates the _session object in the client, but as you know (since you wrote a pr for this), the client doesn't pass the object to the token provider.
You can right now handle this in two ways:
- Write your own token provider object using
with_token_provider, inside you can call whatever you want - Or, change your code to:
conn = KustoConnectionStringBuilder(self.cluster, self.client_id, self.pem_string, self.thumbprint, authority_id=self.tenant_id)
self.client = KustoClient(conn)
self.client.set_proxy(self.proxy_url)
self.client._session.proxies = {"http": self.proxy_url, "https": self.proxy_url}
self.client._session.cert = ("test.cert", "test.key")
self.client._session.verify = "/path/to/cacert.crt"
self.client._aad_helper._session = self.client._session
self.client.execute(".......")
The reason I'm apprehensive in inserting your PR, and also the previous PR, is that they are incomplete -
- Your PR only handles specifically CertificateTokenProvider, when really any CloudInfoTokenProvider should get the parameter
- It only handles the sync context and not the async context. That's on the current design - we use two different libraries for sync and async that aren't compatible, it is something we want to eventually change, and the effort can go after.
You have two mistakes:
_init_implis not called in the ctor, it's only called lazily on execute, via the _init_once method. Which means, you always have time to modify the_aad_helperobject before it initializes.- Your code updates the _session object in the client, but as you know (since you wrote a pr for this), the client doesn't pass the object to the token provider.
You can right now handle this in two ways:
- Write your own token provider object using
with_token_provider, inside you can call whatever you want- Or, change your code to:
conn = KustoConnectionStringBuilder(self.cluster, self.client_id, self.pem_string, self.thumbprint, authority_id=self.tenant_id) self.client = KustoClient(conn) self.client.set_proxy(self.proxy_url) self.client._session.proxies = {"http": self.proxy_url, "https": self.proxy_url} self.client._session.cert = ("test.cert", "test.key") self.client._session.verify = "/path/to/cacert.crt" self.client._aad_helper._session = self.client._session self.client.execute(".......") The reason I'm apprehensive in inserting your PR, and also the previous PR, is that they are incomplete -
- Your PR only handles specifically CertificateTokenProvider, when really any CloudInfoTokenProvider should get the parameter
- It only handles the sync context and not the async context. That's on the current design - we use two different libraries for sync and async that aren't compatible, it is something we want to eventually change, and the effort can go after.
Hey @AsafMah , thanks for another response. Re: _init_impl, yes, it is getting called on execute(). That's what I meant, but either way, once it's called, it's still not passing the original session to the client from the msal library. I've stepped through with pdb and seen this.
Re: your second point and the code snippet, I had already tried that approach in the past and it does not work, again because what really needs to happen is that the http client eventually gets passed to the msal client.
We can try writing our own token provider.
Re: the PR and only handling CertificateTokenProvider, yes, I thought about that, but thought that perhaps a smaller change was better to start.
No worries. Thanks for your time and discussion.