databricks-sql-nodejs

What objects can/should be shared/reused?

Open timtucker-dte opened this issue 2 years ago • 3 comments

Reading through the documentation, it isn't clear how to handle long-running processes that fetch data at intervals over time.

The "hello world" paradigm that we see here is:

  1. Get a client
  2. Get a session from the client
  3. Perform an operation on the session
  4. Close the operation
  5. Close the session
  6. Close the client
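As a rough sketch, steps 2 to 6 might look like this in TypeScript. The interfaces below are minimal stand-ins so the snippet is self-contained; the method names (`openSession`, `executeStatement`, `fetchAll`, `close`) are assumed to match the connector's README, and `runOnce` is a hypothetical helper name, not part of the library.

```typescript
// Minimal structural types (assumptions, not the library's own typings).
interface QueryOperation {
  fetchAll(): Promise<unknown[]>;
  close(): Promise<unknown>;
}
interface QuerySession {
  executeStatement(sql: string): Promise<QueryOperation>;
  close(): Promise<unknown>;
}
interface QueryClient {
  openSession(): Promise<QuerySession>;
  close(): Promise<unknown>;
}

// Runs one statement on an already connected client (step 1 is the caller's job),
// then closes operation, session and client in reverse order of creation,
// even if an earlier step throws.
async function runOnce(client: QueryClient, sql: string): Promise<unknown[]> {
  try {
    const session = await client.openSession();
    try {
      const operation = await session.executeStatement(sql);
      try {
        return await operation.fetchAll();
      } finally {
        await operation.close();
      }
    } finally {
      await session.close();
    }
  } finally {
    await client.close();
  }
}
```

With the real connector, `client` would presumably be a `DBSQLClient` on which `connect()` has already been called.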

If we have something like an HTTP trigger in an Azure Functions instance, what can be reused?

  • Can we share a client as a singleton, or do we need a new one for each http request that comes in and executes a query?
  • Can we share a session as a singleton, or do we need a new one for each http request that comes in and executes a query?

If resources can be used as singletons, what's the recommended error handling to make sure that they get cleaned up & reinitialized if there's an error?

The only guidance is just "After you finish working with the operation, session or client, it is better to close it, each of them has a respective method (close())."

The basic question here: at what point do we consider ourselves "finished" working with each object?

timtucker-dte avatar Aug 25 '23 19:08 timtucker-dte

Hi @timtucker-dte! So:

  1. The client instance can safely be used as a singleton. Probably the only advice here is to keep one instance per set of connection options (host + path + auth credentials, i.e. the things you pass to client.connect()). Depending on your use case, you may call connect() many times on the same client with different options, but you'd have to close the client before the second and each subsequent connect() call.
  2. A session can also be used as a singleton, but keep in mind that after some period of inactivity it will expire, and you'll start getting errors on any operation run against that session.

Both client and session objects should eventually be closed to free up the resources they use. When you do this depends on your use case. The client instance itself doesn't expire and can be used for as long as you need. Sessions (as mentioned) will expire if not used. So if, for example, your code executes queries frequently, it makes sense to keep and reuse the session object: it will save some time and network requests. If your script runs, say, every hour, it's better to create a new session each time and close it when the script finishes, so you don't have to deal with expired sessions.
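One way to sketch the "reuse while busy, reopen when idle" idea is a small holder that hands out the cached session only if it was used recently. Everything here is illustrative: `SessionHolder` and `maxIdleMs` are hypothetical names, the idle limit is a value you would pick yourself (the thread does not state the server-side timeout), and the interfaces are minimal stand-ins for the connector's objects.

```typescript
// Minimal structural types (assumptions, not the library's own typings).
interface ReusableSession {
  close(): Promise<unknown>;
}
interface SessionSource {
  openSession(): Promise<ReusableSession>;
}

class SessionHolder {
  private readonly source: SessionSource;
  private readonly maxIdleMs: number;
  private readonly now: () => number;
  private session: ReusableSession | null = null;
  private lastUsed = 0;

  constructor(source: SessionSource, maxIdleMs: number, now: () => number = () => Date.now()) {
    this.source = source;
    this.maxIdleMs = maxIdleMs;
    this.now = now;
  }

  // Returns the cached session if it was used within maxIdleMs, otherwise a new one.
  async get(): Promise<ReusableSession> {
    const t = this.now();
    if (this.session !== null && t - this.lastUsed > this.maxIdleMs) {
      await this.discard();
    }
    if (this.session === null) {
      this.session = await this.source.openSession();
    }
    this.lastUsed = t;
    return this.session;
  }

  // Forgets the cached session and closes it on a best-effort basis.
  async discard(): Promise<void> {
    const stale = this.session;
    this.session = null;
    if (stale !== null) {
      try {
        await stale.close();
      } catch {
        // The session may already be expired on the server; nothing more to do.
      }
    }
  }
}
```

This simple version does not guard against two concurrent `get()` calls each opening a session; a handler serving parallel requests would need to share the in-flight promise or serialize access.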

Hope this clarifies things a bit, but feel free to ask if you have any questions.

kravets-levko avatar Aug 28 '23 16:08 kravets-levko

Thanks!

For a session, how would we tell that it's expired? Is there an event that we should be listening for / an error that we can catch? Is there a good way to tell whether or not a session currently has operations pending?

Ideally what we're looking for is the equivalent of what we can get using connection pooling libraries with other SDKs, where most of the object management gets taken care of automatically behind the scenes.

timtucker-dte avatar Aug 28 '23 18:08 timtucker-dte

For a session, how would we tell that it's expired?

We cannot tell until we try to use it. It's a big problem for developers of all connectors, and we don't have any solution for this so far.

Is there an event that we should be listening for / an error that we can catch?

I need to double-check, but if I recall correctly, you'll just get an HTTP 404 when trying to use an expired session.
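Since expiry can only be detected by trying, one possible pattern is to run the work, and if the failure looks like an expired session, drop the cached session and retry once with a new one. The sketch below is generic on purpose: `withFreshSessionOnFailure` and `mentions404` are hypothetical helpers, and matching "404" in the error message is only a guess at how the unconfirmed HTTP 404 would surface in the connector's errors.

```typescript
// Runs `work` against a session; on an error that `looksExpired` accepts,
// drops the session and retries exactly once with a newly obtained one.
async function withFreshSessionOnFailure<S, T>(
  getSession: () => Promise<S>,
  dropSession: () => Promise<void>,
  looksExpired: (err: unknown) => boolean,
  work: (session: S) => Promise<T>,
): Promise<T> {
  try {
    return await work(await getSession());
  } catch (err) {
    if (!looksExpired(err)) {
      throw err; // unrelated failure: do not retry
    }
    await dropSession();
    return work(await getSession());
  }
}

// Hypothetical predicate: treats any Error whose message contains "404" as expiry.
// Replace with whatever the connector actually reports once that is confirmed.
const mentions404 = (err: unknown): boolean =>
  err instanceof Error && /\b404\b/.test(err.message);
```

Note that the retry re-runs the whole callback, so this is only safe for work that can be repeated, such as read-only queries.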

Is there a good way to tell whether or not a session currently has operations pending?

I'm curious: what's the use case for it?

kravets-levko avatar Apr 11 '24 14:04 kravets-levko