amazon-redshift-python-driver icon indicating copy to clipboard operation
amazon-redshift-python-driver copied to clipboard

Execute .SQL files

Open maddy2u opened this issue 4 years ago • 8 comments

I want to be able to execute full .SQL files by passing them through the driver. How can i do that?

maddy2u avatar Jun 19 '21 23:06 maddy2u

Hi @maddy2u ,

The driver does not support that functionality at this time, so some pre-proccessing would need to occur in your script before passing the individual statements to the driver. Something like the code snippit below could work:

import redshift_connector
import os

MY_FILE: str = os.environ.get('SQL_SMTS', None)

def preprocess_sql_file(filepath: str) -> list[str]:
   """Returns a list of individual SQL statements"""
  raise NotImplementedError

with redshift_connector.connect(...) as conn:
   with conn.cursor() as cursor:
      for stmt in preprocess_sql_file(MY_FILE):
         cursor.execute(stmt)
   conn.commit() # optionally, commit changes

I will chat with the team to see if we can add support for executing .SQL files directly and update this issue with my findings.

Brooke-white avatar Jun 21 '21 16:06 Brooke-white

Thanks. Any update on the above. Would be a good functionality to have inbuilt.

maddy2u avatar Jun 25 '21 20:06 maddy2u

Hey @maddy2u ,

To support executing .SQL files directly we would need to re-work how the driver sends statements to the Amazon Redshift server. Currently, prepared statements are used in all cases-- which prevents us from supporting mult-statements such as:

cursor.execute("SELECT 1; SELECT 2; SELECT 3;")

this also prevents us from supporting .SQL file execution. We would need to move from using prepared statements to statements to support this.

I'm working to gauge how much re-work this change would require and get an idea about how this would impact the drivers performance.

Brooke-white avatar Jun 25 '21 20:06 Brooke-white

Makes sense. Looking forward to your response @Brooke-white ! Thank you.

maddy2u avatar Jun 26 '21 19:06 maddy2u

Hi @maddy2u ,

At this time we do not plan to add functionality to execute .SQL files using redshift_connector. Parsing SQL is a big undertaking -- enough so that an entire project, sqlparse, has been dedicated to it.

We are continuing to scope work for supporting multi-statement execution.

Here's an example of how sqlparse could be used for executing .SQL files with redshift_connector. Please see the documentation for sqlparse here for more details.

import redshift_connector
import sqlparse

with open('data.sql', 'r') as file:
    raw_sql_file = file.read()
    sql_stmts = sqlparse.split(raw_sql_file)

    with redshift_connector.connect(...) as conn:
        with conn.cursor() as cursor:
            for stmt in sql_stmts:
                cursor.execute(stmt)

Brooke-white avatar Jun 28 '21 22:06 Brooke-white

@Brooke-white thanks for advising a workaround. It'll be nice to support multiple statements in a single execute call. I notice psycopg allow me to do this, but I was not able to identify exactly which line from its source code provides such support.

liuming avatar Sep 07 '21 21:09 liuming

HI @liuming ,

No problem -- yes I agree it'd be much cleaner (and simpler) to support this in a single call to execute.

I believe psycopg supports this due to the approach taken to executing the statement. There are two approaches a driver can take to execute a statement -- simple query cycle and extended query cycle. Simple query cycle supports executing multiple statements, while extended query protocol does not. redshift_connector uses extended query protocol, which is the root cause for our lack of support. If you're interested, the above links give an interesting overview of the two approaches and their trade offs.

When I have extra time I am working to add support for simple query cycle (i.e. multiple statement execution), but there's no timeline on when this would be available as we are prioritizing other tasks at this time. Ideally, I'd like to provide support for both extended and simple query cycles so users have the option to fine tune how statements are executed :)

Brooke-white avatar Sep 08 '21 16:09 Brooke-white

@Brooke-white I really appreciate you took the time to link these documentations and also provide a surprising detailed reply. I learn something new today. Thank you. :clap:

liuming avatar Sep 15 '21 00:09 liuming

closing this out as we won't be supporting direct execution of SQL files, or multi-statements in the short term

Brooke-white avatar Feb 27 '23 17:02 Brooke-white