aws-sdk-pandas icon indicating copy to clipboard operation
aws-sdk-pandas copied to clipboard

redshift.unload_to_files doesn't escape sql query

Open mariokostelac opened this issue 3 years ago • 5 comments

Describe the bug

As the code shows in https://github.com/awslabs/aws-data-wrangler/blob/ad06fab6e967618d930595bbebaa1ae0ab906a87/awswrangler/redshift.py#L957, the library is not escaping the query while creating a parent UNLOAD query.

Environment

N/A.

To Reproduce

In [5]: wr.redshift.unload_to_files(
   ...:     "SELECT * FROM rd.events WHERE action = 'views' LIMIT 1",
   ...:     path='s3://some-bucket/some-path/',
   ...:     unload_format='PARQUET',
   ...:     iam_role='arn:aws:iam::12312452335:role/somerole',
   ...:     con=redshift_con,
   ...: )
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/SageMaker/persisted_conda_envs/intercom_python37/lib/python3.7/site-packages/redshift_connector/core.py in execute(self,cursor, operation, vals)
   1123         try:
-> 1124             ps = cache["ps"][key]
   1125             cursor.ps = ps

KeyError: ("UNLOAD ('SELECT * FROM rd.events WHERE action = 'views' LIMIT 1')\nTO 's3://some-bucket/some-path/'\nIAM_ROLE 'arn:aws:iam::12312452335:role/somerole'\nALLOWOVERWRITE\nPARALLEL ON\nFORMAT PARQUET\nENCRYPTED;", ())

During handling of the above exception, another exception occurred:

ProgrammingError                          Traceback (most recent call last)
<ipython-input-5-f928843ac489> in <module>
      4     unload_format='PARQUET',
      5     iam_role='arn:aws:iam::12312452335:role/somerole',
----> 6     con=redshift_con,
      7 )

~/SageMaker/persisted_conda_envs/intercom_python37/lib/python3.7/site-packages/awswrangler/redshift.py in unload_to_files(sql, path, con, iam_role, aws_access_key_id, aws_secret_access_key, aws_session_token, region, unload_format, max_file_size, kms_key_id, manifest, partition_cols, boto3_session)
    969         )
    970         _logger.debug("sql: \n%s", sql)
--> 971         cursor.execute(sql)
    972
    973

~/SageMaker/persisted_conda_envs/intercom_python37/lib/python3.7/site-packages/redshift_connector/cursor.py in execute(self, operation, args, stream, merge_socket_read)
    209                 self._c.execute(self, "begin transaction", None)
    210             self._c.merge_socket_read = merge_socket_read
--> 211             self._c.execute(self, operation, args)
    212         except AttributeError as e:
    213             raise e

~/SageMaker/persisted_conda_envs/intercom_python37/lib/python3.7/site-packages/redshift_connector/core.py in execute(self,cursor, operation, vals)
   1192                     raise e
   1193
-> 1194             self.handle_messages(cursor)
   1195
   1196             # We've got row_desc that allows us to identify what we're

~/SageMaker/persisted_conda_envs/intercom_python37/lib/python3.7/site-packages/redshift_connector/core.py in handle_messages(self, cursor)
   1356
   1357         if self.error is not None:
-> 1358             raise self.error
   1359
   1360     def handle_messages_merge_socket_read(self: "Connection", cursor: Cursor):

ProgrammingError: {'S': 'ERROR', 'C': '42601', 'M': 'syntax error at or near "views"', 'P': '50', 'F': '/home/ec2-user/padb/src/pg/src/backend/parser/parser_scan.l', 'L': '719', 'R': 'yyerror'}

It's visible that the formed query KeyError: ("UNLOAD ('SELECT * FROM rd.events WHERE action = 'views' LIMIT 1')\nTO 's3://some-bucket/some-path/'\nIAM_ROLE 'arn:aws:iam::12312452335:role/somerole'\nALLOWOVERWRITE\nPARALLEL ON\nFORMAT PARQUET\nENCRYPTED;", ()) is not valid.

Given that I wasn't able to find guidelines for contribution, I won't create a PR, but the fix is to escape incoming sql with

sql.replace(r"'", r"\'")

mariokostelac avatar Jan 14 '22 10:01 mariokostelac

Hi @mariokostelac, you can find contributing guidelines are in CONTRIBUTING.md in the root of the repo. Contributions like this one are very welcome!

kukushking avatar Jan 14 '22 17:01 kukushking

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.

github-actions[bot] avatar May 07 '22 18:05 github-actions[bot]

keeping this issue open - we're working on a solution, although there is no ETA.

cnfait avatar May 09 '22 08:05 cnfait

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.

github-actions[bot] avatar Jul 08 '22 09:07 github-actions[bot]

Keeping open, will make sure workflow ignores assigned issues.

malachi-constant avatar Jul 08 '22 15:07 malachi-constant

@mariokostelac @jaidisido

Solution: wr.redshift.unload_to_files( "SELECT * FROM rd.events WHERE action = \''views\'' LIMIT 1", path='s3://some-bucket/some-path/', unload_format='PARQUET', iam_role='arn:aws:iam::12312452335:role/somerole', con=redshift_con )

reference document: https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html

mangeshingle avatar May 02 '23 08:05 mangeshingle