aws-sdk-pandas
redshift.unload_to_files doesn't escape sql query
Describe the bug
As the code shows in https://github.com/awslabs/aws-data-wrangler/blob/ad06fab6e967618d930595bbebaa1ae0ab906a87/awswrangler/redshift.py#L957, the library is not escaping the query while creating a parent UNLOAD query.
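For illustration, a minimal sketch of the problem (not the library's actual code): embedding a query that contains single quotes into the UNLOAD template without escaping breaks the outer string literal.

```python
# Simplified reconstruction of how the parent UNLOAD query is formed.
sql = "SELECT * FROM rd.events WHERE action = 'views' LIMIT 1"
unload = f"UNLOAD ('{sql}')\nTO 's3://some-bucket/some-path/'"

# The quote before views ends the UNLOAD string literal early, so
# Redshift sees a stray identifier and reports:
#   syntax error at or near "views"
print(unload)
```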
Environment
N/A.
To Reproduce
In [5]: wr.redshift.unload_to_files(
...: "SELECT * FROM rd.events WHERE action = 'views' LIMIT 1",
...: path='s3://some-bucket/some-path/',
...: unload_format='PARQUET',
...: iam_role='arn:aws:iam::12312452335:role/somerole',
...: con=redshift_con,
...: )
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/SageMaker/persisted_conda_envs/intercom_python37/lib/python3.7/site-packages/redshift_connector/core.py in execute(self,cursor, operation, vals)
1123 try:
-> 1124 ps = cache["ps"][key]
1125 cursor.ps = ps
KeyError: ("UNLOAD ('SELECT * FROM rd.events WHERE action = 'views' LIMIT 1')\nTO 's3://some-bucket/some-path/'\nIAM_ROLE 'arn:aws:iam::12312452335:role/somerole'\nALLOWOVERWRITE\nPARALLEL ON\nFORMAT PARQUET\nENCRYPTED;", ())
During handling of the above exception, another exception occurred:
ProgrammingError Traceback (most recent call last)
<ipython-input-5-f928843ac489> in <module>
4 unload_format='PARQUET',
5 iam_role='arn:aws:iam::12312452335:role/somerole',
----> 6 con=redshift_con,
7 )
~/SageMaker/persisted_conda_envs/intercom_python37/lib/python3.7/site-packages/awswrangler/redshift.py in unload_to_files(sql, path, con, iam_role, aws_access_key_id, aws_secret_access_key, aws_session_token, region, unload_format, max_file_size, kms_key_id, manifest, partition_cols, boto3_session)
969 )
970 _logger.debug("sql: \n%s", sql)
--> 971 cursor.execute(sql)
972
973
~/SageMaker/persisted_conda_envs/intercom_python37/lib/python3.7/site-packages/redshift_connector/cursor.py in execute(self, operation, args, stream, merge_socket_read)
209 self._c.execute(self, "begin transaction", None)
210 self._c.merge_socket_read = merge_socket_read
--> 211 self._c.execute(self, operation, args)
212 except AttributeError as e:
213 raise e
~/SageMaker/persisted_conda_envs/intercom_python37/lib/python3.7/site-packages/redshift_connector/core.py in execute(self,cursor, operation, vals)
1192 raise e
1193
-> 1194 self.handle_messages(cursor)
1195
1196 # We've got row_desc that allows us to identify what we're
~/SageMaker/persisted_conda_envs/intercom_python37/lib/python3.7/site-packages/redshift_connector/core.py in handle_messages(self, cursor)
1356
1357 if self.error is not None:
-> 1358 raise self.error
1359
1360 def handle_messages_merge_socket_read(self: "Connection", cursor: Cursor):
ProgrammingError: {'S': 'ERROR', 'C': '42601', 'M': 'syntax error at or near "views"', 'P': '50', 'F': '/home/ec2-user/padb/src/pg/src/backend/parser/parser_scan.l', 'L': '719', 'R': 'yyerror'}
As the traceback shows, the formed query ("UNLOAD ('SELECT * FROM rd.events WHERE action = 'views' LIMIT 1')\nTO 's3://some-bucket/some-path/'\nIAM_ROLE 'arn:aws:iam::12312452335:role/somerole'\nALLOWOVERWRITE\nPARALLEL ON\nFORMAT PARQUET\nENCRYPTED;") is not valid SQL: the single quotes around 'views' terminate the UNLOAD string literal early.
Given that I wasn't able to find contribution guidelines, I won't create a PR, but the fix is to escape the incoming sql with:
sql.replace(r"'", r"\'")
Hi @mariokostelac, you can find the contributing guidelines in CONTRIBUTING.md in the root of the repo. Contributions like this one are very welcome!
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.
Keeping this issue open - we're working on a solution, although there is no ETA.
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.
Keeping open, will make sure workflow ignores assigned issues.
@mariokostelac @jaidisido
Solution:
wr.redshift.unload_to_files(
    "SELECT * FROM rd.events WHERE action = \''views\'' LIMIT 1",
    path='s3://some-bucket/some-path/',
    unload_format='PARQUET',
    iam_role='arn:aws:iam::12312452335:role/somerole',
    con=redshift_con,
)
reference document: https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html
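Per the UNLOAD documentation linked above, a caller-side workaround is to double any single quotes inside the inner query before passing it in; a minimal sketch:

```python
# Caller-side workaround: double the single quotes in the inner query,
# which is how the Redshift UNLOAD documentation says to quote literals
# inside the single-quoted SELECT statement.
raw_sql = "SELECT * FROM rd.events WHERE action = 'views' LIMIT 1"
escaped_sql = raw_sql.replace("'", "''")
# escaped_sql == "SELECT * FROM rd.events WHERE action = ''views'' LIMIT 1"
# escaped_sql can then be passed as the first argument to
# wr.redshift.unload_to_files(...).
```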