aws-sdk-pandas
aws-sdk-pandas copied to clipboard
Improve chunk size parameter validation for PostgreSQL queries
Hi,
I'm working with an Aurora PostgreSQL database - and I had to remove the "pg8000_native.literal" from this line to properly work with chunks:
https://github.com/aws/aws-sdk-pandas/blob/602e8a78d6560ea16be1f342061c0e1493327df4/awswrangler/postgresql.py#L106
If I understand the documentation here ( https://www.postgresql.org/docs/current/sql-fetch.html ) correctly, the pg8000_native.literal should not be there.
Hey,
Can you please describe the error you are getting?
The pg8000_native.literal here basically only makes sure that an int gets displayed as an int. The main reason for its usage is just to prevent SQL injection -- if an attacker passes an attack command as chunksize, this usage of pg8000_native.literal will prevent the command from executing.
Best regards, Leon
Ok, I think I found the issue.
The chunk size was taken from some data source, which had the result type "numpy.int64" instead of "int".
Unfortunately, pg8000_native.literal will turn data with that type into a quoted string:
My suggestion would be to perhaps check the type of chunksize in the function and verify it's not just int-like but actually an integer. numpy.int64 is probably common enough in the awswrangler environment that it is warranted to do some input validation here.
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.