aws-sdk-pandas icon indicating copy to clipboard operation
aws-sdk-pandas copied to clipboard

Improve chunk size parameter validation for PostgreSQL queries

Open FredStober opened this issue 1 year ago • 2 comments

Hi,

I'm working with an Aurora PostgreSQL database - and I had to remove the "pg8000_native.literal" from this line to properly work with chunks:

https://github.com/aws/aws-sdk-pandas/blob/602e8a78d6560ea16be1f342061c0e1493327df4/awswrangler/postgresql.py#L106

If I understand the documentation here ( https://www.postgresql.org/docs/current/sql-fetch.html ) correctly, the pg8000_native.literal should not be there.

FredStober avatar Jan 24 '24 11:01 FredStober

Hey,

Can you please describe the error you are getting?

The pg8000_native.literal here basically only makes sure that an int gets displayed as an int. The main reason for its usage is just to prevent SQL injection -- if an attacker passes an attack command as chunksize, this usage of pg8000_native.literal will prevent the command from executing.

Best regards, Leon

LeonLuttenberger avatar Jan 24 '24 16:01 LeonLuttenberger

Ok, I think I found the issue.

The chunk size was taken from some data source, which had the result type "numpy.int64" instead of "int".

Unfortunately, pg8000_native.literal will turn data with that type into a quoted string: image

My suggestion would be to perhaps check the type of chunksize in the function and verify it's not just int-like but actually an integer. numpy.int64 is probably common enough in the awswrangler environment that it is warranted to do some input validation here.

FredStober avatar Jan 25 '24 13:01 FredStober

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.

github-actions[bot] avatar Mar 31 '24 18:03 github-actions[bot]