aws-sdk-pandas
s3.read_parquet_table and exception "Unknown parameter in input: "ExcludeColumnSchema", must be one of: CatalogId, DatabaseName, TableName, Expression, NextToken, Segment, MaxResults"
Describe the bug
Exception
When I use awswrangler.s3.read_parquet_table with a partition filter I get this exception:
ParamValidationError: Parameter validation failed:
Unknown parameter in input: "ExcludeColumnSchema", must be one of: CatalogId, DatabaseName, TableName, Expression, NextToken, Segment, MaxResults
Related Source
When I comment out the offending line in awswrangler/catalog/_get.py, the error goes away, but I'm not sure that is an appropriate fix.
args: dict[str, Any] = _catalog_id(
    catalog_id=catalog_id,
    DatabaseName=database,
    TableName=table,
    MaxResults=1_000,
    Segment={"SegmentNumber": 0, "TotalSegments": 1},
    # ExcludeColumnSchema=True,
)
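One way to narrow this down is to ask the botocore that is actually loaded which request parameters it accepts for Glue's GetPartitions, since the validation error comes from botocore's bundled service model rather than from the Glue service itself. The sketch below is only a diagnostic under assumptions: the helper names supported_input_members and drop_unsupported are hypothetical, the region is an arbitrary placeholder, and the filtering helper is not what awswrangler does internally.

```python
from typing import Any, Dict, Iterable, Set


def supported_input_members(service: str, operation: str) -> Set[str]:
    """Return the request parameter names the loaded botocore accepts."""
    # Imported lazily so the pure helper below stays usable without boto3.
    import boto3

    # A region is needed to build a client (placeholder value here);
    # inspecting the bundled service model makes no Glue API call.
    client = boto3.client(service, region_name="us-east-1")
    operation_model = client.meta.service_model.operation_model(operation)
    return set(operation_model.input_shape.members)


def drop_unsupported(args: Dict[str, Any], supported: Iterable[str]) -> Dict[str, Any]:
    """Keep only the keyword arguments whose names appear in `supported`."""
    allowed = set(supported)
    return {key: value for key, value in args.items() if key in allowed}
```

Running supported_input_members("glue", "GetPartitions") in the failing environment and checking whether "ExcludeColumnSchema" is in the result would show whether the loaded service model is older than the installed package versions suggest.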
Versions
awswrangler 3.7.3, boto3 1.34.99, botocore 1.34.99
How to Reproduce
import awswrangler as wr
import boto3

# Keep only the partitions matching both partition values.
partition_filter = lambda x: x["partition_1"] == "p1" and x["partition_2"] == "p2"

df = wr.s3.read_parquet_table(
    table="table_name",
    database="database_name",
    boto3_session=boto3.Session(profile_name="profile_name"),
    partition_filter=partition_filter,
)
OS
Mac
Python version
3.10.12
AWS SDK for pandas version
3.7.3
Are you sure you have the latest versions of boto3 and botocore installed? As noted in https://github.com/aws/aws-sdk-pandas/issues/1404#issuecomment-1321171171, the parameter was introduced in version 1.17.4.
I saw that issue and double-checked my versions before posting. boto3.__version__ and botocore.__version__ both report 1.34.101, and awswrangler is 3.7.3. pip list shows the same. I tested in two venvs because I thought I was making a mistake (and I still might be).
I downgraded to 1.17.4 and got the same error.
I will dig deeper in the source when I have the time.
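Since the reported versions look recent enough, one thing worth ruling out is that a different copy of botocore is imported at runtime than the one pip reports, for example a stale copy earlier on sys.path. The following is a minimal sketch for comparing the two; the function names are hypothetical and it only reports what it finds, without diagnosing the cause.

```python
import importlib.metadata
import importlib.util
import sys
from typing import Dict, Optional, Sequence


def installed_version(package: str) -> Optional[str]:
    """Version recorded in the installed distribution metadata, if any."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def module_location(module: str) -> Optional[str]:
    """File path the interpreter would import `module` from, if found."""
    spec = importlib.util.find_spec(module)
    return spec.origin if spec is not None else None


def environment_report(
    packages: Sequence[str] = ("awswrangler", "boto3", "botocore"),
) -> Dict[str, object]:
    """Collect the interpreter path plus version and location per package."""
    report: Dict[str, object] = {"python": sys.executable}
    for name in packages:
        report[name] = {
            "version": installed_version(name),
            "location": module_location(name),
        }
    return report
```

Printing environment_report() from the same interpreter that raises the error shows whether the import location of botocore sits inside the expected venv and whether its metadata version matches what botocore.__version__ reports.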
Closing due to inactivity