delta-sharing
Protocol spec - Read table data can have large responses for tables with many files.
In the Delta Sharing protocol spec, there doesn't appear to be any pagination for the following API endpoint:
{prefix}/shares/{share}/schemas/{schema}/tables/{table}/query
I have some tables that contain 10,000+ files, so if no predicate conditions are applied, the API response will be very large. If per-file statistics are returned, the total response size could reach tens of megabytes.
Responses larger than 6 MB can be a problem for various API gateways and AWS Lambda functions.
Would you please consider adding pagination to the API method, so that Delta Sharing queries can be processed efficiently without a single large response per API call?
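To make the size concern concrete, here is a minimal sketch of how a server might split a query's newline-delimited JSON file entries into pages that each stay under a byte budget. The 5 MiB budget, the function name `paginate_by_bytes`, and the synthetic ~2 KB "stats" padding are all illustrative assumptions, not part of the Delta Sharing protocol or its reference server.

```python
import json
from typing import Iterable, Iterator, List

# Hypothetical per-page budget: 5 MiB, leaving headroom under the 6 MB
# synchronous Lambda response limit for headers and protocol/metadata lines.
DEFAULT_PAGE_BUDGET = 5 * 1024 * 1024


def paginate_by_bytes(file_lines: Iterable[str],
                      budget: int = DEFAULT_PAGE_BUDGET) -> Iterator[List[str]]:
    """Group newline-delimited JSON lines into pages whose UTF-8 size,
    counting one newline per line, never exceeds `budget` bytes."""
    page: List[str] = []
    size = 0
    for line in file_lines:
        line_size = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if line_size > budget:
            raise ValueError("a single file entry is larger than the page budget")
        if page and size + line_size > budget:
            yield page              # current page is full; start a new one
            page, size = [], 0
        page.append(line)
        size += line_size
    if page:
        yield page                  # flush the last, partially filled page


# Usage: 10,000 synthetic file entries, each padded with ~2 KB of fake "stats".
lines = [json.dumps({"file": {"id": str(i), "stats": "x" * 2000}})
         for i in range(10_000)]
pages = list(paginate_by_bytes(lines))
print(len(pages), "pages")
```

Budgeting by bytes rather than by file count is one possible design choice here: per-file statistics vary in size, so a fixed file count per page would not guarantee staying under a hard response limit.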
Hi @rustyconover, yes, pagination is on our roadmap. Feel free to make changes to the open source server if you are interested.
Where does the 6 MB figure come from, though? It's quite small.
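For anyone considering a change to the server, a minimal sketch of continuation-token pagination could look like the following. The names `max_files`, `page_token`, and `next_page_token`, and the offset-based token, are hypothetical illustrations, not fields defined by the Delta Sharing protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    files: List[str]                # file entries returned in this page
    next_page_token: Optional[str]  # None means this was the last page


def serve_page(all_files: List[str], page_token: Optional[str],
               max_files: int) -> Page:
    """Hypothetical server: the token is just the offset of the next file.
    A production server would more likely use an opaque token bound to a
    fixed table version, so every page comes from the same snapshot."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    start = int(page_token) if page_token is not None else 0
    end = min(start + max_files, len(all_files))
    token = str(end) if end < len(all_files) else None
    return Page(all_files[start:end], token)


def fetch_all(all_files: List[str], max_files: int = 1000) -> List[str]:
    """Hypothetical client: request pages until no continuation token is returned."""
    files: List[str] = []
    token: Optional[str] = None
    while True:
        page = serve_page(all_files, token, max_files)
        files.extend(page.files)
        if page.next_page_token is None:
            return files
        token = page.next_page_token


files = [f"part-{i:05d}.parquet" for i in range(10_500)]
print(len(fetch_all(files)), "files fetched in pages of 1000")
```

The client loop stays the same regardless of how the server encodes the token, which is one reason opaque tokens are a common choice for this kind of API.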
Hi, @linzhou-db
6 MB is the maximum response payload size for a synchronous AWS Lambda invocation.
You can see the limits here:
https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html
The AWS Lambda limit is a good one to know. Thanks, @rustyconover.
In InfluxDB, the number of Parquet files required to satisfy a query varies widely:
- the write pipeline creates many tiny (<100 KB) Parquet files
- the compaction pipeline rewrites that data as fewer, large (>1 GB) Parquet files
- most queries are served by a mix of both
- occasionally the compaction pipeline gets backed up, which shifts that mix toward many small files
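A rough sketch of why this matters for pagination: the number of file entries in a response depends on how the data is split into files, not just on how much data there is. The 100 KB and 1 GB file sizes below are round figures taken from the bullets above; the 10 GiB query size and 1,000 files per page are purely illustrative assumptions.

```python
KB = 1024
GB = 1024 ** 3


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def files_for_mix(small_bytes: int, large_bytes: int,
                  small_file: int = 100 * KB, large_file: int = 1 * GB) -> int:
    """Approximate file count when `small_bytes` sit in small, uncompacted
    files and `large_bytes` in large, compacted ones."""
    return ceil_div(small_bytes, small_file) + ceil_div(large_bytes, large_file)


FILES_PER_PAGE = 1000  # hypothetical page size
for label, small, large in [
    ("fully compacted", 0, 10 * GB),
    ("mostly compacted", 1 * GB, 9 * GB),
    ("compaction backed up", 10 * GB, 0),
]:
    n = files_for_mix(small, large)
    print(f"{label}: files={n}, pages={ceil_div(n, FILES_PER_PAGE)}")
```

With these illustrative sizes, the same 10 GiB of data goes from 10 files (one page) to 104,858 files (105 pages) when compaction falls behind.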