
ICEBERG performance is slow when querying tables with a large number of partitions.

Open BsoBird opened this issue 2 years ago • 3 comments

Query engine

Spark 3.3.2, Iceberg 1.3.1

Question

I've got a table of about 11 billion rows, roughly 3 terabytes. It currently has about 400,000 partitions, and the metadata file is 300 MB. I'm experiencing the following problems:

  1. When I query the table, no matter what type of query I submit, the SQL takes a long time to commit.
  2. When I need to retrieve a large range of partitions, the query performance of this table is very poor.

Any suggestions for my situation?

BsoBird avatar Jul 27 '23 03:07 BsoBird

Have fewer partitions. Each additional partition means more file/S3 I/O.

You have 27,500 rows per partition, which is really small. Try to target at least a few million rows per partition, depending on row size.

Rusty

rustyconover avatar Aug 10 '23 02:08 rustyconover

Rusty is right here; that's only 7.5 MB per partition. I would aim for at least 512 MB, maybe more, for such a large table.

RussellSpitzer avatar Aug 10 '23 03:08 RussellSpitzer

Is this Big or Small data technology? Let us say the table size is 300 TB. With 400K partitions, the average partition size is 750 MB, which looks normal to me. 3 PB tables? I know they exist. I think the problem of a large number of partitions and a large metadata size must eventually be addressed by the Iceberg community.

VladRodionov avatar Aug 27 '24 18:08 VladRodionov
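
The per-partition figures quoted in this thread can be checked with a quick calculation. The sketch below is only illustrative: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which is what the 7.5 MB and 750 MB figures imply, and the helper name `per_partition` is hypothetical, not part of Iceberg or Spark. The 512 MB target is the one suggested in the comments above.

```python
import math

TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte


def per_partition(total, partitions):
    """Average amount (rows or bytes) that lands in each partition."""
    return total / partitions


partitions = 400_000

# Reported table: about 11 billion rows, about 3 TB.
rows = per_partition(11_000_000_000, partitions)
size = per_partition(3 * TB, partitions)
print(f"{rows:,.0f} rows and {size / MB:.1f} MB per partition")

# Hypothetical 300 TB table with the same partition count.
big_size = per_partition(300 * TB, partitions)
print(f"{big_size / MB:.0f} MB per partition for a 300 TB table")

# Rough partition count that would give about 512 MB per partition at 3 TB.
suggested = math.ceil(3 * TB / (512 * MB))
print(f"about {suggested:,} partitions for a 512 MB target")
```

Under these assumptions, the 3 TB table would need on the order of a few thousand partitions rather than 400,000 to reach the suggested size, while the same partition count on a 300 TB table already averages well above that target.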

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Feb 24 '25 00:02 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Mar 11 '25 00:03 github-actions[bot]