Action: support spark3 and customer catalog

Open liukun4515 opened this issue 4 years ago • 9 comments

In Spark 2.4, the action uses spark.read().format("iceberg").load(table) to load the table as a dataset. However, this can only load tables from the default Spark catalog (spark_catalog), where the table is identified as db.tablename. If I want to use the action on tables that live in another, custom Iceberg catalog, an exception is thrown, as described in #1652.

liukun4515 avatar Oct 26 '20 08:10 liukun4515

When you deal with DataFrameWriterV1, you may want to "forget about" the catalog and instead double-check how the "path" is interpreted by the format (data source).

Btw, what's the use case for touching tables in another catalog?

HeartSaVioR avatar Oct 26 '20 13:10 HeartSaVioR

I thought you could still load it directly from the path (ignoring the other catalogs?)

RussellSpitzer avatar Oct 26 '20 14:10 RussellSpitzer

I think I understand the point now.

We want to invoke an action on a table whose identifier cannot be read using the current logic in BaseAction to generate metadata tables.

RussellSpitzer avatar Oct 26 '20 21:10 RussellSpitzer

@RussellSpitzer

I'm working on the SQL extension (remove orphan files) and want to use the Spark action to implement it, but the current actions only support Spark 2.

liukun4515 avatar Oct 27 '20 02:10 liukun4515

Yep, we'll have to make some modifications. I think a first step is probably just making the extensions work for the default catalog. Beyond that, we need to start extending some of the methods in the base class so that they properly handle multi-part identifiers, and we need to determine how to load metadata tables properly for tables in other catalogs.

RussellSpitzer avatar Oct 27 '20 14:10 RussellSpitzer
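
For illustration, handling multi-part identifiers could start with something like the sketch below. It is only a rough sketch under stated assumptions, not the approach taken in Iceberg: the class `TableIdentifierSketch`, its `parse` method, and the catalog name `my_catalog` are all hypothetical, and the naive split on `.` ignores backtick-quoted identifiers. It treats the first part as a catalog only when it matches a known catalog name, and otherwise falls back to `spark_catalog`, the default catalog mentioned in the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Iceberg or Spark.
class TableIdentifierSketch {
    // Name of the default Spark catalog, as mentioned in the issue.
    static final String DEFAULT_CATALOG = "spark_catalog";

    final String catalog;
    final List<String> namespace;
    final String table;

    TableIdentifierSketch(String catalog, List<String> namespace, String table) {
        this.catalog = catalog;
        this.namespace = Collections.unmodifiableList(new ArrayList<>(namespace));
        this.table = table;
    }

    // Splits an identifier such as "db.tablename" or "my_catalog.db.tablename".
    // Assumption: parts never contain dots (no backtick quoting is handled).
    static TableIdentifierSketch parse(String identifier, List<String> knownCatalogs) {
        // The -1 limit keeps trailing empty parts so "db." is rejected below.
        List<String> parts = Arrays.asList(identifier.split("\\.", -1));
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Invalid identifier: " + identifier);
            }
        }

        String catalog = DEFAULT_CATALOG;
        int start = 0;
        // Only treat the first part as a catalog if it names a known catalog.
        if (parts.size() > 1 && knownCatalogs.contains(parts.get(0))) {
            catalog = parts.get(0);
            start = 1;
        }

        int last = parts.size() - 1;
        return new TableIdentifierSketch(catalog, parts.subList(start, last), parts.get(last));
    }

    boolean usesDefaultCatalog() {
        return DEFAULT_CATALOG.equals(catalog);
    }

    @Override
    public String toString() {
        List<String> all = new ArrayList<>();
        all.add(catalog);
        all.addAll(namespace);
        all.add(table);
        return String.join(".", all);
    }
}
```

With `my_catalog` registered as a known catalog, `parse("db.tablename", ...)` resolves against `spark_catalog`, while `parse("my_catalog.db.tablename", ...)` keeps `my_catalog` as the catalog. An action could use a check like `usesDefaultCatalog()` to decide whether the existing Spark 2 style load path is sufficient.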

Are there any pull requests for these modifications yet?

liukun4515 avatar Oct 29 '20 08:10 liukun4515

Not yet, but obviously we will need to make some :)

RussellSpitzer avatar Oct 29 '20 14:10 RussellSpitzer

Is PR #1525 about supporting Spark 3 actions? @RussellSpitzer

liukun4515 avatar Oct 30 '20 09:10 liukun4515

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Feb 28 '24 00:02 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Mar 13 '24 00:03 github-actions[bot]