iceberg-python

AWS Glue error when appending data

Open apersilva opened this issue 1 year ago • 6 comments

Apache Iceberg version

0.6.0 (latest release)

Please describe the bug 🐞

I started using pyiceberg with the Glue catalog and ran into the error in the title. The table in the Glue catalog has column comments. Is it possible to ignore the column comments when appending data to the table?

apersilva, May 14 '24 19:05

Hello @apersilva, can you give us the error stack trace and a minimal code example that can reproduce this error?

ndrluis, May 14 '24 19:05

from datetime import datetime

import pyarrow as pa
from pyiceberg.catalog import load_catalog


def update_table(database_target, table_target, database_name, table_name, partition_by, size, process_date, custom_partion):
    catalog = load_catalog('glue', **{
        'type': 'glue', 'verify': False
    })

    tabela = catalog.load_table(f"{database_target}.{table_target}")

    # Collect each column's comment (doc) from the Iceberg table schema
    metadata = {}
    for doc in tabela.metadata.schemas[0].columns:
        metadata.update({doc.name: f"({doc.doc})"})

    # The Arrow schema is inferred from the data; the comments are only
    # attached as table-level metadata
    df = pa.Table.from_pylist(
        [
            {"nome_tabela": table_name,
             "nome_base_dados": database_name,
             "particao": partition_by,
             "numero_registro": size,
             "process_date": process_date,
             "particao_customizada": custom_partion,
             "data_criacao": datetime.now().date()}
        ],
        metadata=metadata
    )

    tabela.append(df)

apersilva, May 14 '24 20:05

Traceback (most recent call last):
  File "c:\great_teste\update_table.py", line 45, in update_table
    tabela.append(df)
  File "C:\Users\9001329\AppData\Roaming\Python\Python310\site-packages\pyiceberg\table\__init__.py", line 1057, in append
    _check_schema_compatible(self.schema(), other_schema=df.schema)
  File "C:\Users\9001329\AppData\Roaming\Python\Python310\site-packages\pyiceberg\table\__init__.py", line 175, in _check_schema_compatible
    raise ValueError(f"Mismatch in fields:\n{console.export_text()}")
ValueError: Mismatch in fields:
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Table field                                                               ┃ Dataframe field                          ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ ❌ │ 1: nome_tabela: optional string (Nome data Tabela Processada)             │ 1: nome_tabela: optional string          │
│ ❌ │ 2: nome_base_dados: optional string (Nome do Banco de dados que pertence  │ 2: nome_base_dados: optional string      │
│    │ a tabela)                                                                 │                                          │
│ ❌ │ 3: particao: optional string (Nome da particao)                           │ 3: particao: optional string             │
│ ❌ │ 4: numero_registro: optional long (Quantidade de registros)               │ 4: numero_registro: optional long        │
│ ❌ │ 5: process_date: optional string (parametro quando é enviado e passo para │ 5: process_date: optional string         │
│    │ a funcao de escrita para particao)                                        │                                          │
│ ❌ │ 6: particao_customizada: optional string (Indica que a partição é         │ 6: particao_customizada: optional string │
│    │ diferente do padrão)                                                      │                                          │
│ ❌ │ 7: data_criacao: optional date (Data em que foi inserido o registro)      │ 7: data_criacao: optional date           │
└────┴───────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────┘

apersilva, May 14 '24 20:05

@Fokko, can you help with clarifying the expected behavior? I believe we should compare the representations (repr) of the objects. Currently, the doc attribute is not included in the __repr__, so changing the comparison to be between repr objects might solve this problem. What do you think?

ndrluis, May 16 '24 15:05

Sorry, I double-checked the Java implementation, and the current behavior on the Python side is correct.

@apersilva, for your case, I believe you need to do something like this:

from datetime import datetime

import pyarrow as pa
from pyiceberg.io.pyarrow import schema_to_pyarrow

# Derive the Arrow schema from the Iceberg table schema
schema = schema_to_pyarrow(tabela.schema())

df = pa.Table.from_pylist(
    [
        {
            "nome_tabela": table_name,
            "nome_base_dados": database_name,
            "particao": partition_by,
            "numero_registro": size,
            "process_date": process_date,
            # spelled as in the update_table signature above
            "particao_customizada": custom_partion,
            "data_criacao": datetime.now().date()
        }
    ],
    schema=schema
)

tabela.append(df)

In a future release, there will be a function in the Schema object to return the Arrow schema, so it would look like this: schema = tabela.schema().as_arrow()
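To illustrate why the append failed and why reusing the table's schema helps, here is a minimal standard-library sketch. It does not use pyiceberg: the `Field` dataclass and the `mismatches` helper are hypothetical stand-ins, and it assumes the compatibility check compares fields including their comment (`doc`), as the error table above suggests.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; not the pyiceberg class."""
    field_id: int
    name: str
    type_name: str
    doc: Optional[str] = None


def mismatches(table_fields: List[Field], df_fields: List[Field]) -> List[str]:
    """Return the names of fields that differ, comparing every attribute including doc."""
    return [t.name for t, d in zip(table_fields, df_fields) if t != d]


table_fields = [
    Field(1, "nome_tabela", "string", doc="Nome data Tabela Processada"),
    Field(4, "numero_registro", "long", doc="Quantidade de registros"),
]

# Schema inferred from plain Python data: same names and types, but no comments.
inferred = [replace(f, doc=None) for f in table_fields]

# Schema derived from the table itself: the comments are carried over.
derived = list(table_fields)

print(mismatches(table_fields, inferred))  # every field differs, only because of doc
print(mismatches(table_fields, derived))   # no differences
```

In this model, the inferred schema fails on every field even though names and types agree, which mirrors the error table. Deriving the schema from the table removes the difference.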

ndrluis, May 16 '24 20:05

It works, thanks a lot.

apersilva, May 16 '24 21:05

@apersilva, it looks like your issue is resolved. Can we close it?

kevinjqliu, Jun 19 '24 16:06