
bug: [Athena] Missing support for CREATE TABLE AS SELECT (CTAS) or expression materialization in Athena backend

Open · uday-dasari opened this issue 6 months ago · 1 comment

What happened?

Summary

The Athena backend in Ibis currently lacks support for materializing Ibis expressions into Athena tables via CREATE TABLE AS SELECT (CTAS). This makes it incompatible with downstream tools like Kedro, whose kedro_datasets.ibis.TableDataset saves data by calling the backend's create_table() with an Ibis expression.

Context

The Athena backend does expose create_table(), but it appears to support only defining an empty external table from a schema and an S3 location, not saving expressions or running CTAS operations.
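To make the distinction concrete, here is a minimal sketch of the two kinds of table creation. It uses Python's built-in sqlite3 purely as a stand-in (this is not Athena and not Ibis): a schema-only CREATE TABLE produces an empty table, while CTAS derives both the schema and the rows from a query.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for Athena.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src (id INTEGER, amount REAL)")
con.executemany("INSERT INTO src VALUES (?, ?)", [(1, 10.0), (2, 25.5), (3, 4.5)])

# 1) Schema-only definition: the table exists but holds no rows.
con.execute("CREATE TABLE empty_tbl (id INTEGER, amount REAL)")

# 2) CTAS: schema and rows both come from the SELECT's result.
con.execute("CREATE TABLE big_orders AS SELECT id, amount FROM src WHERE amount > 5")

empty_rows = con.execute("SELECT COUNT(*) FROM empty_tbl").fetchone()[0]
ctas_rows = con.execute("SELECT id, amount FROM big_orders ORDER BY id").fetchall()
print(empty_rows)  # 0
print(ctas_rows)   # [(1, 10.0), (2, 25.5)]
```

Materializing a transformed Ibis expression needs the second form; the first only reserves a table shape.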

This limits the backend's ability to support materialization workflows, which are common in ETL pipelines.

Use Case

In a Kedro pipeline, I'm trying to read from a source Athena table and save the transformed result into a new Athena table using Ibis and kedro_datasets.ibis.TableDataset.

Example:

```python
from kedro_datasets.ibis import TableDataset

src_table_name = "..."  # placeholder: name of the source Athena table

# Load the source table from Athena.
load_dataset = TableDataset(
    table_name=src_table_name,
    connection={
        "backend": "athena",
        "s3_staging_dir": "s3://bucket-name/...1",
        "schema_name": "schema_name",
    },
)
data = load_dataset.load()

# Save the (transformed) expression as a new Athena table.
save_dataset = TableDataset(
    table_name="table_name",
    connection={
        "backend": "athena",
        "s3_staging_dir": "s3://bucket-name/...",
        "schema_name": "schema_name",
    },
    save_args={"materialized": "table", "overwrite": None},
)

save_dataset.save(data)
```

Problem:

This fails with: DatasetError: Failed while saving data to dataset TableDataset(...). Under the hood, Kedro + Ibis appears to call create_table(..., obj=<IbisExpr>), but the Athena backend does not implement CTAS or insert operations, so the save fails after intermediate data has already been written to S3.
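A simplified sketch of that save path, assuming the dataset pops "materialized" from save_args, looks up create_<materialized> on the connection, and passes the expression positionally (so it lands in create_table's obj parameter). DatasetError, FakeConnection and save below are hypothetical local stand-ins, not Kedro's or Ibis's actual code. The failing call also shows that, if the wrapper raises with "from exc", the underlying backend exception stays reachable via __cause__, which is where the real traceback lives.

```python
class DatasetError(Exception):
    """Hypothetical stand-in for kedro's DatasetError."""


class FakeConnection:
    """Hypothetical stand-in for an Ibis backend; records create_* calls."""

    def __init__(self, fail=False):
        self.calls = []
        self.fail = fail

    def create_table(self, name, obj=None, **kwargs):
        self.calls.append(("create_table", name, obj, kwargs))
        if self.fail:
            raise NotImplementedError("simulated backend failure")

    def create_view(self, name, obj, **kwargs):
        self.calls.append(("create_view", name, obj, kwargs))


def save(connection, table_name, data, save_args):
    """Simplified TableDataset-style save (an assumption, not Kedro's code)."""
    args = dict(save_args)
    materialized = args.pop("materialized", "view")  # default chosen for this sketch
    writer = getattr(connection, f"create_{materialized}")  # e.g. create_table
    try:
        # The expression is passed positionally, i.e. as create_table's `obj`.
        writer(table_name, data, **args)
    except Exception as exc:
        # `from exc` keeps the backend error reachable via __cause__.
        raise DatasetError(f"Failed while saving data to dataset {table_name}") from exc


con = FakeConnection()
save(con, "table_name", "<IbisExpr>", {"materialized": "table", "overwrite": None})
first_call = con.calls[0]
print(first_call)  # ('create_table', 'table_name', '<IbisExpr>', {'overwrite': None})

cause = None
try:
    save(FakeConnection(fail=True), "table_name", "<IbisExpr>", {"materialized": "table"})
except DatasetError as err:
    cause = err.__cause__
print(type(cause).__name__, cause)  # NotImplementedError simulated backend failure
```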

Expected Behavior

The Athena backend should ideally support:

  • Materializing an Ibis expression as a table using CREATE TABLE AS SELECT (CTAS)
  • Possibly also insert() and overwriting an existing table
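For reference, Athena's own CTAS statement has the shape CREATE TABLE db.tbl WITH (format = ..., external_location = ...) AS SELECT .... The helper below, build_athena_ctas, is a hypothetical sketch that renders such statements; it is not how Ibis's Athena backend generates SQL. Because CTAS has no overwrite clause, overwrite=True is modelled here as a preceding DROP TABLE IF EXISTS, which is an assumption of this sketch.

```python
import re

# Conservative identifier check for this sketch: lowercase letters, digits, underscores.
_IDENT = re.compile(r"^[a-z][a-z0-9_]*$")


def build_athena_ctas(database, table, select_sql, external_location=None,
                      fmt="PARQUET", overwrite=False):
    """Return Athena SQL statements for a CTAS (hypothetical helper)."""
    for name in (database, table):
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    qualified = f"{database}.{table}"
    # Table properties go in the WITH (...) clause of the CTAS.
    props = [f"format = '{fmt}'"]
    if external_location is not None:
        props.append(f"external_location = '{external_location}'")
    statements = []
    if overwrite:
        # Sketch-only overwrite: drop the old table before re-creating it.
        statements.append(f"DROP TABLE IF EXISTS {qualified}")
    statements.append(
        f"CREATE TABLE {qualified} WITH ({', '.join(props)}) AS {select_sql}"
    )
    return statements


stmts = build_athena_ctas("schema_name", "table_name", "SELECT * FROM src_table",
                          overwrite=True)
for stmt in stmts:
    print(stmt)
```

Keep in mind that dropping an Athena table does not delete its S3 data, and CTAS expects an empty external_location, so a real overwrite would likely also need to clear that prefix.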

Why It Matters

This feature would enable:

  • Full Kedro + Ibis pipelines on Athena
  • Avoiding fallbacks to PyAthena or raw DDL / pyarrow + s3fs for intermediate steps

What version of ibis are you using?

ibis-framework 10.5.0
kedro-datasets 7.0.0

What backend(s) are you using, if any?

Athena

Relevant log output


Code of Conduct

  • [x] I agree to follow this project's Code of Conduct

uday-dasari, May 28 '25 09:05

@uday-dasari Do you have a traceback or something else indicating that CTAS doesn't work? We definitely support this use case, and many of our tests hit the code path for create_table with an existing Ibis expression.

cpcloud, Jun 02 '25 17:06