wren-engine icon indicating copy to clipboard operation
wren-engine copied to clipboard

feat(ibis): Show full Redshift table metadata by querying system catalogs

Open douenergy opened this issue 2 months ago • 3 comments

The current Redshift metadata query uses information_schema.tables and information_schema.columns, which only shows schemas that are in the user's search_path or owned by the user. This causes visibility issues like https://github.com/Canner/WrenAI/issues/1953

Summary by CodeRabbit

  • Bug Fixes

    • More accurate Redshift metadata: corrected nullability and data type reporting; improved table/column comments.
    • Excludes system/internal schemas and dropped/system columns; respects namespace privileges.
    • Consistent ordering by schema, table, then column; better handling of views alongside tables.
  • Performance

    • Faster and more reliable retrieval of Redshift table and column metadata.

douenergy avatar Oct 08 '25 06:10 douenergy

Walkthrough

The Redshift metadata query in get_table_list was replaced: it now queries PostgreSQL catalog tables (pg_class/pg_namespace/pg_attribute) and uses format_type/current_database() for types and catalogs, with revised filters, nullability logic, and ordering. External interface and Table/Column construction remain unchanged.

Changes

Cohort / File(s) Summary of Changes
Redshift metadata query rewrite
ibis-server/app/model/metadata/redshift.py
Replaced information_schema-based SQL with a catalog-driven query against pg_class/pg_namespace/pg_attribute; selected fields now use current_database()/nspname/relname/attname/format_type; nullability moved to a CASE on attnotnull; filters restrict relkind to 'r' and 'v', exclude system schemas/columns, exclude dropped columns, require USAGE on namespace, and order by schema, table, column position; existing helper transforms and Table/Column construction preserved.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Metadata as RedshiftMetadata
  participant Conn as SQLConnector
  participant Redshift as Redshift

  Client->>Metadata: get_table_list()
  Metadata->>Conn: execute(catalog-based SQL)
  Conn->>Redshift: SELECT from pg_namespace/pg_class/pg_attribute
  Redshift-->>Conn: rows (catalog, schema, table, column, type, nullable, comment)
  Conn-->>Metadata: result set
  Note right of Metadata: Map rows → Table/Column objects<br/>Use helpers: _format_redshift_compact_table_name, _transform_redshift_column_type
  Metadata-->>Client: list[Table]

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Pay attention to the new SQL correctness and edge cases:
    • type extraction via format_type(...) and its parsing in _transform_redshift_column_type
    • nullability logic change from information_schema to attnotnull CASE expression
    • privilege check (has_schema_privilege(nsp.oid, 'USAGE')) and schema exclusions
    • handling of system/dropped columns and ordering semantics

Poem

I thump the logs and sniff each name,
From pg_class I learn the game.
Columns, types, and schemas bright—
I hop through catalogs all night.
A carrot cheer for metadata’s light. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
âś… Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check âś… Passed The title accurately describes the main change: querying Redshift system catalogs instead of information_schema to show full table metadata, which directly addresses the PR's core objective.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
đź§Ş Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Oct 08 '25 06:10 coderabbitai[bot]

@douenergy, is this PR tested well?

goldmedal avatar Oct 29 '25 06:10 goldmedal

Oracle testing failures are not related to this PR. I have tested it manually on my local environment.

image

douenergy avatar Nov 19 '25 05:11 douenergy