astro-sdk icon indicating copy to clipboard operation
astro-sdk copied to clipboard

Users should be able to set `tmp_schema` for each connection

Open pgzmnk opened this issue 2 years ago • 5 comments

Please describe the feature you'd like to see Currently tmp_schema must exist on the database to which a SQL tasks will write output to. Feedback sessions with Astro Build surfaced that requiring a specific schema with write access to exist might be an unreasonable ask for a db admin.

There's a feature to set the temp schema with the AIRFLOW__ASTRO__SQL_SCHEMA environment variable. However, forcing the same temp schema name for all conns is untenable. For example, users might want conn1 to use the tmp_schema and conn2 to use a user defined schema named default_schema.

Describe the solution you'd like A possible solution could be to let users specify the temp schema for each conn in a new field in 'extras' of the conn URI.

This means that SQL Tasks write temp tables to the schema defined by the e.g. tmp_schema attribute in the connection's 'extras' section, if that attribute exists. Otherwise write to the default temp schema.

Additional context This issue was encountered within the Astro Build UI.

Acceptance Criteria

  • [ ] Users are able to define a schema for each Task
  • [ ] Tasks execute successfully
  • [ ] Temporary tables found in corresponding schemas/db

pgzmnk avatar Apr 11 '22 20:04 pgzmnk

That's excellent feedback, @pgzmnk!

A PR in progress (#140) addresses the same issue but uses a different approach. It allows users to define default values per DAG instead of relying on Astro 0.7.x temporary schema environment variable. What are your thoughts about this alternative approach?

I like your proposal because it is focused on the database, as opposed to DAGs - it is more flexible and would allow a single DAG powered by Astro to interact with >1 database, which can have different temporary schemas.

tatiana avatar Apr 11 '22 21:04 tatiana

This makes sense. Let's see what user feedback we get based on your proposed implementation.

Thank you for being on top of this.

pgzmnk avatar Apr 14 '22 16:04 pgzmnk

@pgzmnk, One of the challenges in implementing the feature you describe, is that Postgres expects the connection extra property only to contain pre-defined Postgres connection fields:

  • https://airflow.apache.org/docs/apache-airflow-providers-postgres/stable/connections/postgres.html#howto-connection-postgres
  • https://github.com/apache/airflow/blob/59c450ee5425a2d23ef813dbf219cde14df7c85c/airflow/providers/postgres/hooks/postgres.py#L103-L111

So when we give it a connection that has an extra containing another field (e.g. tmp_schema), the hook assumes it should be used to establish the connection to Postgres (e.g., as a TTS parameter) and fails to connect.

We'll look into alternatives to make this work!

tatiana avatar May 24 '22 11:05 tatiana

For those who may be interested, the issues that happened with Postgres when trying to use this approach are documented in the branch temp-schema-from-conn: https://github.com/astronomer/astro-sdk/commit/8b0dfecb41a036bd849c185ec49377c7938cb5db

tatiana avatar May 24 '22 13:05 tatiana

@feluelle @sunank200 Any updates on this issue.

This was brought up again internally at https://astronomer.slack.com/archives/C02B8SPT93K/p1663278078864369

kaxil avatar Sep 15 '22 23:09 kaxil