snowflake-connector-python
snowflake-connector-python copied to clipboard
SNOW-295953: In CREATE TABLE AS, cursor.rowcount should return number of rows created
What is the current behavior?
When the connector executes a CREATE TABLE ... AS SELECT
or CREATE VIEW ... AS SELECT
statement, the cursor.rowcount
returns 1
. Instead, I believe it should return the number of rows created (for tables) or -1
(for views).
I believe this to be the spirit of what rowcount
represents:
https://github.com/snowflakedb/snowflake-connector-python/blob/77ad07f9a07991719258d6b6c6ebfb48691e51de/src/snowflake/connector/cursor.py#L92
While I'm not positive, I believe the current behavior (returning 1
for these query types) reflects the fact that a single "row of data" is returned, i.e. the status message. Here's cursor.fetchall()
in response to both statement types:
[('Table TABLE_MODEL successfully created.',)]
[('Table VIEW_MODEL successfully created.',)]
What is the desired behavior?
I would like the Snowflake connector's cursor to behave more like the cursor in psycopg2
, the standard python connector for Postgres. In psycopg2
(docs), the rowcount
returned by CREATE VIEW
, CREATE VIEW AS
, and CREATE TABLE
is always -1
; whereas CREATE TABLE AS
, treated as DML, returns the number of rows modified (i.e. created).
>>> import psycopg2
>>> con = psycopg2.connect(...)
>>> cur = con.cursor()
>>> cur.execute("create table my_table as (select 1 as id union all select 2 union all select 3)")
>>> cur.statusmessage
'SELECT 3'
>>> cur.rowcount
3
>>> import snowflake.connector
>>> handle = snowflake.connector.connect(...)
>>> cur = handle.cursor()
>>> cur.execute("create table my_table as (select 1 as id union all select 2 union all select 3)")
<snowflake.connector.cursor.SnowflakeCursor object at 0x10aba7eb0>
>>> cur.rowcount
1
>>> cur.fetchall()
[('Table MY_TABLE successfully created.',)]
How would this improve snowflake-connector-python
?
Most directly, this would improve the connector by aligning it with the standard set by established connectors/cursors.
On a deeper level, information about the number of rows stored in a new table is tremendously useful for everyone, and strictly necessary for organizations that (for regulatory/compliance reasons) need rigorous auditing of data movement.
I'm one of the maintainers of dbt. Our models make frequent use of create or replace table as
on Snowflake—a powerful feature for atomic and idempotent data transformation. When dbt materializes a model in the databases, it captures a database-specific AdapterResponse
object, including a number of rows_affected
, and writes those results to a metadata artifact (run_results.json
, docs) that can be used for auditing, runtime analysis, etc. Here's the code for Snowflake, which directly relies on the python connector's cursor.rowcount
.
I opened a related issue earlier this week (https://github.com/fishtown-analytics/dbt/issues/3142), responding to a request from a joint dbt + Snowflake customer. I'm still clarifying the exact requirements, but I believe that storing record counts for dbt-produced tables in dbt-produced artifacts would be simpler and more reliable if the Snowflake connector's cursor returned the number of rows modified (created) by CTA statements.
I'm happy to hear disagreement, or that there are good reason for preserving the current behavior. Thanks folks!
Hi @jtcohen6 According to https://www.python.org/dev/peps/pep-0249/#rowcount it should be different for DML and for DQL statements. I'm in favor of doing this, but implementing this might not be simple, as it might need help from the back-end.
Hi! Has there been any movement on this issue since last year?
Just checking a few months later if this is being considered by the team or not.
I'd like someone@snowflake to escalate this as it is clearly a bug. And as @sfc-gh-mkeller commented, it really is not in compliance with PEP-249.
Hey folks - we will look into addressing this. Thanks
Hi! Coming back again after a few months. Is there any update?
I also have a user asking about this for CTAS
I had a potential Snowflake user ask me for this today, with dbt create table as statements.
Also have a user asking about this.
Another customer asking about this.
hi folks thank you so much for all the feedback and especially for bearing with us! Syncing internally with the team to see if we could get this one put on the roadmap somehow, will keep this thread posted.
create table as select ...
is a DDL. It makes sense that it doesn't return the number of rows from the select
clause of the statement if we're talking about DBAPI2.0 spec.
I understand that it's useful for some scenarios. But it will be a breaking change to the current customers if we change it.
Do you think it's a good idea to add another property on the cursor to indicate how many rows are affected from CTAS? I can't promise at this time though because this will need the CTAS
SQL to return the extra "rows affected".
Aside from CTAS, Snowflake has other similar SQLs like create table clone
. I guess you also expect them to behave similarly?
My interpretation is the same as the Amazon Athena one, create table as
is both a DDL
and a DML
statement combined.
On the dbt side, if it is in a field different from rowcount
, we will still be able to get the info.
Having the info for create table clone
could be useful but, again, from the dbt and Snowflake integration side, I don't see it as important as for ctas
.