
SNOW-295953: In CREATE TABLE AS, cursor.rowcount should return number of rows created

Open · jtcohen6 opened this issue 3 years ago · 13 comments

What is the current behavior?

When the connector executes a CREATE TABLE ... AS SELECT or CREATE VIEW ... AS SELECT statement, cursor.rowcount returns 1. Instead, I believe it should return the number of rows created (for tables) or -1 (for views).

I believe this to be the spirit of what rowcount represents:

https://github.com/snowflakedb/snowflake-connector-python/blob/77ad07f9a07991719258d6b6c6ebfb48691e51de/src/snowflake/connector/cursor.py#L92

While I'm not positive, I believe the current behavior (returning 1 for these query types) reflects the fact that a single "row of data" is returned, i.e. the status message. Here's cursor.fetchall() in response to both statement types:

[('Table TABLE_MODEL successfully created.',)]
[('Table VIEW_MODEL successfully created.',)]

What is the desired behavior?

I would like the Snowflake connector's cursor to behave more like the cursor in psycopg2, the standard python connector for Postgres. In psycopg2 (docs), the rowcount returned by CREATE VIEW, CREATE VIEW AS, and CREATE TABLE is always -1; whereas CREATE TABLE AS, treated as DML, returns the number of rows modified (i.e. created).

>>> import psycopg2
>>> con = psycopg2.connect(...)
>>> cur = con.cursor()
>>> cur.execute("create table my_table as (select 1 as id union all select 2 union all select 3)")
>>> cur.statusmessage
'SELECT 3'
>>> cur.rowcount
3
>>> import snowflake.connector
>>> handle = snowflake.connector.connect(...)
>>> cur = handle.cursor()
>>> cur.execute("create table my_table as (select 1 as id union all select 2 union all select 3)")
<snowflake.connector.cursor.SnowflakeCursor object at 0x10aba7eb0>
>>> cur.rowcount
1
>>> cur.fetchall()
[('Table MY_TABLE successfully created.',)]
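
For completeness, the obvious workaround today is a follow-up count query against the new table. A minimal sketch, reusing the cursor from the session above (the count should come back as 3, matching the three-row union):

>>> cur.execute("select count(*) from my_table")
<snowflake.connector.cursor.SnowflakeCursor object at 0x10aba7eb0>
>>> cur.fetchone()[0]
3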

How would this improve snowflake-connector-python?

Most directly, this would improve the connector by aligning it with the standard set by established connectors/cursors.

On a deeper level, information about the number of rows stored in a new table is tremendously useful for everyone, and strictly necessary for organizations that (for regulatory/compliance reasons) need rigorous auditing of data movement.

I'm one of the maintainers of dbt. Our models make frequent use of create or replace table ... as on Snowflake, a powerful feature for atomic and idempotent data transformation. When dbt materializes a model in the database, it captures a database-specific AdapterResponse object, including a rows_affected count, and writes those results to a metadata artifact (run_results.json, docs) that can be used for auditing, runtime analysis, etc. Here's the code for Snowflake, which relies directly on the Python connector's cursor.rowcount.
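
As a simplified illustration (this is a sketch, not dbt's actual code; the field names here are illustrative), the shape of what gets recorded looks roughly like this, with rows_affected taken straight from the cursor:

from dataclasses import dataclass

@dataclass
class AdapterResponse:
    # Simplified stand-in for the response object dbt records per model
    message: str
    rows_affected: int = -1

def response_from_cursor(cursor) -> AdapterResponse:
    status = cursor.fetchone()  # e.g. ('Table MY_TABLE successfully created.',)
    # With the current connector, rowcount is 1 for a CTAS: it reflects the
    # status-message row, not the number of rows written into the new table.
    return AdapterResponse(message=status[0], rows_affected=cursor.rowcount)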

I opened a related issue earlier this week (https://github.com/fishtown-analytics/dbt/issues/3142), responding to a request from a joint dbt + Snowflake customer. I'm still clarifying the exact requirements, but I believe that storing record counts for dbt-produced tables in dbt-produced artifacts would be simpler and more reliable if the Snowflake connector's cursor returned the number of rows modified (created) by CTA statements.

I'm happy to hear disagreement, or that there are good reasons for preserving the current behavior. Thanks folks!

jtcohen6 · Mar 05 '21

Hi @jtcohen6. According to https://www.python.org/dev/peps/pep-0249/#rowcount, rowcount should differ for DML and DQL statements. I'm in favor of doing this, but implementing it might not be simple, as it might need help from the back-end.

sfc-gh-mkeller · Mar 15 '21

Hi! Has there been any movement on this issue since last year?

b-per · May 04 '22

Just checking a few months later if this is being considered by the team or not.

b-per · Dec 21 '22

I'd like someone at Snowflake to escalate this, as it is clearly a bug. And as @sfc-gh-mkeller commented, it really is not in compliance with PEP 249.

easyas314 · Jan 11 '23

Hey folks - we will look into addressing this. Thanks

sfc-gh-achandrasekaran · Feb 09 '23

Hi! Coming back again after a few months. Is there any update?

b-per · May 16 '23

I also have a user asking about this for CTAS.

ernestoongaro · Aug 29 '23

I had a potential Snowflake user ask me for this today, with dbt create table as statements.

boxysean · Nov 21 '23

Also have a user asking about this.

dbeatty10 · Feb 02 '24

Another customer asking about this.

sdurry · Mar 25 '24

Hi folks, thank you so much for all the feedback, and especially for bearing with us! I'm syncing internally with the team to see if we can get this one put on the roadmap, and will keep this thread posted.

sfc-gh-dszmolka · Mar 25 '24

create table as select ... is a DDL statement. If we're talking about the DB API 2.0 spec, it makes sense that it doesn't return the number of rows from the select clause of the statement.

I understand that it's useful for some scenarios. But changing it would be a breaking change for current customers.

Do you think it's a good idea to add another property on the cursor to indicate how many rows are affected by CTAS? I can't promise anything at this time, though, because this would need the CTAS SQL to return the extra "rows affected" value.
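
Something roughly like this, purely as a sketch (the property name is hypothetical, and it would only work once the back-end returns the extra value):

cur.execute("create table my_table as (select 1 as id union all select 2 union all select 3)")
cur.rowcount        # stays 1, so existing code keeps working
cur.ctas_row_count  # hypothetical new property; would carry 3 here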

Aside from CTAS, Snowflake has other similar SQL statements like create table ... clone. I guess you expect them to behave similarly as well?

sfc-gh-yixie · Mar 26 '24

My interpretation is the same as Amazon Athena's: create table as is a DDL and a DML statement combined.

On the dbt side, if the count is exposed in a field other than rowcount, we will still be able to get the info.
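
For example, something like this on our adapter side (the attribute name is hypothetical, matching whatever the connector ends up exposing), with a fallback for older connector versions:

# Hypothetical attribute name; fall back to rowcount on older connectors
rows = getattr(cursor, "ctas_row_count", None)
if rows is None:
    rows = cursor.rowcount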

Having the info for create table ... clone could be useful but, again, from the dbt and Snowflake integration side, I don't see it as being as important as CTAS.

b-per · Mar 26 '24