SNOW-823151: First result batch has int64 arrow type for NUMBER column with real values
When fetching multiple batches via get_result_batches , the first batch is empty and has the incorrect arrow data type for NUMBER columns containing real values, eg:
import snowflake.connector
import pyarrow
conn = snowflake.connector.connect(... )
cur = conn.cursor()
cur.execute("SELECT cast(1.5 as NUMBER(32,4)) as AMOUNT;")
batches = cur.get_result_batches()
batch = batches[0]
# the first batch is normally empty, which triggers _create_empty_table
# in this MRE we trigger it explicitly to demonstrate the error
table = batch._create_empty_table()
print(table.schema)
assert table.schema.types[0] == pyarrow.float64()
Will fail with an assertion error and output:
AMOUNT: int64
snowflake-connector-python==3.0.3
@tekumara did this happen with previous versions?
I'm not entirely sure.
hey @tekumara , you are just one step close to get the result you want.
get_result_batches returns an ArrowResultChunk object -- it's a data structure storing data, not the data itself. You can either call the create_iter to create an iterator to iterate the data within the chunk or call to_arrow, to_pandas to convert to pyarrow Table or pandas DataFrame if you have installed the pandas dependency.
please let me know if you have any other questions, I'm more than happy to answer.
I still have the same problem using those methods. The MRE above uses _create_empty_table() to make it easier to reproduce.
can you try this?
conn = snowflake.connector.connect(**CONNECTION_PARAMETERS)
cur = conn.cursor()
cur.execute("SELECT cast(1.5 as NUMBER(32,4)) as AMOUNT;")
batches = cur.get_result_batches()
batch = batches[0]
table = batch.to_arrow()
assert table.schema.types[0] == pyarrow.float64()
That example works but not in the general case ie: when I fetch and use batches[0].to_arrow() from one of our real tables containing a NUMBER(32,4) the first batch is empty. This triggers the _create_empty_table() code path and the fault demonstrated in the contrived MRE above.
@tekumara Did you ever find out why the first batch always returns empty?
No I didn't, are you seeing this too?
No I didn't, are you seeing this too?
Yes I am seeing the same thing, when the connector returns a list of ResultBatch objects, I have found that if it is a large list and rowcounts exceed ~ 2k then the first ResultBatch is empty.
To work around this, when distributing ResultBatch objects across multiple workers I simply check if the ResultBatch is empty and don't process it in the worker.
@sfc-gh-aling could you reopen this issue please as it’s still happening? 🙏
thank you all for your feedbacks on this one! i reopened this issue and we'll take a look