SNOW-1331404: fetch_pandas_all converts arrays into strings
Python version
Python 3.11.7 (main, Mar 12 2024, 09:54:34) [GCC 13.2.0]
Operating system and processor architecture
Linux-6.5.0-27-generic-x86_64-with-glibc2.38
Installed packages
annotated-types==0.6.0
anyio==4.3.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asn1crypto==1.5.1
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.3
bleach==6.1.0
botocore==1.34.81
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.4
comm==0.2.1
contourpy==1.2.0
cryptography==42.0.5
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
docutils==0.16
entrypoints==0.4
executing==2.0.1
fastapi==0.110.1
fastjsonschema==2.19.1
filelock==3.13.1
flake8==4.0.1
flake8-quotes==3.4.0
flakeheaven==3.3.0
fonttools==4.49.0
fqdn==1.5.1
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
idna==3.6
iniconfig==2.0.0
ipykernel==6.29.3
ipython==8.22.2
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
json5==0.9.22
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter-events==0.9.0
jupyter-lsp==2.2.4
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.13.0
jupyter_server_terminals==0.5.2
jupyterlab==4.1.2
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.4
kiwisolver==1.4.5
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mccabe==0.6.1
mistune==3.0.2
msal==1.28.0
nbclient==0.9.0
nbconvert==7.16.2
nbformat==5.9.2
nest-asyncio==1.6.0
notebook_shim==0.2.4
numpy==1.26.4
overrides==7.7.0
packaging==24.0
pandas==2.2.0
pandocfilters==1.5.1
parso==0.8.3
pexpect==4.9.0
pillow==10.2.0
platformdirs==3.11.0
pluggy==1.4.0
prometheus_client==0.20.0
prompt-toolkit==3.0.43
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.2
pyasn1==0.6.0
pycodestyle==2.8.0
pycparser==2.21
pydantic==2.6.3
pydantic_core==2.16.3
pyflakes==2.4.0
Pygments==2.17.2
PyJWT==2.8.0
pyOpenSSL==24.1.0
pyparsing==3.1.2
pytest==8.0.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
referencing==0.33.0
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.18.0
rsa==4.7.2
s3transfer==0.10.1
scikit-learn==1.4.1.post1
scipy==1.12.0
seaborn==0.13.2
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.1
snowflake-connector-python==3.8.1
sortedcontainers==2.4.0
soupsieve==2.5
stack-data==0.6.3
starlette==0.37.2
terminado==0.18.0
threadpoolctl==3.3.0
tinycss2==1.2.1
toml==0.10.2
tomlkit==0.12.4
tornado==6.4
traitlets==5.14.1
types-python-dateutil==2.8.19.20240311
typing_extensions==4.10.0
tzdata==2024.1
uri-template==1.3.0
urllib3==2.2.1
uvicorn==0.28.0
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
What did you do?
Fetching a column of type array turns the arrays into strings in the dataframe, which makes them difficult to parse.
>>> query = 'select array_construct(10, 20, 30) as col'
>>> df = cursor.execute(query).fetch_pandas_all()
>>> df
COL
0 [\n 10,\n 20,\n 30\n]
>>> type(df['COL'].iloc[0])
str
What did you expect to see?
I would expect arrays to be represented as a list or tuple of values in the dataframe cell.
Can you set logging to DEBUG and collect the logs?
import logging
import os
for logger_name in ('snowflake.connector',):
logger = logging.getLogger(logger_name)
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
logger.addHandler(ch)
Hello @lostkamp ,
Thanks for raising the issue, we are checking it, will update.
Regards, Sujan
Hello @lostkamp ,
This is currently the expected behavior. Adding support for returning structured data is currently being worked on. Will update further.
Regards, Sujan
this should be now resolved i suppose, now that v3.10.0 is released
Hi @sfc-gh-sghosh and @sfc-gh-dszmolka ,
thanks for the update. Unfortunately I am getting exactly the same behavior as before when using v3.10.0.
reopened and we'll look further
Hi, is there any update on this issue?
@lostkamp I believe this is caused by the fact that we don't support unstructured objects. You can however work around this by telling Snowflake the type yourself. Check the following example out:
SELECT
SYSTEM$TYPEOF(
array_construct(10, 20, 30)::ARRAY(NUMBER)
) AS structured_array,
SYSTEM$TYPEOF(
array_construct(10, 20, 30)
) AS semi_structured_array;
produces:
| STRUCTURED_ARRAY | SEMI_STRUCTURED_ARRAY |
|---|---|
| ARRAY(NUMBER(38,0))[LOB] | ARRAY[LOB] |
So your code could work this way:
>>> query = 'select array_construct(10, 20, 30) :: array(number) as col'
>>> df = cursor.execute(query).fetch_pandas_all()
>>> df
COL
0 [10, 20, 30]
>>> type(df['COL'].iloc[0])
<class 'numpy.ndarray'>
@sfc-gh-mkeller thank you for the explanation!
However your code still gives me strings:
>>> query = 'select array_construct(10, 20, 30)::array(number) as col'
>>> df = cursor.execute(query).fetch_pandas_all()
>>> df
COL
0 [\n 10,\n 20,\n 30\n]
>>> type(df['COL'].iloc[0])
<class 'str'>
using v3.12.3
Unfortunately @lostkamp as it turns out the code works, but it requires some Snowflake features that are not released yet. One caveat that I want pass along is that as things are now this only works with Iceberg tables.
Let's keep this ticket open to track this feature request!
Ok, thank you.