blazingsql
[BUG] Error when reading table with hive cursor that does not happen with hdfs
Describe the bug
I get the following error when creating a table with a PyHive cursor:
from getpass import getuser

from blazingsql import BlazingContext
from pyhive import hive

# Kerberos-authenticated HiveServer2 connection, with queries running on Tez
cursor = hive.Connection(
    host="{hive_edge_node_url}",
    username=getuser(),
    auth='KERBEROS',
    kerberos_service_name="hive",
    configuration={'hive.execution.engine': "tez", 'tez.queue.name': "group1"},
).cursor()

bc = BlazingContext()
# Register the Hive table chavesrl.transuk2m2019_mini under the name 'bliblu'
bc.create_table('bliblu',
                cursor,
                hive_table_name='transuk2m2019_mini',
                hive_database_name='chavesrl')
Error:
ERROR: Could not get partition values for file: hdfs://anahnn/visa/user/chavesrl/chavesrl.db/transuk2m2019_mini/000000_0
ERROR: Could not get partition values for file: hdfs://anahnn/visa/user/chavesrl/chavesrl.db/transuk2m2019_mini/000001_0
ERROR: Could not get partition values for file: hdfs://anahnn/visa/user/chavesrl/chavesrl.db/transuk2m2019_mini/000002_0
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-edf72ca4fd46> in <module>
4 hive_table_name = 'transuk2m2019_mini',
5 hive_database_name = 'chavesrl',
----> 6 file_format = 'parquet'
7 )
/projects/gds/chavesrl/condapv/envs/visaverse-gpu/lib/python3.7/site-packages/pyblazing/apiv2/context.py in create_table(self, table_name, input, **kwargs)
2458 ):
2459 parsedMetadata = self._parseMetadata(
-> 2460 file_format_hint, table.slices, parsedSchema, kwargs
2461 )
2462
/projects/gds/chavesrl/condapv/envs/visaverse-gpu/lib/python3.7/site-packages/pyblazing/apiv2/context.py in _parseMetadata(self, file_format_hint, currentTableNodes, schema, kwargs)
2714 schema["names"] = [i.encode() for i in schema["names"]]
2715 if "names" in kwargs:
-> 2716 kwargs["names"] = [i.encode() for i in kwargs["names"]]
2717
2718 if self.dask_client:
/projects/gds/chavesrl/condapv/envs/visaverse-gpu/lib/python3.7/site-packages/pyblazing/apiv2/context.py in <listcomp>(.0)
2714 schema["names"] = [i.encode() for i in schema["names"]]
2715 if "names" in kwargs:
-> 2716 kwargs["names"] = [i.encode() for i in kwargs["names"]]
2717
2718 if self.dask_client:
AttributeError: 'bytes' object has no attribute 'encode'
The table I am trying to read is Parquet, but specifying file_format='parquet' does not help either. Stepping through with the debugger, I found that i.encode() is being called on i, which is already a byte string.
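The failure can be reproduced in plain Python, without BlazingSQL: a bytes object has no encode method, so running the same list comprehension as line 2716 of context.py over names that are already bytes raises the same AttributeError. A minimal sketch; the column names are hypothetical.

```python
# Hypothetical column names that have already been encoded once
names = [b"col_a", b"col_b"]

error_message = None
try:
    # Same pattern as kwargs["names"] = [i.encode() for i in kwargs["names"]]
    encoded = [i.encode() for i in names]
except AttributeError as err:
    error_message = str(err)

print(error_message)  # 'bytes' object has no attribute 'encode'
```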
Expected behavior
Column names are read properly. Perhaps pyblazing could detect that the names are already encoded and skip encoding them a second time.
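One way to do that, sketched here as an assumption rather than the project's actual fix, is to encode only the names that are still str and pass bytes through unchanged. ensure_bytes is a hypothetical helper, not part of pyblazing; in _parseMetadata it would stand in for the [i.encode() for i in ...] comprehensions.

```python
from typing import Iterable, List, Union


def ensure_bytes(names: Iterable[Union[str, bytes]]) -> List[bytes]:
    """Hypothetical helper: encode str names to UTF-8 bytes, keep bytes as-is."""
    result = []
    for name in names:
        if isinstance(name, str):
            result.append(name.encode())  # str.encode() defaults to UTF-8
        elif isinstance(name, bytes):
            result.append(name)  # already encoded, do not encode again
        else:
            raise TypeError(f"column name must be str or bytes, got {type(name).__name__}")
    return result


# Mixed input: one str name and one name that was already encoded
print(ensure_bytes(["col_a", b"col_b"]))  # [b'col_a', b'col_b']
```

Because the helper is idempotent, it is safe to apply whether the Hive code path or the file path produced the names.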
Environment overview
- Environment location: Bare metal
- Method of BlazingSQL install: conda
BlazingSQL version, obtained by running:
import blazingsql
print(blazingsql.__info__())
BlazingSQL version (git hash): ff4ece0366a4d76bf533baeb03dd03bdfc5232be
BlazingSQL branch name: HEAD
BlazingSQL branch tag: v0.19.0
BlazingSQL build id: 0
BlazingSQL compiler version: GNU /usr/bin/c++ 7.5.0
BlazingSQL cuda flags: -Xcompiler -Wno-parentheses -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 --expt-extended-lambda --expt-relaxed-constexpr -Werror=cross-execution-space-call -Xcompiler -Wall,-Wno-error=deprecated-declarations --default-stream=per-thread -DHT_DEFAULT_ALLOCATOR
BlazingSQL Operating system kernel: Linux-5.4.0-1038-aws
BlazingSQL Operating system architecture: x86_64
BlazingSQL Linux Operating system release: NAME=Ubuntu|VERSION=16.04.7 LTS (Xenial Xerus)|ID=ubuntu|ID_LIKE=debian|PRETTY_NAME=Ubuntu 16.04.7 LTS|VERSION_ID=16.04|HOME_URL=http://www.ubuntu.com/|SUPPORT_URL=http://help.ubuntu.com/|BUG_REPORT_URL=http://bugs.launchpad.net/ubuntu/|VERSION_CODENAME=xenial|UBUNTU_CODENAME=xenial
None