
Missing explanation why code raises 'Writing to hive not supported'

Open kotofos opened this issue 4 years ago • 2 comments

There is code in the pyhive/sqlalchemy_hive module which is supposed to prevent writing to Hive, but there is no explanation in the docs, the git log, or anywhere else of why writing is not supported. Writing works for raw queries. It also works if this part is commented out. The same goes for sqlalchemy_hive.HiveStringTypeBase, which omits an implementation of python_type.

see https://github.com/dropbox/PyHive/blob/b21c507a24ed2f2b0cf15b0b6abb1c43f31d3ee0/pyhive/sqlalchemy_hive.py#L37
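
For reference, the guard in question is a bind-parameter hook on a TypeDecorator base class. Paraphrased from the linked source (not a verbatim copy), it looks roughly like this:

from sqlalchemy import types

class HiveStringTypeBase(types.TypeDecorator):
    """Translates strings returned by Thrift into something else."""
    impl = types.String

    def process_bind_param(self, value, dialect):
        # Binding any parameter of this type fails outright; this is
        # what surfaces as the 'Writing to hive not supported' error.
        raise NotImplementedError('Writing to hive not supported')

Since the Hive date/timestamp/decimal types derive from this base, any ORM write that binds a column of one of these types raises.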

kotofos · Aug 16 '21 09:08

I was facing the same error this morning while working with SQLAlchemy and sqlalchemy-databricks. There isn't any information about it out there, so I thought I'd best share my solution!

The cause of the issue seems to be that I was using table reflection to infer the structure/metadata of my Databricks table. In doing this, SQLAlchemy set some column types to Hive-specific types rather than something more generic. The solution was instead to manually define the Table columns within a mapped class.

Afterwards, I was able to insert using the ORM without this Hive error popping up.
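
A minimal sketch of that approach, with illustrative table and column names (any generic SQLAlchemy types work; the point is to avoid the reflected Hive-specific ones):

from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Event(Base):
    # Hypothetical table: every column type is a generic SQLAlchemy
    # type, where reflection would have produced e.g. HiveTimestamp.
    __tablename__ = "events"

    id = Column(Integer, primary_key=True)
    name = Column(String)
    created_at = Column(DateTime)

With generic types in place, the Hive-specific bind guard is never invoked, so ORM inserts go through.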

Dahmon · Jun 21 '22 02:06

My schema was too large for @Dahmon's solution to be an option, but it pointed me toward a correct one.

The solution is to cast the automapped column types to dialect-agnostic types (not every type breaks writing, but timestamp, for example, does). The solution provided in the SQLAlchemy docs is:

from sqlalchemy import event

# "metadata" is the MetaData object the tables are reflected into.
@event.listens_for(metadata, "column_reflect")
def genericize_datatypes(inspector, tablename, column_dict):
    column_dict["type"] = column_dict["type"].as_generic()

But there is no such cast provided for the PyHive types, so I did:

import sqlalchemy as sql
from pyhive import sqlalchemy_hive as hive  # HiveTimestamp, HiveDecimal live here

def hive_to_generic(t):
    # Map PyHive-specific column types to dialect-agnostic ones.
    if type(t) is hive.HiveTimestamp:
        return sql.DateTime
    elif type(t) is hive.HiveDecimal:
        return sql.Float
    builtins = [sql.String, sql.BigInteger, sql.Integer]
    if type(t) not in builtins:
        print(f"WARNING: no generic conversion - not implemented for {type(t)}")
    return t
    
@event.listens_for(metadata, "column_reflect")
def genericize_datatypes(inspector, tablename, column_dict):
    column_dict["type"] = hive_to_generic(column_dict["type"])

Adapt hive_to_generic() to your needs.
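
End to end, the wiring might look like this (connection URI and table name are placeholders; note that the listener must be attached before reflect() runs):

import sqlalchemy as sql
from sqlalchemy import create_engine, event

engine = create_engine("hive://localhost:10000/default")  # placeholder URI
metadata = sql.MetaData()

@event.listens_for(metadata, "column_reflect")
def genericize_datatypes(inspector, tablename, column_dict):
    column_dict["type"] = hive_to_generic(column_dict["type"])

metadata.reflect(bind=engine, only=["my_table"])  # placeholder table name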

Then there is another problem (at least in my case): we need to pass ISO-formatted datetimes to PyHive (Databricks), so I just cast my payload's values before using them:

from dateutil.parser import parse  # python-dateutil

def _to_hive_datetime(key, value, typeDict):
    # typeDict maps column names to their (generic) SQLAlchemy types.
    if value is None:
        return None
    if type(typeDict[key]) is not sql.DateTime:
        return value
    try:
        return parse(value).isoformat()
    except ValueError:
        # Value is not parseable as a date; pass it through unchanged.
        return value

I'm passing key and typeDict so that values in non-date columns, for example digit strings, are never parsed as dates.
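
For example, applied to a payload before an insert (here typeDict maps column names to SQLAlchemy type instances, matching the type(...) check above):

payload = {"id": "123", "created_at": "2023-06-02 12:00:00"}
typeDict = {"id": sql.Integer(), "created_at": sql.DateTime()}

clean = {k: _to_hive_datetime(k, v, typeDict) for k, v in payload.items()}
# clean == {"id": "123", "created_at": "2023-06-02T12:00:00"}
# "123" is left alone because its column type is not DateTime.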

SCube19 · Jun 02 '23 12:06