Unexpected Unicode Conversion Error
I've been plagued by hard-to-predict Unicode Conversion Errors
while trying to insert non-Unicode data into existing TD tables.
The data being inserted consists of 2 integer columns and 20 floats, stored in a list of lists and passed as a parameter to cursor.executemany(insertString, data, batch=True).
The session explicitly specifies charSet = 'ASCII'
(though that is also the ODBC default).
The table being inserted into also has a TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) column that isn't being inserted from Python.
The insertString looks like:
INSERT INTO db_name.table_name(comma_dlm_list_of_columns_to_insert) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
I get back:
teradata.api.DatabaseError: (1, '[HY000] [Teradata][ODBC Teradata Driver][Teradata Database] Unicode Conversion Error')
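For reference, a stripped-down sketch of the call pattern described above (system name, credentials, and column names are placeholders, not the real ones; the charset keyword reflects the charSet = 'ASCII' setting mentioned earlier):

import teradata

udaExec = teradata.UdaExec(appName="BulkInsert", version="1.0", logConsole=False)

# Placeholder system and credentials; the session charset matches the description above.
with udaExec.connect(method="odbc", system="tdsystem",
                     username="user", password="pass",
                     charset="ASCII") as session:
    # Illustrative column names: 2 integer columns and 20 float columns.
    columns = ["row_id", "group_id"] + ["val_%02d" % i for i in range(1, 21)]
    insertString = "INSERT INTO db_name.table_name (%s) VALUES (%s)" % (
        ", ".join(columns), ", ".join(["?"] * len(columns)))

    # data is a list of lists, one inner list per row (None for NULLs).
    data = [
        [1, 7] + [0.05 * i for i in range(1, 21)],
        [2, 7] + [None if i % 5 == 0 else 100.0 * i for i in range(1, 21)],
    ]

    cursor = session.cursor()
    cursor.executemany(insertString, data, batch=True)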
Is the problem intermittent? When it fails, can you print out the input to the function and isolate the row it's failing for?
The problem is intermittent and I have a hard time reproducing it. So far I've noticed it only when the input I'm trying to insert contains floats. It also seems like it's happening only when I'm inserting into a table that I didn't create using Python.
I can see that my ODBC drivers have been downgraded to TD 14.10 though the db is on 15.10, so that seems like a possible culprit. Is that a known issue?
Okay, confirmed that the TD ODBC drivers are still 15.10. It seems to work reliably when I create a table and immediately insert into it, but when I insert into existing tables, I sometimes get this error. Again, it seems to happen only when I'm inserting floats (though it's possible that's a coincidence, because I can't reliably recreate the error on new test tables).
It appears to be an issue related to the size of the input to .executemany(). When I insert line-by-line it succeeds, as it does when I insert 5000 rows at a time. When I insert ~7500 (a number I arrived at via a very rough heuristic), I get the error on the first insert. This surprises me because I have certainly hit the 'too many rows at once' error before, in multiple forms:
- in the case where the row count is above the explicit maximum (16,383 rows, I believe)
- in the case where the SQL request exceeds 1 MB
Any advice or thoughts on this? I'm hesitant to catch that exception, as I assume it's the same exception thrown in the case of an actual Unicode Conversion Error...
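A minimal sketch of that kind of fixed-size batching (the 5000-row batch size is just the value that happened to work here, not a documented limit):

def insert_in_batches(cursor, insert_sql, rows, batch_size=5000):
    # Feed rows to executemany() in fixed-size chunks to stay under per-request limits.
    for start in range(0, len(rows), batch_size):
        cursor.executemany(insert_sql, rows[start:start + batch_size], batch=True)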
That's great that you can now reproduce it at will. If you have a simple script that produces the problem, I can see whether I can reproduce it in my environment and generate an ODBC trace file to send to the ODBC support team to have a look.
I'm having trouble recreating it with arbitrarily generated data. With data on which it has already failed, I can reliably recreate the issue (including on new tables with the same definition). Also perhaps worth noting: there's an 'insert_dt' field that is defined as TIMESTAMP(6) FORMAT 'YYYY-MM-DDBHH:MI:SSBT' NOT NULL DEFAULT CURRENT_TIMESTAMP(6).
The behavior is baffling:
- if I insert the first 6000 rows, I get a Unicode Conversion Error
- if I insert the first 5000 rows, I get no error
- any insert of fewer than 6000 rows works
- if I insert rows 6000-12000, I get no error
- if I insert rows 100-5100, I get a Memory Allocation Error
The data is a list of lists looking like the following:
- an int (in the PI) varying from 1 to len(data)
- 20 floats in the half-open range [0.05, 2000.0), with None wherever a value is null
- another int (in the PI) that is the same across all data I'm inserting
When I recreate new, random data meeting those specifications, I can't get the same error. I've inspected everything I can about those first 6000 rows and I can't identify anything that occurs in them that doesn't occur elsewhere. They're all a mixture of int/float/None, there are no values outside the range 0.05-2000.0, and I can insert every row individually.
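A sketch of the kind of random data I was generating to try to reproduce this (same shape and ranges as described above; the group_id value and the 10% null rate are arbitrary choices of mine):

import random

def make_test_data(n_rows, group_id=1, n_floats=20, null_rate=0.1):
    # Rows shaped like the real data: an index int, 20 floats between 0.05 and 2000.0
    # (None where a value is null), and a constant trailing int.
    rows = []
    for i in range(1, n_rows + 1):
        floats = [None if random.random() < null_rate
                  else random.uniform(0.05, 2000.0)
                  for _ in range(n_floats)]
        rows.append([i] + floats + [group_id])
    return rows

data = make_test_data(6000)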
Can you try reproducing the issue with ODBC trace enabled and e-mail the file to me?
Emailed.
Any progress on solving this issue? We are encountering the same problem feeding many records to executemany.
I ended up lumping it in with this error https://github.com/Teradata/PyTd/issues/76 and catching either of those exceptions. My best solution was to wrap my inserts in a try: block and, in the except: block, retry with fewer rows at a time (see the sketch after the list below).
The only hard and fast rules I could figure out were
- never insert > 16,383 (I think) rows at once
- never insert > 1 MB of bound data per insert.
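A rough sketch of that retry wrapper (the exception class is the one from the traceback above; the batch-halving strategy is arbitrary, and as noted it will also retry on a genuine Unicode Conversion Error, since the driver raises the same exception for both):

from teradata.api import DatabaseError

def safe_executemany(cursor, insert_sql, rows, batch_size=5000, min_batch=1):
    # Insert rows in batches; on DatabaseError, retry the failing chunk with a smaller batch.
    start = 0
    while start < len(rows):
        chunk = rows[start:start + batch_size]
        try:
            cursor.executemany(insert_sql, chunk, batch=True)
            start += len(chunk)
        except DatabaseError:
            if batch_size <= min_batch:
                raise  # even the smallest batch fails, so surface the real error
            batch_size = max(min_batch, batch_size // 2)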