
Unexpected Unicode Conversion Error

Open jamesmf opened this issue 7 years ago • 10 comments

I've been plagued by hard-to-predict Unicode Conversion Errors while trying to insert non-Unicode data into existing TD tables.

The data being inserted consists of 2 integer columns and 20 float columns, stored in a list of lists and passed to cursor.executemany(insertString, data, batch=True).

The session explicitly specifies charSet = 'ASCII' (though that is also the ODBC default).

The table being inserted into also has a TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) column that isn't being inserted from python.

The insertString looks like: INSERT INTO db_name.table_name(comma_dlm_list_of_columns_to_insert) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
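For reference, the setup looks roughly like this (connection parameters, column names, and the charset keyword are placeholders for illustration, not copied from my environment):

```python
import teradata

udaExec = teradata.UdaExec(appName="inserter", version="1.0", logConsole=False)
# charSet='ASCII' is set on the session as described above; the exact
# keyword used here is an assumption for illustration.
session = udaExec.connect(method="odbc", system="mydbc",
                          username="user", password="pass",
                          charset="ASCII")

# data is a list of lists: 2 ints plus 20 floats per row
# (only 3 columns shown here to keep the example short)
data = [[1, 0.5, 42], [2, None, 42]]
insertString = ("INSERT INTO db_name.table_name (c1, c2, c3) "
                "VALUES (?,?,?)")  # one '?' per inserted column
session.executemany(insertString, data, batch=True)
```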

I get back: teradata.api.DatabaseError: (1, '[HY000] [Teradata][ODBC Teradata Driver][Teradata Database] Unicode Conversion Error')

jamesmf avatar Jun 02 '17 18:06 jamesmf

Is the problem intermittent? When it fails, can you print out the input to the function and isolate the row it's failing on?

escheie avatar Jun 02 '17 20:06 escheie

The problem is intermittent and I have a hard time reproducing it. So far I've noticed it only when the input I'm trying to insert contains floats. It also seems like it's happening only when I'm inserting into a table that I didn't create using Python.

I can see that my ODBC drivers have been downgraded to TD 14.10 though the db is on 15.10, so that seems like a possible culprit. Is that a known issue?

jamesmf avatar Jun 03 '17 12:06 jamesmf

Okay, confirmed that the TD ODBC drivers are still 15.10. It reliably works when I create a table and immediately insert into it, but when I insert into existing tables, I sometimes get this error. Again, it seems to only happen when I'm inserting floats (though that may be a coincidence, because I can't reliably recreate the error on new test tables).

jamesmf avatar Jun 05 '17 13:06 jamesmf

It appears to be an issue relating to the size of the input to .executemany(). When I insert line-by-line it succeeds, as it does when I insert 5000 rows at a time. When I insert ~7500 (a number I arrived at using a very rough heuristic), I get the error on the first insert. This surprises me because I have certainly hit the 'too many rows at once' error in multiple forms (rough chunk-splitting sketch after this list):

  1. in the case where it's above the explicit maximum (16383 rows I believe)
  2. in the 'SQL request exceeds 1 MB' case
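For what it's worth, here's roughly how I've been splitting inserts to stay under both limits (the per-row byte estimate is a crude assumption on my part, not how the driver actually counts request size):

```python
import sys

MAX_ROWS = 16383              # the explicit row cap mentioned above (approximate)
MAX_REQUEST_BYTES = 1 << 20   # ~1 MB request limit

def chunks_under_limits(rows):
    """Yield slices of rows that stay under both limits. sys.getsizeof is
    a crude stand-in for however the driver actually sizes the request."""
    chunk, size = [], 0
    for row in rows:
        row_size = sum(sys.getsizeof(v) for v in row)
        if chunk and (len(chunk) >= MAX_ROWS
                      or size + row_size > MAX_REQUEST_BYTES):
            yield chunk
            chunk, size = [], 0
        chunk.append(row)
        size += row_size
    if chunk:
        yield chunk
```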

Any advice or thoughts on this? I'm hesitant to catch that exception, as I assume it's the same exception thrown in the case of an actual Unicode Conversion Error...

jamesmf avatar Jun 15 '17 19:06 jamesmf

That's great that you can reproduce it at will now. If you have a simple script that triggers the problem, I can see if I can reproduce it in my environment and generate an ODBC trace file to send to the ODBC support team.

escheie avatar Jun 16 '17 04:06 escheie

I'm having trouble recreating it with arbitrarily generated data. With data on which it's already failed, I can reliably recreate the issue (including on new tables with the same definition). Also perhaps worth noting: there's an 'insert_dt' field that is defined as TIMESTAMP(6) FORMAT 'YYYY-MM-DDBHH:MI:SSBT' NOT NULL DEFAULT CURRENT_TIMESTAMP(6)

The behavior is baffling:

  • if I insert the first 6000 rows, I get a Unicode Conversion Error
  • if I insert the first 5000 rows, I get no error
  • any insert of size < 6000 rows works
  • if I insert rows 6000-12000, I get no error
  • if I insert rows 100-5100, I get a Memory Allocation Error

The data is a list of lists looking like the following:

  • an int (in the PI) varying from 1 to len(data)
  • 20 floats in the half-open range [0.05, 2000.0), with None wherever the value is null
  • another int (in the PI) that is the same across all data I'm inserting

When I recreate new, random data meeting those specifications, I can't get the same error. I've inspected everything I can about those first 6000 rows and I can't identify anything that occurs in them that doesn't elsewhere. They're all a mixture of int/float/None, there are no values outside the range 0.05-2000., and I can insert every row individually.
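Concretely, the random data I've been generating looks roughly like this (the None rate and the constant value are arbitrary choices on my part):

```python
import random

n_rows = 6000
const_int = 12345   # the second int, identical across every row (arbitrary value)

data = []
for i in range(1, n_rows + 1):
    # 20 floats in [0.05, 2000.0), with ~10% replaced by None (arbitrary rate)
    floats = [random.uniform(0.05, 2000.0) if random.random() < 0.9 else None
              for _ in range(20)]
    data.append([i] + floats + [const_int])
```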

jamesmf avatar Jun 16 '17 14:06 jamesmf

Can you try reproducing the issue with ODBC trace enabled and e-mail the file to me?

escheie avatar Jun 19 '17 19:06 escheie

Emailed.

jamesmf avatar Jun 26 '17 17:06 jamesmf

Any progress on solving this issue? We are encountering the same problem feeding many records to executemany.

andreapiso avatar Jul 13 '18 05:07 andreapiso

I ended up lumping it in with this error: https://github.com/Teradata/PyTd/issues/76 and catching either of those exceptions. My best solution was to wrap my inserts in a try: and, in the except: block, retry with fewer rows at a time (sketched below, after the list).

The only hard-and-fast rules I could figure out were:

  1. never insert > 16,383 (I think) rows at once
  2. never insert > 1 MB of bound data per insert.
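In code, the retry wrapper looks roughly like this (the substrings matched against the exception text come from the error messages quoted in this thread; everything else is a sketch):

```python
from teradata.api import DatabaseError

def insert_with_retry(session, insert_sql, rows, batch_size=5000):
    """Insert rows in batches, halving the batch size whenever one of the
    size-related errors described above comes back."""
    i = 0
    while i < len(rows):
        chunk = rows[i:i + batch_size]
        try:
            session.executemany(insert_sql, chunk, batch=True)
            i += len(chunk)
        except DatabaseError as e:
            msg = str(e)
            retryable = ("Unicode Conversion Error" in msg
                         or "Memory Allocation Error" in msg)
            if retryable and batch_size > 1:
                batch_size //= 2  # retry the same slice with fewer rows
            else:
                raise
```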

jamesmf avatar Jul 13 '18 05:07 jamesmf