
Arithmetic overflow on bulk insert with None values

Open twschiller opened this issue 8 years ago • 4 comments

I'm getting an Arithmetic overflow error when performing a bulk insert against Azure SQL:

[22003] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Arithmetic overflow error converting nvarchar to data type numeric. (8115) (SQLExecDirectW)

I suspect the ODBC driver is mishandling one of the bulk-insert batches, based on the following observations:

  • I can insert all of the entries individually using model.save()
  • I can insert the entries using objects.bulk_create(...) if I only pass a singleton list at a time; it fails for some chunks (somewhere between 4 and 8 of the 5000 records) if I pass chunks of size 2 or larger
  • Other entries with None for [field2] are inserted OK, so it's probably not an issue with how ODBC/SQL handles the None field
  • The operation always fails for chunks containing the same records

Here is the query from Django:

"QUERY = 'INSERT INTO [table] ([field1], [field2], [field3], [field4], [field5], [field6], [field7])
VALUES (%s, %s, %s, %s, %s, %s, %s), (%s, %s, %s, %s, %s, %s, %s)' - PARAMS = (
1, None,  '42147.0510000000', '0.28480000', '0.06274000', '0.06700000', '384932.8700000000', 
1, '759.63000000', '203591.5870000000', '0.18545510', '0.40873770', '0.42296560', '498149.1000000000'
)"

The field definitions in the DB are as follows:

[field1] [int] NOT NULL,
[field2] [numeric](24, 10) NULL,
[field3] [numeric](24, 10) NOT NULL,
[field4] [numeric](18, 8) NOT NULL,
[field5] [numeric](18, 8) NOT NULL,
[field6] [numeric](18, 8) NOT NULL,
[field7] [numeric](24, 10) NOT NULL,

Based on this, there should be enough space in the numeric fields. (In the real query, there are 21 fields. The fields I excluded are FKs.)

I'm working on logging the actual SQL that Azure SQL is receiving, to see whether it matches the Django-reported SQL.

twschiller avatar Nov 21 '17 20:11 twschiller

The problem seems to be related to None values mixed with numeric values.

Here is the query generated by django-pyodbc-azure for the bulk create; value is a nullable numeric(18, 8) field in the database:

INSERT INTO [table_name] ([factory_id], [country_id], [factor_id], [year], [value], [exclude])
VALUES (%s, %s, %s, %s, %s, %s), (%s, %s, %s, %s, %s, %s), (%s, %s, %s, %s, %s, %s) 
PARAMS = (
     58, 15, 13, 2017, '1.34733900', 0, 
     58, 15, 14, 2017, None, 0, 
     58, 15, 15, 2017, '1960427.00000000', 0
)

We can perform the bulk create if we split it into two calls: one with the None records and one with the numeric records.
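That two-call split can be sketched in plain Python. The row dicts and field names below are hypothetical stand-ins for the table above, just to show the partitioning:

```python
# Hypothetical sketch: partition rows by whether the nullable [value]
# field is None, so each bulk_create call sees uniform parameter types.
rows = [
    {"factor_id": 13, "value": "1.34733900"},
    {"factor_id": 14, "value": None},
    {"factor_id": 15, "value": "1960427.00000000"},
]

none_rows = [r for r in rows if r["value"] is None]
num_rows = [r for r in rows if r["value"] is not None]

# With a Django model, each group would then be inserted separately, e.g.:
#   Model.objects.bulk_create([Model(**r) for r in none_rows])
#   Model.objects.bulk_create([Model(**r) for r in num_rows])
```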

Here's a snippet for identifying small subsets of records causing the bulk create to fail:

import logging

from django.db import connection

logger = logging.getLogger(__name__)

def bisect_exception(cls, values):
    """Recursively halve `values` to isolate the records that make bulk_create fail."""
    values = list(values)
    try:
        cls.objects.bulk_create(values)
        logger.debug(f'Bulk insert succeeded for {len(values)} values')
    except Exception as exc:
        if len(values) <= 3:
            logger.warning(f'Found bad value set: {values}, {exc}')
            # connection.queries is only populated when settings.DEBUG = True
            print(connection.queries[-1])
            raise
        else:
            logger.warning(f'Bulk insert failed for {len(values)} values: bisecting values')
            half = len(values) // 2
            bisect_exception(cls, values[half:])
            bisect_exception(cls, values[:half])

twschiller avatar Jul 27 '18 14:07 twschiller

Did you have any luck with solving this? I've managed to recreate this issue and isolate it to the same cause using your code.

NidalM avatar Oct 01 '19 19:10 NidalM

@NidalM, no, we ended up avoiding the issue. For upload performance, we use bcp for bulk uploads.

twschiller avatar Oct 02 '19 13:10 twschiller

Here's a work-around I've been using. It finds which keys are None for each record, then groups the records by their common None columns and bulk-creates each group separately.

from typing import List, Type

from django.db import models
from django.db.models import Q


def bulk_create_with_null(model: Type[models.Model], rows: List[dict],
                          delete_filters: Q = Q()) -> None:
    model.objects.filter(delete_filters).delete()

    # group records by which columns are null
    null_keys = [tuple(key for key, value in record.items() if value is None)
                 for record in rows]
    index_groups = list(set(null_keys))
    null_indicators = [index_groups.index(group) for group in null_keys]

    # create & insert the objects for each null-column group separately
    for ind in range(len(index_groups)):
        grouped_rows = [rows[idx] for idx, x in enumerate(null_indicators) if x == ind]
        database_objects = [model(**row) for row in grouped_rows]
        model.objects.bulk_create(database_objects)
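The grouping step can be checked without a database. This sketch uses made-up column names (`a`, `b`) to show how records end up bucketed by which of their columns are None:

```python
rows = [
    {"a": 1, "b": None},
    {"a": None, "b": 2},
    {"a": 3, "b": None},
]

# key = tuple of column names that are None in the record
null_keys = [tuple(k for k, v in r.items() if v is None) for r in rows]

groups = {}
for row, key in zip(rows, null_keys):
    groups.setdefault(key, []).append(row)

# Each value of `groups` is a batch that is safe to bulk_create together.
print(sorted(groups.keys()))  # → [('a',), ('b',)]
```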

hbmccreery avatar Oct 20 '20 21:10 hbmccreery