django-pyodbc-azure
django-pyodbc-azure copied to clipboard
Arithmetic overflow on bulk insert with None values
I'm getting and Arithmetic overflow error when performing a bulk insert against Azure SQL:
[22003] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Arithmetic overflow error converting nvarchar to data type numeric. (8115) (SQLExecDirectW)
I suspect the ODCB driver is mishandling the bulk_insert of one of the batches, based on the following observations:
- I can insert all of the the entries individually using
model.save() - I can insert the entries using
objects.bulk_insert(...)if I only pass a singleton list at a time; it fails for some chunks (somewhere between 4-8 records, of 5000 records) if I pass chucks of size 2 or larger - Other entries with None for
[field2]work are inserted OK, so it's probably not an issue with how ODBC/SQL is handling the None field - The operation always fails for chunks containing the same records
Here is the query from Django:
"QUERY = 'INSERT INTO [table] ([field1], [field2], [field3], [field4], [field5], [field6], [field7])
VALUES (%s, %s, %s, %s, %s, %s, %s), (%s, %s, %s, %s, %s, %s, %s)' - PARAMS = (
1, None, '42147.0510000000', '0.28480000', '0.06274000', '0.06700000', '384932.8700000000',
1, '759.63000000', '203591.5870000000', '0.18545510', '0.40873770', '0.42296560', '498149.1000000000'
)"
The field definitions in the DB are as follows:
[field1] [int] NOT NULL,
[field2] [numeric](24, 10) NULL,
[field3] [numeric](24, 10) NOT NULL,
[field4] [numeric](18, 8) NOT NULL,
[field5] [numeric](18, 8) NOT NULL,
[field6] [numeric](18, 8) NOT NULL,
[field7] [numeric](24, 10) NOT NULL,
Based on this, there should be enough space in the numeric fields. (In the real query, there are 21 fields. The fields I excluded are FKs.)
I'm working on logging the actual SQL that Azure SQL is retrieving to see if it doesn't match the Django-reported SQL
The problem seems to be related to None values mixed with numeric values.
Here is the query generated by django-pyodbc-azure for the bulk create. value is a numeric(18, 8) nullable field in the database
INSERT INTO [table_name] ([factory_id], [country_id], [factor_id], [year], [value], [exclude])
VALUES (%s, %s, %s, %s, %s, %s), (%s, %s, %s, %s, %s, %s), (%s, %s, %s, %s, %s, %s)
PARAMS = (
58, 15, 13, 2017, '1.34733900', 0,
58, 15, 14, 2017, None, 0,
58, 15, 15, 2017, '1960427.00000000', 0
)
Can perform the bulk create if we perform to calls: one with the None records and one with the numeric records.
Here's a snippet for identifying small subsets of records causing the bulk create to fail:
def bisect_exception(cls, values):
values = list(values)
try:
cls.objects.bulk_create(values)
logger.debug(f'Bulk insert succeed for {len(values)} values')
except Exception as exc:
if len(values) <= 3:
logger.warning(f'Found bad value set: {values}, {exc}')
from django.db import connection as conn
print(conn.queries[-1])
raise
else:
logger.warning(f'Bulk insert failed for {len(values)} values: bisecting values')
half = len(values) // 2
bisect_exception(cls, values[half:])
bisect_exception(cls, values[:half])
Did you have any luck with solving this? I've managed to recreate this issue and isolate it to the same cause using your code.
@NidalM, no we ended up avoiding the issue. For upload performance, we use bcp for bulk uploads
Here's a work-around I've been using. This finds which keys were None for each record then grouped records by common None rows.
def bulk_create_with_null(model: models.Model, rows: List[dict], delete_filters: Optional = Q()) -> None:
model.objects.filter(delete_filters).delete()
# group records by which columns are null
null_keys = [tuple([key for key, value in record.items() if value is None]) for record in rows]
index_groups = list(set(tuple(i) for i in null_keys))
null_indicators = [index_groups.index(group) for group in null_keys]
# create & insert objects for each of these
for ind in range(len(index_groups)):
grouped_rows = [rows[idx] for idx, x in enumerate(null_indicators) if x == ind]
database_objects = [model(**row) for row in grouped_rows]
model.objects.bulk_create(database_objects)
return