Need a way to define data type/length when passing variables in SQL Server
I'm doing selects/inserts/updates on my tables, and SQL Server is creating an execution plan for every permutation of character lengths that Python passes into those selects/inserts/updates.
I believe this is causing some issues with the performance of the db and may create some memory issues.
I would like to be able to set the data type/length when passing parameters. Ideally, pyodbc would know the datatype/length before inserting and set them accordingly, but I don't know if this would create more issues than it solves, so being able to define them myself might help.
More specifically, when I run a trace, this is what I see:
declare @p1 int
set @p1=4882452
exec sp_prepexec @p1 output,N'@P1 nvarchar(11),@P2 nvarchar(16)',N'SELECT TOP 1 C_KEY FROM C WHERE A_NM = @P1 AND C_NM = @P2',N'Zantesuken1',N'RasalhagueLegacy'
select @p1
and when looking at the execution plans, I find that an execution plan has been created for every single permutation of these selects.
Perhaps something like this:
cursor.execute("SELECT X FROM Y WHERE Z1 = ? AND Z2 = ?", param1.type(varchar(50)), param2.type(bigint))
No idea if this makes sense and maybe this is already possible somehow and I don't know about it, so let me know if there's a way around this issue.
edit: Also noticed that the auto-incrementing @p1 counter seems to be going up really quickly (I'm doing a lot of transactions). Is there any way to reset it? I've tried deleting the cursor and closing the connection, but nothing seems to work.
Interesting. I guess it makes sense if the target size will matter to the execution plan. I need to look into this for one of my own projects.
Is it possible to test the impact using casts? Something like Z1 = cast(? as varchar(50)).
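A minimal sketch of what that cast-based test could look like, reusing the query from the trace above (the DSN is a placeholder; whether this actually collapses the plans is exactly what the trace would need to confirm):

import pyodbc

cnxn = pyodbc.connect("DSN=myDb")  # placeholder DSN
crsr = cnxn.cursor()
# Same query as in the trace, but with explicit casts around the parameters,
# so the comparison is always against varchar(50) regardless of the length
# of the incoming Python string.
sql = ("SELECT TOP 1 C_KEY FROM C "
       "WHERE A_NM = CAST(? AS varchar(50)) AND C_NM = CAST(? AS varchar(50))")
row = crsr.execute(sql, "Zantesuken1", "RasalhagueLegacy").fetchone()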
As for @p1, I actually don't know why the driver does that, but I don't believe it will be a problem. I have some 24x7 applications using pyodbc pretty heavily and they usually open a connection and leave it open for weeks or months - they reset when there is an error or it is restarted. I haven't noticed anything related to it.
@ZaM-Rai re: closing connection - The ODBC infrastructure on Windows automatically enables connection pooling for the SQL Server ODBC drivers, so you could try pyodbc.pooling = False before opening your connection(s), or even disable connection pooling for your SQL Server ODBC driver via the ODBC Administrator (odbcad32.exe).
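For example, a minimal sketch (placeholder DSN; pooling must be set before the first connection is opened, since it applies to the ODBC environment):

import pyodbc

pyodbc.pooling = False             # must run before any pyodbc.connect() call
cnxn = pyodbc.connect("DSN=myDb")  # placeholder DSN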
Having a way to specify - or at least "hint" at - parameter types and sizes (as can be done with .Parameters.Add() in ADO.NET) might also help with issues like #212.
Not the most sophisticated test, but it does seem to indicate that different parameter lengths do produce multiple cache entries for the same command text:
import pyodbc
conn = pyodbc.connect("DSN=myDb")
def check_cache():
    sql = """\
        SELECT *
        FROM sys.dm_exec_cached_plans AS p
        CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS s
        WHERE s.text LIKE '(@p1 nvarchar%';
        """
    cache_crsr = conn.cursor()
    n = 0
    for row in cache_crsr.execute(sql):
        print(row.text)
        n += 1
    print(f"----- cache entries found: {n} -----")
    cache_crsr.close()
crsr = conn.cursor()
print("clearing cache ...")
crsr.execute("DBCC FREEPROCCACHE")
check_cache()
crsr.execute("SELECT * FROM dbo.Contacts WHERE FirstName LIKE ?", "g%").fetchall()
check_cache()
crsr.execute("SELECT * FROM dbo.Contacts WHERE FirstName LIKE ?", "gord%").fetchall()
check_cache()
crsr.close()
conn.close()
Console output:
clearing cache ...
----- cache entries found: 0 -----
(@P1 nvarchar(2))SELECT * FROM dbo.Contacts WHERE FirstName LIKE @P1
----- cache entries found: 1 -----
(@P1 nvarchar(5))SELECT * FROM dbo.Contacts WHERE FirstName LIKE @P1
(@P1 nvarchar(2))SELECT * FROM dbo.Contacts WHERE FirstName LIKE @P1
----- cache entries found: 2 -----
@mkleehammer Hey, thanks for the reply, casting did not seem to do anything unfortunately. As for the @p1, I reached about 30M in about 15 hours, so I don't know if that's growing fast enough to become an issue, but if you're saying that it won't be a problem then I won't worry too much about it.
Also, turning off connection pooling did not seem to have any effect on the @p1 counter.
More research: https://msdn.microsoft.com/en-us/library/cc293623.aspx
I want this to be as automatic as possible, so I don't want the default solution to require user input.
I can think of some options:
- ask the driver for the right length - requires a server round trip so only good for prepared statements
- supply a rounded length to bucket plans, e.g. rounding the length up to the next multiple of 1024 (see the sketch after this list)
- supply defaults such as hard coded or maximums
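As a rough sketch of the bucketing idea (the 1024 bucket size is just the example from the list above, not a decided default):

def bucket_length(length, bucket=1024):
    # Round the length up to the next bucket boundary so that similar
    # lengths share the same parameter declaration and, ideally, the same plan.
    return ((length + bucket - 1) // bucket) * bucket

# Lengths 1..1024 all bind as 1024, 1025..2048 as 2048, and so on.
assert bucket_length(11) == 1024
assert bucket_length(1025) == 2048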
First I'd like to not prepare a statement until the 2nd use (configurable counter). This will give an immediate performance boost on most databases. (I think SQL Server may buffer the round trip when possible, so less of a boost there. If inserting NULL I have to query the type anyway.)
Then I think I'd like to implement the rounding with configurable boundaries.
Finally, try asking for lengths when preparing to see if it is worth it.
My only concern about rounding is how it might affect plan generation. I don't think it should but I need to see what statistics each database is keeping.
What does everyone think about this?
It seems like I'm getting another performance issue related to this: my table column is defined as bigint, but when the query is executed from pyodbc the parameter is bound as an int, and the execution plan shows an implicit conversion to bigint, which could be impacting my performance.
These issues seem related, and in both cases the performance impact appears to be non-negligible.
Are there any new developments on this?
I'm aware of the int / bigint issue, but I thought it was behaving like the 3.x version.
We don't know the target type for parameters without an execution plan compile and query, so we use int when the value fits ODBC's definition of int and bigint when bigger. The first 4.0 used bigint exclusively but some databases don't even support it so I had to revert to the current logic.
I wouldn't expect a performance hit if the conversion is on the parameter, but SQL Server often misses easy conversions like that. A good C / C++ compiler would use a 64-bit temporary for comparisons in a loop - I don't know why SQL Server can't.
This could be a good argument for manually setting types, but since that requires code changes in the client, would a cast be just as good: "select * from t where id = cast(? as bigint)"?
Well, if it's a varchar, you could just default to a high number like 255 or 8000, since it should not affect performance on variable character fields. This should mostly fix the multiple execution plans for the character case, but it doesn't do much to help with the int/bigint issue.
As for the round trip, it might be worth it, not sure. I think testing would be required to see if it will create extra overhead but if it doesn't, I think it's definitely worth it.
For nulls, I think I noticed earlier that it defaults to varchar instead of checking the type, so handling that is definitely important if it hasn't been done already.
Also casting as bigint did not help, as the explain plan is still converting it.
I still think that, while an automatic way is great, there should still be a way to explicitly set the datatypes, considering many users will already know them, and that would probably be the most efficient approach assuming the datatypes are correct.
Hey mk, you seem to have closed this, does this mean you have a solution? :)
No - I'm not sure what I meant to close. Thanks
Please see the Binding Parameters page of the Wiki where I've collected some thoughts on how to move forward with this and a couple of other related issues.
I think something like pyodbc.set_var_binding_length(1024) might be what we want here.
Thoughts?
Seems like a good solution. So if I understand correctly you would have a command that lets us set the var length (say to 4096), and for nulls you would have to specify how it treats them (varchar vs int)?
If that's the case I do have an issue with the latter: what if I have to pass two variables of different types in the same statement and either or both can be null?
Would it be possible, perhaps, to enable/disable going the roundtrip to get the data types (and cache them for subsequent inserts/updates/etc) only for null values?
Also a note that the max in SQL Server on varchar is 8000, and nvarchar is 4000, so they're not powers of 2.
@ZaM-Rai said:
I still think, while an automatic way is great, there should still be a way to explicitly set the datatypes
I am inclined to agree, and it looks like the DB-API includes Cursor#setinputsizes for just that purpose.
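For reference, PEP 249 treats it as a per-execution hint, roughly like this (the statement, table name, and sizes are illustrative only):

# One entry per parameter: an integer means "maximum string length",
# None means "use the default binding" for that parameter.
cursor.setinputsizes([50, None])
cursor.execute("INSERT INTO t (name, qty) VALUES (?, ?)", "abc", 123)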
I'm curious, any news on this issue?
Ideally, pyodbc would know the datatype/length before inserting and set them accordingly
The parameter array implementation I was working on for https://github.com/mkleehammer/pyodbc/issues/120 does exactly that. You can give it a try at https://github.com/v-chojas/pyodbc/tree/paramarray . It is currently only for executemany() but you can pass in a single row to observe the effect. It also binds according to the type on the server, so you won't be seeing any server-side int/bigint conversions.
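For anyone trying that branch, passing a single row to executemany() just means wrapping the parameter tuple in a one-element list (table name is a placeholder):

# executemany() takes a sequence of parameter rows; a single row works too.
cursor.executemany(
    "INSERT INTO Test.dbo.Test2 (Col1) VALUES (?)",
    [("Arg1",)],
)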
Works like a charm, as far as caching the query plans go!
It does feel a bit more sluggish for some reason, though. Is executemany less efficient than execute for passing a single row?
Also using cursor.fetchone() on a select query does not give me any results (I get the error: Previous SQL was not a query). Any way to make this work with execute as well? This would solve a lot of issues and be very appreciated!
Good to hear that it fixes the execution plan proliferation.
I'm not surprised that it is slower, since that implementation also allocates and converts/copies the data into a parameter array; it was really intended for many-row inserts but as a "side effect" of using a parameter array, you need to know --- and thus bind --- the exact type+length. That error you're getting is just because the implementation is more of a proof-of-concept at the moment, so it's not complete.
It is possible to have execute() perform type discovery too, but as @mkleehammer mentioned above, it will add additional overhead. That overhead may be OK in the case of executemany() where it's only done once for many executions (and used to make those executions much faster, e.g. with parameter arrays), but will definitely slow things down if you only execute once.
Implementing DB-API Cursor.setinputsizes() would be my recommendation too --- perhaps with an extension to allow specifying exact SQL types, so that cases such as varchar vs. char can be distinguished, scale and precision specified for decimal types, etc. I can try to work on this.
I have implemented a simple version of setinputsizes() which only recognises a sequence of integers for specifying the column size for character types. E.g. cursor.setinputsizes([50]). You can try it at: https://github.com/v-chojas/pyodbc/tree/setinputsizes
(This branch does not have the parameter array improvements, so you can try it with regular execute() and it should not be slower.)
Although DB-API states that type objects can be used, it doesn't seem so useful in this case since we are detecting the SQL type from the Python type anyway --- and there's no way to distinguish between int/bigint because they are both mapped from Python integers.
The test failed; the strings are still being bound with their original Python-derived lengths.
Here is my test code, to be used with an SQL Server trace:
cnxn = pyodbc.connect("DSN=ZAM_SQL", autocommit=True)
cursor = cnxn.cursor()
arg1 = 'Arg1'
arg2 = 'Arg12'
arg3 = 'Arg123'
arg4 = 'Arg1234'
arg5 = 'Arg12345'
arg6 = 'Arg123456'
arg7 = 'Arg1234567'
arg8 = 'Arg12345678'
arg9 = 'Arg123456789'
arg10 = 'Arg1234567890'
params = [arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10]
cursor.execute("IF OBJECT_ID('Test.dbo.Test2', 'U') IS NOT NULL DROP TABLE Test.dbo.Test2; ")
cnxn.commit()
cursor.execute("CREATE TABLE Test.dbo.Test2([Col1] [varchar](50) NULL)")
cnxn.commit()
for par in params:
    cursor.setinputsizes([50])
    cursor.execute("INSERT INTO Test.[dbo].Test2(Col1) values (?)", par)  #.setinputsizes([50])
    cnxn.commit()
cursor.close()
del cursor
cnxn.close()
Here is a sample of the trace:
declare @p1 int set @p1=1 exec sp_prepexec @p1 output,N'@P1 varchar(4)',N'INSERT INTO Test.[dbo].Test2(Col1) values (@P1)','Arg1' select @p1
declare @p1 int set @p1=2 exec sp_prepexec @p1 output,N'@P1 varchar(5)',N'INSERT INTO Test.[dbo].Test2(Col1) values (@P1)','Arg12' select @p1
declare @p1 int set @p1=3 exec sp_prepexec @p1 output,N'@P1 varchar(6)',N'INSERT INTO Test.[dbo].Test2(Col1) values (@P1)','Arg123' select @p1
etc...
I see a similar trace with your test script, but with only "@p1 varchar(50)" throughout. Please ensure that you are testing with my branch of pyodbc which has the setinputsizes() implemented. You can check by printing pyodbc.Cursor.setinputsizes.__doc__ . The original and paramarray branches will say "Ignored." while setinputsizes branch has a longer message about its implementation.
I'm getting an error when trying to install it:
src\cursor.cpp(2120) : error C2017: illegal escape sequence
That was due to an extraneous trailing space, which some compilers will not accept. It should be fixed now.
Awesome, it works on my end too :)
Now all that's left is to be able to set it per column including int/bigint. Either way as is I think it's already a pretty big improvement!
For example, the ideal scenario imo would be something like Col1.setcharsize([50]) = 'Abc' and Col2.setintsize([8]) = 123456 (8 for an 8-byte bigint, for example, though that might be more complicated for people who don't know the byte sizes of the integer types; alternatively it could be a string such as tinyint, smallint, int, bigint, etc.)
edit: that didn't make sense; it would be more like Col1 = 'Abc' followed by Col1 = Col1.setcharsize([50])
Not sure if that still makes sense but w/e
Good. You can already set it for multiple columns; just specify an array, and it will use the size if specified:
cursor.setinputsizes([50, 5, 16])
To distinguish between int/bigint (and the other SQL types) will be somewhat trickier; I think a simple way would be to allow specifying a tuple (size/precision, scale, type) e.g.
cursor.setinputsizes([50, (None, None, pyodbc.SQL_BIGINT)])
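Under that proposal, a mixed varchar/bigint statement might then look something like this (table and column names are placeholders, and the tuple layout is the one suggested above rather than a finalised API):

# 50 is a plain character size for the first parameter; the tuple forces a
# SQL_BIGINT binding for the second, per the (size/precision, scale, type) proposal.
cursor.setinputsizes([50, (None, None, pyodbc.SQL_BIGINT)])
cursor.execute("INSERT INTO t (name, big_id) VALUES (?, ?)", "abc", 123456789012)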
Awesome, yeah in my mind it would be best if it were somehow attached to the variable itself, but it's fine as well if it's on the cursor as long as it resets after each cursor use. The advantage of having it on the variable is that it won't change and you won't need to call the size/type every time the variable is used. If that's not practical though, your way is more than good enough.
I've updated the Binding Parameters wiki with another problem (Informix #260) and another proposal. Feedback?
Seems like a good plan in theory, this would probably solve it rather nicely as it wouldn't require setting any data types manually. I'm looking forward to trying it out!
@mkleehammer so, to summarise, the SQL type (and length/precision) could be obtained by:
- Explicit setting
- SQLDescribeParam
- Python type
in order of precedence. However, I'm not sure the benefits of adding an explicit cache here would outweigh the complexity and edge cases it could introduce, since ODBC already has a mechanism for something similar: prepared statements. SQLDescribeParam may be slow, but for a single parameterised query there is no choice but to call it if you want to bind with the correct SQL type, and if you execute that same query more than once you should be using prepared statements/executemany(), in which case it is still only a single SQLDescribeParam call per parameter.
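As a concrete example of that reuse, a batch of inserts through executemany() needs the parameter metadata only once, however many rows are sent (table name as in the earlier test script):

# One SQLDescribeParam call per parameter, then the whole batch is bound.
rows = [("Arg1",), ("Arg12",), ("Arg123",)]
cursor.executemany("INSERT INTO Test.dbo.Test2 (Col1) VALUES (?)", rows)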
In particular, the case where the schema changes is important to consider: in such a case, pyodbc-cached information may no longer be correct.