nhibernate-core
nhibernate-core copied to clipboard
Int32Type: avoid unnecessary boxing for common cases
Indexed properties of DbDataReader
returns a boxed value, which we convert to Int32
(and then box again, but it is a different story).
This is a performance hit and produces memory traffic. Why? Because, for example, SqlDataReader
in Microsoft.Data.SqlClient stores int
(and all other reference types) as a reference value, and reading of value as object
forces SqlDataReader
to box it.
@wilbit can you show some numbers?
@hazzik , sure! What numbers would you like to see? Memory traffic difference, performance difference, something else?
A agree with this effort, but I believe it should go beyond just Int32Type. Every primitive type suffers from this to some extent. However, the only thing that significantly improves performance (as in CPU performance, time taken) is to not be permissive about the type returned in the reader. GetInt32 is at least twice as fast as GetValue (rs[...]), but that performance advantage is gone when preceding it with a GetFieldType. What remains is getting rid of the unnecessary boxing, which is of course very good in itself (probably more important) and the stated purpose of the PR.
Would it be worth it to add a new option, something like sql_types.strict
, disabled by default, which, when enabled, would cause the basic types to assume the data type loaded from the database is a perfect match?
That would allow for a simpler implementation of the optimization, which would be opt-in for those ensuring their database typing matches their mapping.
I think there will be boxing anyway. What we want to achieve, if I understand correctly, is unnecessary (object)int => int => (object)int
conversion.
I think following code should suffice
var value = rs[name]; // object
return rs[name] switch
{
int _ => value,
BigInteger bi => (int) bi,
var c => Convert.ToInt32(c)
};
@wilbit could you please check if that would would improve your scenario?
A agree with this effort, but I believe it should go beyond just Int32Type
One step at a time. Also, for my use cases I don't see so big memory traffic for other types (but we do not use bytes, longs, etc widely)...
Would it be worth it to add a new option, something like
sql_types.strict
, disabled by default, which, when enabled, would cause the basic types to assume the data type loaded from the database is a perfect match?That would allow for a simpler implementation of the optimization, which would be opt-in for those ensuring their database typing matches their mapping.
This new options seems to me fragile, because different DbReaders have different behaviours how they handle small type mismatchings (like int
<> long
<> decimal
).
In terms of performance it would be great
- to have implementations of
Int32Type
(and others) per db driver. - or the ability to create/assign custom implementations of this type "convertors"
I think there will be boxing anyway. What we want to achieve, if I understand correctly, is unnecessary
(object)int => int => (object)int
conversion.I think following code should suffice
var value = rs[name]; // object return rs[name] switch { int _ => value, BigInteger bi => (int) bi, var c => Convert.ToInt32(c) };
@wilbit could you please check if that would would improve your scenario?
Interesting. It should be the next:
var value = rs[name]; // object
return value switch
{
int _ => value,
BigInteger bi => (int) bi,
var c => Convert.ToInt32(c)
};
to avoid 1 boxing, but I got you point.
Yes, it will work for what is said in the PR. But, honestly, I proposed my changes in a way they are because:
- in my fork I also have a cache of boxed
int
values between 0 and 1000. I do not propose this cache here because it is very specific to my case and will make performance even worse for majority of users. - I hope that one day there will be a way to return something more light than boxed values.
So, it won't work for my scenario, but will work for the improvement in this PR.
PS: after some considerations, I think your approach also will work for me (it will change nothing for Firebird 4/BigInteger case, but I'm OK with that).
The double boxing has been introduced by 990cd6af twenty years ago due to some deficiencies of a driver at that time. Prior to that, the code was using the typed methods of the data reader, without attempting to support unexpected types.
I consider it to be a bad legacy. It is neither sound nor safe to have a type mismatch between the model property type and the underlying table column type. Or am I missing cases where that is legit?
Or am I missing cases where that is legit?
Yes: SQLite.
It was long on my todo list to redesign the type system to uncouple the Get and Set logic from the types themselves. Hibernate done this in I think v 6.0.
The SQLite driver handles SQLite lax typing itself. It does perform the required conversion itself when a typed method is used, according to deAtog in #3530.
I like the concept, but I do not agree with the approach taken here. In my opinion, NHibernate should always call the type specific methods of the DbDataReader. If and only if that fails, should any other conversions be performed using the value of the indexer property. Any calls to Convert must specify a locale which must be configurable by the user and default to CultureInfo.InvariantCulture. Any driver agnostic conversion must not rely on the type returned from calling GetFieldType of a DbDataReader.
The System.Data.SQLite driver for instance will automatically convert numeric types to a text form when inserting into a TEXT column and automatically convert from the text form to the numeric type when requested to do so. In such cases, a call to GetFieldType would return a String while calling SQLiteDataReader.GetDouble would return a double after converting the value.
The underlying database driver must be given the opportunity to convert the value to the correct type if necessary prior to any other conversion. This allows the underlying driver to perform the conversion using the locale of the database, or perform any other database specific conversions.
The GetDateTime method of the System.Data.SQLite driver, for instance, will attempt to convert a value stored in a TEXT, INTEGER, or NUMERIC column to a DateTime type using the DateTimeFormat, DateTimeFormatString, and DateTimeKind properties of the connection string. The indexer method of the SQLiteDbDataReader will not perform this conversion unless the DetectTextAffinity or DetectStringType flags are set on the Connection or Command object. These flags cause the driver to examine the values of all string columns returned from the database and negatively impact performance dramatically. The GetDateTime method does not have this problem as it only attempts conversions of requested columns.
https://github.com/nhibernate/nhibernate-core/commit/990cd6afc67724fcec9f7a160104addc0e423133 indicates there might be issues with the Oracle driver in regards to the type specific methods. If those methods throw exceptions, then any subsequent conversion using the indexer property of the DbDataReader would work as expected. However, this would be extremely inefficient for the Oracle driver as exceptions are inherently slow in .Net and this would likely cause a flurry of exceptions if these methods throw an exception on every call. In my opinion, the problematic driver should have been encapsulated by a proxy that implemented the type specific functions of the DbDataReader by converting the value from the indexer method to the appropriate type in the Oracle specific way. This would eliminate the flurry of exceptions for that driver.
The above approach could be taken here if the Microsoft.Data.SqlClient driver cannot perform the requested conversions without throwing exceptions for common type conversions. Such an implementation could rely on the value of calling GetFieldType of the underlying DbDataReader if necessary.
For examples of why calling Convert without specifying a locale is wrong, see all the failing test cases I added in this pull request: https://github.com/nhibernate/nhibernate-core/pull/3548
The original failing case involved System.Data.SQLite's handling of DateTime which stores DateTime values as text by default.
I expanded my tests to include other common cases where a user might map a numeric property to a text column. A majority of these tests fail for the databases in the test suite here as the database will coerce numeric and date values to text upon insert, but NHibernate is not able to correctly retrieve the value. I suspect many of the failing tests would pass if the type specific methods of the DbDataReader were called instead of using Convert. This is certainly the case for SQLite.
@wilbit could you please update the PR there are conflicts now?
Sure, will do, and will apply your suggestions.
Looks like 2 of 3 my commits are obsolete already =) Adjusted the 3rd for use of @hazzik approach
When can it be merged, @hazzik?
@wilbit sorry, missed the notification about force push. It seems that async code needs to be regenerated. Could you please do it or "allow edits from contributors" on the PR?
@wilbit sorry, missed the notification about force push. It seems that async code needs to be regenerated. Could you please do it...
I tried to run "ShowBuildMenu -> Generate async code" before submitting the PR, but it does not produce any changes for committing =( I have no Idea what "Generate Async code / generate-async (pull_request_target)" task tries to commit 🤔
or "allow edits from contributors" on the PR?
This feature is not available for me because my fork is not under personal GitHub account.
@hazzik