nhibernate-core icon indicating copy to clipboard operation
nhibernate-core copied to clipboard

Int32Type: avoid unnecessary boxing for common cases

Open wilbit opened this issue 10 months ago • 14 comments

Indexed properties of DbDataReader returns a boxed value, which we convert to Int32 (and then box again, but it is a different story). This is a performance hit and produces memory traffic. Why? Because, for example, SqlDataReader in Microsoft.Data.SqlClient stores int (and all other reference types) as a reference value, and reading of value as object forces SqlDataReader to box it.

wilbit avatar Mar 29 '24 17:03 wilbit

@wilbit can you show some numbers?

hazzik avatar Apr 10 '24 03:04 hazzik

@hazzik , sure! What numbers would you like to see? Memory traffic difference, performance difference, something else?

wilbit avatar Apr 10 '24 07:04 wilbit

A agree with this effort, but I believe it should go beyond just Int32Type. Every primitive type suffers from this to some extent. However, the only thing that significantly improves performance (as in CPU performance, time taken) is to not be permissive about the type returned in the reader. GetInt32 is at least twice as fast as GetValue (rs[...]), but that performance advantage is gone when preceding it with a GetFieldType. What remains is getting rid of the unnecessary boxing, which is of course very good in itself (probably more important) and the stated purpose of the PR.

gliljas avatar Apr 10 '24 11:04 gliljas

Would it be worth it to add a new option, something like sql_types.strict, disabled by default, which, when enabled, would cause the basic types to assume the data type loaded from the database is a perfect match?

That would allow for a simpler implementation of the optimization, which would be opt-in for those ensuring their database typing matches their mapping.

fredericDelaporte avatar Apr 28 '24 16:04 fredericDelaporte

I think there will be boxing anyway. What we want to achieve, if I understand correctly, is unnecessary (object)int => int => (object)int conversion.

I think following code should suffice

var value = rs[name]; // object
return rs[name] switch
{
        int _ => value,
	BigInteger bi => (int) bi,
	var c => Convert.ToInt32(c)
};

@wilbit could you please check if that would would improve your scenario?

hazzik avatar May 02 '24 07:05 hazzik

A agree with this effort, but I believe it should go beyond just Int32Type

One step at a time. Also, for my use cases I don't see so big memory traffic for other types (but we do not use bytes, longs, etc widely)...

Would it be worth it to add a new option, something like sql_types.strict, disabled by default, which, when enabled, would cause the basic types to assume the data type loaded from the database is a perfect match?

That would allow for a simpler implementation of the optimization, which would be opt-in for those ensuring their database typing matches their mapping.

This new options seems to me fragile, because different DbReaders have different behaviours how they handle small type mismatchings (like int <> long <> decimal). In terms of performance it would be great

  • to have implementations of Int32Type (and others) per db driver.
  • or the ability to create/assign custom implementations of this type "convertors"

I think there will be boxing anyway. What we want to achieve, if I understand correctly, is unnecessary (object)int => int => (object)int conversion.

I think following code should suffice

var value = rs[name]; // object
return rs[name] switch
{
        int _ => value,
	BigInteger bi => (int) bi,
	var c => Convert.ToInt32(c)
};

@wilbit could you please check if that would would improve your scenario?

Interesting. It should be the next:

var value = rs[name]; // object
return value switch
{
        int _ => value,
	BigInteger bi => (int) bi,
	var c => Convert.ToInt32(c)
};

to avoid 1 boxing, but I got you point.

Yes, it will work for what is said in the PR. But, honestly, I proposed my changes in a way they are because:

  • in my fork I also have a cache of boxed int values between 0 and 1000. I do not propose this cache here because it is very specific to my case and will make performance even worse for majority of users.
  • I hope that one day there will be a way to return something more light than boxed values.

So, it won't work for my scenario, but will work for the improvement in this PR.

PS: after some considerations, I think your approach also will work for me (it will change nothing for Firebird 4/BigInteger case, but I'm OK with that).

wilbit avatar May 02 '24 10:05 wilbit

The double boxing has been introduced by 990cd6af twenty years ago due to some deficiencies of a driver at that time. Prior to that, the code was using the typed methods of the data reader, without attempting to support unexpected types.

I consider it to be a bad legacy. It is neither sound nor safe to have a type mismatch between the model property type and the underlying table column type. Or am I missing cases where that is legit?

fredericDelaporte avatar May 14 '24 21:05 fredericDelaporte

Or am I missing cases where that is legit?

Yes: SQLite.

It was long on my todo list to redesign the type system to uncouple the Get and Set logic from the types themselves. Hibernate done this in I think v 6.0.

hazzik avatar May 14 '24 21:05 hazzik

The SQLite driver handles SQLite lax typing itself. It does perform the required conversion itself when a typed method is used, according to deAtog in #3530.

fredericDelaporte avatar May 14 '24 21:05 fredericDelaporte

I like the concept, but I do not agree with the approach taken here. In my opinion, NHibernate should always call the type specific methods of the DbDataReader. If and only if that fails, should any other conversions be performed using the value of the indexer property. Any calls to Convert must specify a locale which must be configurable by the user and default to CultureInfo.InvariantCulture. Any driver agnostic conversion must not rely on the type returned from calling GetFieldType of a DbDataReader.

The System.Data.SQLite driver for instance will automatically convert numeric types to a text form when inserting into a TEXT column and automatically convert from the text form to the numeric type when requested to do so. In such cases, a call to GetFieldType would return a String while calling SQLiteDataReader.GetDouble would return a double after converting the value.

The underlying database driver must be given the opportunity to convert the value to the correct type if necessary prior to any other conversion. This allows the underlying driver to perform the conversion using the locale of the database, or perform any other database specific conversions.

The GetDateTime method of the System.Data.SQLite driver, for instance, will attempt to convert a value stored in a TEXT, INTEGER, or NUMERIC column to a DateTime type using the DateTimeFormat, DateTimeFormatString, and DateTimeKind properties of the connection string. The indexer method of the SQLiteDbDataReader will not perform this conversion unless the DetectTextAffinity or DetectStringType flags are set on the Connection or Command object. These flags cause the driver to examine the values of all string columns returned from the database and negatively impact performance dramatically. The GetDateTime method does not have this problem as it only attempts conversions of requested columns.

https://github.com/nhibernate/nhibernate-core/commit/990cd6afc67724fcec9f7a160104addc0e423133 indicates there might be issues with the Oracle driver in regards to the type specific methods. If those methods throw exceptions, then any subsequent conversion using the indexer property of the DbDataReader would work as expected. However, this would be extremely inefficient for the Oracle driver as exceptions are inherently slow in .Net and this would likely cause a flurry of exceptions if these methods throw an exception on every call. In my opinion, the problematic driver should have been encapsulated by a proxy that implemented the type specific functions of the DbDataReader by converting the value from the indexer method to the appropriate type in the Oracle specific way. This would eliminate the flurry of exceptions for that driver.

The above approach could be taken here if the Microsoft.Data.SqlClient driver cannot perform the requested conversions without throwing exceptions for common type conversions. Such an implementation could rely on the value of calling GetFieldType of the underlying DbDataReader if necessary.

deAtog avatar May 15 '24 02:05 deAtog

For examples of why calling Convert without specifying a locale is wrong, see all the failing test cases I added in this pull request: https://github.com/nhibernate/nhibernate-core/pull/3548

The original failing case involved System.Data.SQLite's handling of DateTime which stores DateTime values as text by default.

I expanded my tests to include other common cases where a user might map a numeric property to a text column. A majority of these tests fail for the databases in the test suite here as the database will coerce numeric and date values to text upon insert, but NHibernate is not able to correctly retrieve the value. I suspect many of the failing tests would pass if the type specific methods of the DbDataReader were called instead of using Convert. This is certainly the case for SQLite.

deAtog avatar May 16 '24 14:05 deAtog

@wilbit could you please update the PR there are conflicts now?

hazzik avatar May 29 '24 04:05 hazzik

Sure, will do, and will apply your suggestions.

wilbit avatar May 29 '24 18:05 wilbit

Looks like 2 of 3 my commits are obsolete already =) Adjusted the 3rd for use of @hazzik approach

wilbit avatar Jun 07 '24 15:06 wilbit

When can it be merged, @hazzik?

wilbit avatar Jul 23 '24 08:07 wilbit

@wilbit sorry, missed the notification about force push. It seems that async code needs to be regenerated. Could you please do it or "allow edits from contributors" on the PR?

hazzik avatar Jul 24 '24 03:07 hazzik

@wilbit sorry, missed the notification about force push. It seems that async code needs to be regenerated. Could you please do it...

I tried to run "ShowBuildMenu -> Generate async code" before submitting the PR, but it does not produce any changes for committing =( I have no Idea what "Generate Async code / generate-async (pull_request_target)" task tries to commit 🤔

or "allow edits from contributors" on the PR?

This feature is not available for me because my fork is not under personal GitHub account.

@hazzik

wilbit avatar Jul 24 '24 11:07 wilbit