nhibernate-core Handle SQL injection vulnerabilities within ObjectToSQLString

Fix SQL injections noticed in #3516.

It does not fix wholly #3516. Bugs noted in #3516 that do not cause a security issue are not fixed.

May 12 '24 17:05 fredericDelaporte

That is currently handling only the string case, and tests need to be added.

May 12 '24 17:05 fredericDelaporte

I don't like this fix. In my opinion, every dialect should have a method to escape a string value for use in a SQL statement. This allows the mechanism of how strings are escaped to vary from dialect to dialect. Additionally, this alleviates the need for a flag as the default implementation of this function in the Dialect class can simply return the value as is.

For example, here is an EscapeString method that I wrote to escape string literals for MySql. Feel free to adapt the code as needed:

public static string EscapeString(string s) {
    StringBuilder sb = new StringBuilder();

    foreach (char c in s) {
        switch (c) {
            case '\0':
                sb.Append(@"\0");
                break;
            case '\b':
                sb.Append(@"\b");
                break;
            case '\n':
                sb.Append(@"\n");
                break;
            case '\r':
                sb.Append(@"\r");
                break;
            case '\t':
                sb.Append(@"\t");
                break;
            case '\x1A':
                sb.Append(@"\Z");
                break;
            case '\'':
                sb.Append(@"\'");
                break;
            case '\"':
                sb.Append("\\\"");
                break;
            case '\\':
                sb.Append(@"\\");
                break;
            default:
                sb.Append(c);
                break;
         }
    }

    return sb.ToString();
}

May 16 '24 18:05 deAtog

~~I would agree with @deAtog that it would be better to delegate the escape to the dialect instead of introducing the configuration option.~~

EDIT: the implementation looks legit.

May 17 '24 01:05 hazzik

I do not really understand your comment, deAtog.

every dialect should have a method to escape a string value for use in a SQL statement

But that is what this PR does, since the base implementation is overridable. Any dialect can define its own implementation if needed. To minimize the likeliness of leaving a dialect with a vulnerability, we should have the most common escaping implemented in the base dialect. All dialects in NHibernate support doubling the quotes. Quite many also support backslash escaping as an alternative, which forces to escapes backslashes too for them. That is why the base implementation handles doubling single quotes and provides an option to easily enable backslash escapes.

Furthermore, dialects known to support backslash escapes are enabling the option by default. I still let open the possibility to enable/disable it by configuration due to cases like PostgreSQL, which has varied about its backslash escape handling.

A dialect whose needs would require a different implementation can still provide it, as done in the DB2 case.

About additional escapes that may be required for database also handling a backslash escape, I may be wrong but it seemed to me the backslash escaping was enough. Or would not escaping line feeds and the like still open injection vulnerabilities for them?

I am not attempting to implement completely all the various escaping of all the supported databases. The subject is to remove a corner cases injection vulnerability. (There is very few uses of literal injection in NHibernate: discriminators, which are normally static values in mapping files, not application user data; HQL queries referencing static fields, which is a very little known and used feature; and some sql builder helpers overloads that NHibernate does not use and that we are going to obsolete.)

May 18 '24 20:05 fredericDelaporte

On Monday, May 20, 2024 at 09:35:45 AM PDT, deAtog ***@***.***> wrote:

@deAtog commented on this pull request.

In src/NHibernate/Dialect/DB2Dialect.cs:

```
	if (type == null)
```

  		throw new ArgumentNullException(nameof(value));

  	// See https://www.ibm.com/docs/en/db2/11.5?topic=elements-constants#r0000731__title__7

  	var literal = new StringBuilder(value);

  	var isUnicode = type.DbType == DbType.String || type.DbType == DbType.StringFixedLength;

```
  	if (isUnicode)
```
```
  		literal.Replace(@"\", @"\\");
```
```
  	literal
```
```
  		.Replace("'", "''")
```
```
  		.Insert(0, '\'')
```
```
  		.Append('\'');
```
```
  	if (isUnicode)
```
```
  		literal.Insert(0, "U&");
```

It would be nice if this code avoided multiple iterations over the initial value. The longer the value and more escaping added, the more expensive this operation could become. I would recommend similar changes elsewhere.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.Message ID: @.***>

May 20 '24 19:05 Pfmawlaw

I do not really understand your comment, deAtog.

every dialect should have a method to escape a string value for use in a SQL statement

My point here is that the base dialect shouldn't try to do anything. In my opinion this should probably be an abstract method that all dialects are required to implement. I personally do not like the idea of having a base implementation for something that is largely database dependent.

But that is what this PR does, since the base implementation is overridable. Any dialect can define its own implementation if needed. To minimize the likeliness of leaving a dialect with a vulnerability, we should have the most common escaping implemented in the base dialect. All dialects in NHibernate support doubling the quotes. Quite many also support backslash escaping as an alternative, which forces to escapes backslashes too for them. That is why the base implementation handles doubling single quotes and provides an option to easily enable backslash escapes.

Furthermore, dialects known to support backslash escapes are enabling the option by default. I still let open the possibility to enable/disable it by configuration due to cases like PostgreSQL, which has varied about its backslash escape handling.

A dialect whose needs would require a different implementation can still provide it, as done in the DB2 case.

About additional escapes that may be required for database also handling a backslash escape, I may be wrong but it seemed to me the backslash escaping was enough. Or would not escaping line feeds and the like still open injection vulnerabilities for them?

Personally, I don't know. I can see cases where certain control characters may cause problems. The handling of these characters is solely database/library dependent. Some may have issues, others may not. For example, one might be able to insert a backspace character into a string to delete previous characters or insert a null character to early terminate the string.

Also, while there may be a standard for escaping single quotes within a single quoted string literal, not every database escapes them according to the standard. MySQL, for instance, recommends escaping single quotes with a backslash, but supports the use of doubling the number of single quotes within a single quoted string literal.

I am not attempting to implement completely all the various escaping of all the supported databases. The subject is to remove a corner cases injection vulnerability. (There is very few uses of literal injection in NHibernate: discriminators, which are normally static values in mapping files, not application user data; HQL queries referencing static fields, which is a very little known and used feature; and some sql builder helpers overloads that NHibernate does not use and that we are going to obsolete.)

I'll admit, I don't know the full scope of this issue. Most of the above seem like cases where the injection would have to be purposeful. However, if there is a case where a user-provided value can be directly inserted into a query then I wouldn't leave anything to chance. In my experience, it's better to err on the side of caution than to assume a partial implementation corrects all the issues.

May 21 '24 15:05 deAtog

In my opinion this should probably be an abstract method that all dialects are required to implement.

That is a breaking change which would mandate a new major version. This cannot be done in a patch release. (A throwing base implementation would not be acceptable either for a patch release.)

This should not be a user configurable option in my opinion.

See the PostgreSQL case: it was supporting backslash escapes by default up to its 9.1 version, with an option to disable it. Then it has ceased supporting them by default, but it still allows to re-enable them with a configuration setting. That case can only be handled through a configuration setting on our side too, since we cannot assume which configuration will have the database, whatever its version. The user needs to be able to enable/disable the escape for PostgreSql according to its database configuration, if he uses one of the features relying on literal injection.

For example, your implementation for DB2 ignores this setting while always escaping back slashes for Unicode strings.

Because its Unicode string encoding mandates backslash escapes, no matter what. So, that would be nonsense to follow the configuration settings. Such a setting is meant for cases where the user can know better what is the correct course of action.

It would be nice if this code avoided multiple iterations over the initial value.

Sure. But I see this as just a nice to have, since the implied features are either very seldomly used, or, in the case of discriminators, usually very short.

My first course of resolution was to suggest banning all literal injections in favor of parameterization, which is the best way to avoid injection vulnerabilities. But some discriminator use cases where this could significantly lower performances were raised. So, now, I attempt a "good enough" fix instead.

May 21 '24 18:05 fredericDelaporte

In my opinion this should probably be an abstract method that all dialects are required to implement.

That is a breaking change which would mandate a new major version. This cannot be done in a patch release. (A throwing base implementation would not be acceptable either for a patch release.)

A base implementation that does nothing but return the value provided would be best in my opinion. This would allow the method to be changed to abstract in a future major release. This also guarantees that the base implementation does not break some obscure dialect. As is, there is no guarantee that the current implementation doesn't introduce a breaking change for all dialects.

This should not be a user configurable option in my opinion.

See the PostgreSQL case: it was supporting backslash escapes by default up to its 9.1 version, with an option to disable it. Then it has ceased supporting them by default, but it still allows to re-enable them with a configuration setting. That case can only be handled through a configuration setting on our side too, since we cannot assume which configuration will have the database, whatever its version. The user needs to be able to enable/disable the escape for PostgreSql according to its database configuration, if he uses one of the features relying on literal injection.

There's no doubt that a configuration flag is needed here for PostgreSQL. The problem here is that you've found a case where the dialect breaks the base implementation and have introduced a global flag to avoid a dialect specific issue, hence my recommendation above. Dialects are provided all configuration properties when the Dialect.Configure is called. My recommendation would be to only interpret the escape_backslash_in_strings property from within the PostgreSQL dialect. There is no convention on where to define dialect specific properties, but I don't think defining it with all the other global properties is appropriate as it implies that it affects all dialects alike. I have seen many cases of cache specific properties that use a similar approach. Such properties are usually only mentioned in the documentation, but it would be good to have constants defined for them and a convention of where to find them.

For example, your implementation for DB2 ignores this setting while always escaping back slashes for Unicode strings.

Because its Unicode string encoding mandates backslash escapes, no matter what. So, that would be nonsense to follow the configuration settings. Such a setting is meant for cases where the user can know better what is the correct course of action.

That's exactly my point. From the context provided, the user has no idea whether or not the setting will affect the dialect they're using.

It would be nice if this code avoided multiple iterations over the initial value.

Sure. But I see this as just a nice to have, since the implied features are either very seldomly used, or, in the case of discriminators, usually very short.

My first course of resolution was to suggest banning all literal injections in favor of parameterization, which is the best way to avoid injection vulnerabilities. But some discriminator use cases where this could significantly lower performances were raised. So, now, I attempt a "good enough" fix instead.

Parameters would undoubtedly be better in my opinion, regardless of performance. There are many opportunities where performance could be improved, but now is not the time to address that concern.

May 21 '24 21:05 deAtog

I have added the fix for the char type. I have reordered and cleaned the commits by the way, putting tests first to facilitate checking them without the fix.

May 26 '24 18:05 fredericDelaporte

@fredericDelaporte is it still WIP?

Jun 03 '24 13:06 hazzik

I was intending to fix the cases of the other types allowing injection, like decimal.

For it to be no more WIP, either I fix the other type cases, or I change it to be the fix only for character and string types.

Jun 04 '24 20:06 fredericDelaporte

Ok, thanks for the clarification

Jun 05 '24 10:06 hazzik

@hazzik Even if this were only for strings this is not ready for merging in my opinion. I have reservations about breaking non-ANSI compliant dialects with the changes proposed here. It would be far better to switch to parameters, regardless of performance impacts, than use what has been implemented here. NHibernate already suffers from quite a few performance related issues and the impact of using parameters here is irrelevant given their security benefits.

Jun 06 '24 04:06 deAtog

@deAtog you're welcome to submit a PR with your suggestions.

Jun 06 '24 04:06 hazzik

Still WIP: I have to review all the other types to check if they allow some SQL injection.

Jun 10 '24 00:06 fredericDelaporte