querybuilder icon indicating copy to clipboard operation
querybuilder copied to clipboard

Upstreaming improvements

Open dgeelen-uipath opened this issue 3 years ago • 7 comments

Dear maintainers,

We are looking to use SqlKata in one of our products. Our use case is that we want to be able to write queries that we can run against both our production database (Snowflake) and in our unit tests (SQLite). To this end we will be forking SqlKata and implementing Snowflake support. We do however also want to make other changes, such as improving the support for aggregate functions. The current support is too limited for our use case, where our immediate need is to be able to do multiple aggregates in one query, with the ability to give each aggregate column a name (alias). We intend to upstream as much of the non-snowflake-specific changes as possible. To this end we would like to ask you a few questions.

Current work

We have currently done roughly the following as our first steps:

  • Ensure consistent white space throughout all files, to ease development effort on our side. This introduces an .editorconfig.
  • Implement AS keyword support for aggregates. This makes it so that you can name an aggregate with a custom name, rather than always using 'as '. The existing behaviour is kept as a default, but this could be discussed. Additionally the was existing support for using as (with that spacing and casing) in a column name is preserved, although it is not implemented for aggregates as we believe that providing the alias explicitly is a superior approach.
  • Extend aggregate support to multiple columns. This enables the writing of queries like SELECT MIN(COLUMN), MAX(COLUMN), SUM(COLUMN) FROM TABLE. The existing behaviour for handling multiple COUNT() columns is preserved.

The last two changes should fix e.g. #203. Q: Are PRs for these improvements likely be accepted?

Next steps

Next we are going to implement additional aggregate functions, starting with ANY_VALUE, APPROX_PERCENTILE, and PERCENTAGE. Our intent is to implement these with compiler-specific implementations for Snowflake and/or SQLite, and (where possible) a generic implementation for the rest of the supported engines. Q: What would be the best way to handle cases where a generic implementation would not be possible?

Another type of operation that we would like to be able to support is to generate more free-form expressions, such as for example SUM(A)/COUNT(DISTINCT B). This seems quite difficult to implement at the current time, so Q: we would like to know what (if any) plans there are to improve the extensibility and expressiveness of the library (and its interface) in the future?

Kind regards, UiPath

dgeelen-uipath avatar Jun 14 '21 09:06 dgeelen-uipath

Hi @dgeelen-uipath, happy that you've found SqlKata useful, and hope it helps in your project.

For the first point, I liked the .editorconfig approach and would be happy to merge if you provide a PR. for the last two points, we are working hard to release a new version (currently we don't have a clear timeline) that supports free expressions, that helps in such complex cases like the ones mentioned above, in addition to case when statement, string and date time functions, etc... check this branch https://github.com/sqlkata/querybuilder/blob/feature/expression-support to get an idea.

so to summarize: Q: Are PRs for these improvements likely be accepted? A: for the 1st one yes off course, for the rest I need more details about how they are implemented

Q: What would be the best way to handle cases where a generic implementation would not be possible? A: While the expression support is a good candidate here, using C# extensions + Compiler Specific queries are the easiest way to start with.

something similar to this one

public static class QueryExtensions
{
    public static Query SelectLength(this Query query, string column)
    {
        return query
            .ForSqlServer(q => q.SelectRaw($"LEN({column})"))
            .ForMySql(q => q.SelectRaw($"LENGTH({column})"));
    }
}

Here is another example on how you can implement WhereDate

public static Query WhereDate(this Query query, string col, DateTime val)
{
  return query.ForPostgres(q => q.WhereRaw($"[{col}]::date = ?", new [] {val.Date}))
              .ForMySql(q => q.WhereRaw("$DATE([{col}]) = ?", new [] {val.Date}))
              .ForSqlServer(q => q.WhereRaw("$CAST([{col}] as date) = ?", new [] {val.Date}))
}

check https://sqlkata.com/docs/advanced#engine-specific-queries

Q: we would like to know what (if any) plans there are to improve the extensibility and expressiveness of the library (and its interface) in the future? As mentioned before, we are working on adding support for Free Expressions, but currently we don't have a clear milestone.

if you need more clarification let me know. Best

ahmad-moussawi avatar Jun 14 '21 20:06 ahmad-moussawi

Hi @ahmad-moussawi, thanks for the quick response!

We'll open a PR for the .editorconfig soon. For the aggregate improvements, please take a look at our repo at https://github.com/UiPath/querybuilder to get an idea of what we did.

Q: What would be the best way to handle cases where a generic implementation would not be possible? A: While the expression support is a good candidate here, using C# extensions + Compiler Specific queries are the easiest way to start with.

What we really wanted to know was what to do in case an engine really doesn't support a particular operation (and no fall-back can be implemented). For example, in our implementation for ANY_VALUE we pick MIN as a fall-back for all engines except Snowflake, as the minimum value is just as good as any other value. However, the APPROX_PERCENTILE function may not be possible to implement at all generically.

In the examples above, e.g. SelectLength(), what would happen when compiling for e.g. Firebird? Wouldn't the select be (silently) dropped (not compiled)? We would prefer that the compilation fails instead. We see in the expression branch that we could just throw NotImplementedException(). The actual question we were asking is "would it be acceptable to add a function to SqlKata that is not supported by all compilers". I.e. is SqlKata aiming to cover the lowest-common-denominator, or is it accepted that not all engines are equally capable.

The expression support branch looks pretty cool, and somewhat similar to an idea we had. The main difference is that we were thinking about building a more generic abstract syntax tree (AST) based approach. The way we envision this could work is that the 'main' SqlKata library would construct an AST, and each compiler would work by performing one or more transformations on the generic AST. E.g. a compiler could add, remove and replace nodes in the AST, possibly with compiler specific nodes, to implement specific behaviour. A benefit of this approach would be that the 'main' SqlKata library could focus on providing the general utility functionality for operating on the AST, while compilers can become more independent from the main library. This is obviously a long-term ideal/goal, but a first step towards this is currently under way in https://github.com/UiPath/querybuilder/pull/10 and https://github.com/UiPath/querybuilder/pull/9.

  • In pr-10 the main objective was to remove the CompileColumn() function, and replace it by a Compile() function on each column type (the column types could later become nodes in an AST).
  • The second pr (9) shows a (crude) example of how a compiler could replace nodes to implement generic behaviour. In this pr we implement the ANY_VALUE function, where the generic implementation generates a MIN(), but the Snowflake compiler replaces this 'node' by a node for the ANY_VALUE function, whereas the SQLite compiler generates a node to re-write this to a COALESCE() expression.

We're interested in hearing your thoughts on the AST idea, i.e. did you consider it, why or why not, are there problems that we've missed, etc.? Our main reservation to the current approach in the feature/expression-support branch is what happens in e.g. CompileCondition(), where you're basically manually (re)implementing a virtual dispatch rather than levering OO principles.

Kind regards, UiPath

dgeelen-uipath avatar Jun 17 '21 09:06 dgeelen-uipath

Hi @dgeelen-uipath, regarding your questions,

would it be acceptable to add a function to SqlKata that is not supported by all compilers

We prefer to avoid this as much as possible, unless there is no other way to do it, I faced the same scenario once where I had two different db engines, Postgres on Development, and SqlServer on Production, and I've added some extension methods on my project, that use the ForSqlServer and ForPostgres to achieve what needed, and personally I find this is best way in such cases.

for the AST topic, I've started SqlKata years ago with the purpose of 1) keeping it simple to maintain, and 2) perform the most obvious operations like complex joins, nested conditions etc..., and 3) keep a door for full flexibility when needed without to much overhead from the development side, (...Raw methods to the rescue)

My personal experience with full AST approach that it's too much overhead for an SQL Query Builder, the expression-support branch was the first trial to open a door for more flexibility through AST-like and implementing the Visitor pattern.

I think the best approach from maintenance/time perspective is to add the expressions support, and migrate the necessary parts to be expressed as native expressions, i.e. WhereDate("AcceptedAt") could be converted to something similar to Where(Date(Column("AcceptedAt"))).

Anyway I am open for suggestion and this was just my personal experience, If you have any useful resources regarding this topic (AST) I would be happy if you share with me.

ahmad-moussawi avatar Jun 23 '21 15:06 ahmad-moussawi

would it be acceptable to add a function to SqlKata that is not supported by all compilers

We prefer to avoid this as much as possible, unless there is no other way to do it, I faced the same scenario once where I had two different db engines, Postgres on Development, and SqlServer on Production, and I've added some extension methods on my project, that use the ForSqlServer and ForPostgres to achieve what needed, and personally I find this is best way in such cases.

So we assume you mean that in our user code we could add an extension method on Query which uses .ForEngine() to selectively generate clauses for a sub-set of engines. E.g. something like

Query SelectFlurp(this Query query, string expression) {
	return query
		.ForPostgres(...)
		.ForSqlServer(...)
	;
}
We see a few of drawbacks to this approach (click to expand)
  • The main drawback is that, as a library user you end up writing engine specific code, whereas we would like to have such code in the library instead.
  • It is not quite clear how one would cause such an extension method to generate an exception when used in combination with an unsupported engine. It seems inherent in the design of the ForEngine() extensions that queries (or parts thereof) written using these will only run for the given engine, so you are setting yourself up for silent failures in the case of using another (different) engine.
  • You will loose out on automatic updates to database engines. I.e. suppose an engine supports operation FLURP, so your write your code to use ForSqlEngineOne(). If at a later date another database starts to support FLURP, you need to update the code in your application in order to support it, rather than just updating to a newer version of SqlKata that can generate FLURP also for the new engine.
  • It doesn't seems like the ForEngine() mechanism allows for writing queries in a 'type-safe' way; the result of ForEngine() is still a Query, rather than an engine specific EngineQuery (but this is likely due to C# not supporting covariant return types until very recently). This means that if one wanted to use a method FLURP, the only way to do so would be to use e.g. SelectRaw("FLURP(...)"), i.e. using raw SQL. Using the raw SQL injection is error prone and we would like to avoid it as much as possible. E.g. if ForEngine() worked like the following, you could more safely call the FLURP function:
	new Query()
		.ForEngine(EngineCode.EngineOne, q => q.Flurp('ColumnA', 0.75)) // q has e.g. type 'class EngineOneQuery : Query'
	;

Given all these drawbacks, we feel the ForEngine() extension mechanism is not well suited for library level support for a given function, at best it can be offered as a fall back for user code. Indeed SqlKata acknowledges this in its description: By providing a level of abstraction over the supported database engines, that allows you to work with multiple databases with the same unified API. (emphasis ours). Functions like ForSqlEngine don't really seem like they fit in a unified API, because they apply to only one database.

We propose that 'sufficiently common' functionality to be made available on the library API level, with an engine that does not implement a given function acknowledging so by implementing an NotImplementedException rather than silently not (or generating a wrong) SQL statement.

for the AST topic, ....

We believe that using an AST-with-transforms rather than the visitor pattern would reduce the overhead on the development side, because a lot of the boiler plate code that comes with the visitor pattern can be avoided.

  • E.g. rather than listing all types that need to be compiled in CompileCondition(), each type can implement a Compile() function, and it will be called 'automatically'. When adding a new type, there would be no need to list it in CompileCondition().
  • If there is a need for a compiler to do something custom, then instead of making a specific extension function for each customization on class Compiler and overriding that function in each compiler, the customization can be implemented completely from the CustomCompiler class. This avoids the need to put lots of different virtual functions on the base compiler just to support each customization point.

Ultimately we think the AST-with-transforms approach will be more powerful and allow for more expressions to be supported, because the compiler can more easily access information about the query structure when applying a transform.


N.b.: better diff for the feature/expression-support branch: https://github.com/sqlkata/querybuilder/compare/3c0c875da27842b00259eb0c062c26d5a17c8b62...feature/expression-support

dgeelen-uipath avatar Jul 01 '21 13:07 dgeelen-uipath

@ahmad-moussawi, hi, just a little bump as we are waiting for your feedback on this and #499 :)

dgeelen-uipath avatar Jul 20 '21 13:07 dgeelen-uipath

Hi @ahmad-moussawi - just checking if you're interested in this work - should we still wait for feedback/ try to get this merged in the main repo?

virgilp avatar Jul 27 '21 08:07 virgilp

@dgeelen-uipath @virgilp might be a long shot but I am also hoping to use this library with Snowflake. Any chance you all could publish your Snowflake compiler?

Thanks!

lawrence-vo avatar Jan 02 '23 01:01 lawrence-vo