sea-query icon indicating copy to clipboard operation
sea-query copied to clipboard

Redesign `TypeName`, `DynIden` and `Expr` for correctness

Open Expurple opened this issue 5 months ago • 15 comments

See the thread starting from this comment.

Let's continue that discussion about redesigning these types. They should be able to represent most valid SQL type names, identifiers and expressions, and none of the invalid ones.

Expurple avatar Aug 13 '25 17:08 Expurple

Expressions can be divided into two categories:

Terminal/Atomic nodes:

  • Value
  • Keyword
  • Iden

Non-Terminal/Composite nodes:

  • Unary
  • Binary
  • List (separated by commas / space)
  • Wrapped (parenthesized, arrays)

A rough design might look like this:

enum Expr {
    Value(..),
    Keyword(..),
    Iden(..),

    Unary(UnOper, Self),
    Binary(Self, BinOper, Self),
    List(Vec<Self>, Separator),
    Wrapped(Self, Wrapper),

    Raw(..),
}

Huliiiiii avatar Aug 13 '25 19:08 Huliiiiii

Or we can convert exprs into structs and turn it into Box<dyn Buildable>

Huliiiiii avatar Aug 13 '25 19:08 Huliiiiii

Your list of non-terminal nodes assumes that they can only be built of other expressions and operators/separators/parentheses that combine them; and that everything can be "lowered" to these primitives. But that's not the case. FunctionCall, SubQuery, AsEnum and Case can also contain other things like keywords (DISTINCT in function args, CASE control flow) and type names. We have Keyword and TypeName variants today, but that's a hack that needs to be removed. Those aren't valid expressions. We need to have specific, "special", high-level structs/variants (like FunctionCall, SubQueryStatement and CaseStatement) to properly handle such nodes.

I don't see how Buildable structs change anything. We still need to correctly design these structs, just like we need to correctly design Expr variants today. Separate structs may be useful to represent specific AST nodes in the type system, but for that we can "extract" those without getting rid of Expr. The corresponding Expr variant would be left with one field which is that struct

Expurple avatar Aug 14 '25 05:08 Expurple

FunctionCalls can be represented as:

List(
  [
    Keyword, 
    Wrapped(
      List(
        ...
      ),
      Parenthesis
    ),
  ],
  Nothing
)

But this is a little bit complicated.

Higher-level structures or variants can make it easier to construct more complex expressions, but more variants will bring higher complexity.

Huliiiiii avatar Aug 14 '25 07:08 Huliiiiii

I understood this idea. Yes, this is possible. But the problem in my view is that Keyword is not actually a valid Expr. IMO, Expr::Keyword and Expr::TypeName should not exist. If we remove those, some SQL constructs can no longer be represented using your "common" low-level method

Expurple avatar Aug 14 '25 08:08 Expurple

How about this?

enum Atom {
   Lit(..),
   Keyword(..),
   Iden(..),
   Value(..),
   ..
}

enum Expr {
  Atom(Atom),
  ..
}

Huliiiiii avatar Aug 14 '25 08:08 Huliiiiii

Doesn't change anything. These types still allow you to construct invalid expressions from arbitrary keywords: let keyword_expr = Expr::Atom(Atom::Keyword(..)). I don't want to allow expressions like IF + 3. I want to have a clear, obvious contract:

"An Expr is a syntactically correct SQL expression (unless the user has opted into custom raw SQL strings)."

Most developers would expect Expr to be a fuckin expression, not "an arbitrary AST node that can also be something like BEGIN or int[] because that was easier to implement". It's a usability and maintainability problem. If you blur several unrelated concepts into a single "flexible" type, eventually you hit problems like https://github.com/SeaQL/sea-query/issues/827. Turns out, identifiers, type names and column names are three different things! They need to be quoted differently, support different syntax, support different conversion constructors, etc. Expressions, keywords and type names are also three different things that need to be handled differently.

Expurple avatar Aug 14 '25 11:08 Expurple

Flexibility and correctness are often difficult to achieve at the same time.

I think one possible approach is to split different types of expressions into separate structs, and define their serialization to SQL through traits. However, this approach may either sacrifice some flexibility, or require complex trait bounds to preserve it, which can lead to confusing error messages.

Huliiiiii avatar Aug 14 '25 12:08 Huliiiiii

For flexibility, we already have cust* methods and Cust/Custom variants. I adore them. When something isn't supported natively, they usually provide a working solution. I'm not sure if we need a different, semi-structured "arbitrary node" workaround in addition to that.

I know that you need #926. But if we look closer at it, an "expression list" isn't itself a valid expression that can be freely composed with other expressions. Which means that ExprList should be a separate type, rather than a new variant of Expr. I think, we should add that type, then add a backend function to render that into an SQL string, and then you can safely format that rendered string into a bigger raw string that comprises your JSON_EXISTS function call.

(Update: I've reposted this comment under your original Expr::List proposal. We can continue this thread over there if that's more appropriate)

I think one possible approach is to split different types of expressions into separate structs, and define their serialization to SQL through traits.

I don't see how that's relevant here. We need some "general expression node" in order to compose expressions. That's just how the SQL grammar works. If the Expr non-terminal was expressed as Box<dyn Buildable> instead of enum Expr , I would still argue that ExprList, TypeName and Keyword shouldn't implement Buildable and shouldn't be usable as expression nodes.

(Although, they'd obviously have to implement some other trait in order to be accepted by the backend in other contexts. That's equivalent to having "a backend function to render ExprList" that I propose)

Expurple avatar Aug 14 '25 13:08 Expurple

I think it’s fine to discuss this here. In your view, which variants should remain in Expr, and which should be split out? What are your thoughts on the overall design of the new API?

Huliiiiii avatar Aug 14 '25 13:08 Huliiiiii

It's really simple. Only the variants that are actually expressions should stay in Expr. Use expr + 1 as a quick test. "Real" expressions should be composable.

I don't have time to consult the grammar and check every variant that we have today. But it's obvious that Expr::TypeName doesn't pass this test and should be removed. In practice, it's only used to implement casts, so we should replace it with a Cast(Box<Expr>, TypeName) variant. It's very similar to the existing Expr::AsEnum variant, so we should dive into that and see if these variants can be unified.

Sorry, I caused some confusion about Keyword previously. I should've checked the Keyword contents. Turns out, it's already "only the keywords that are usable as expressions", not "arbitrary SQL keywords". Expr::Keyword can stay.

Expurple avatar Aug 14 '25 13:08 Expurple

Cast sounds good.

AsEnum just... Cast

https://github.com/SeaQL/sea-query/blob/8346526380d18d9c2d06fc5b85694a029333449a/src/backend/postgres/query.rs#L39-L51

Huliiiiii avatar Aug 14 '25 13:08 Huliiiiii

Maybe this can be removed?

// src/extension/postgres/types.rs
#[derive(Clone, Debug)]
#[non_exhaustive]
pub enum TypeRef {
    Type(DynIden),
    SchemaType(DynIden, DynIden),
    DatabaseSchemaType(DynIden, DynIden, DynIden),
}

Huliiiiii avatar Aug 17 '25 06:08 Huliiiiii

Oh, I missed that. Yeah, we should remove this enum solution and reuse struct TypeName (keeping it not Postgres-specific). Either pub type TypeRef = TypeName or just rename TypeName to TypeRef. Probably, the latter

Expurple avatar Aug 17 '25 08:08 Expurple

Another thing is size. Although Rust has optimizations, multiple DynIden still take up a lot of space, and most of the time it will not be used.

Boxing the less-used parts can reduce size, but will cause performanceloss in certain situations, which may be controlled by a feature flag.

Huliiiiii avatar Oct 21 '25 13:10 Huliiiiii

I ran into an issue in #1023. The current TypeRef design doesn’t support arrays,it looks like the right thing to use there is ColumnType. However, users might reasonably expect TypeRef to work for arrays too.

Huliiiiii avatar Dec 19 '25 02:12 Huliiiiii

There’s a lot of overlap between ColumnType and TypeRef, but ColumnType doesn’t support schema names or table names, while TypeRef doesn’t support arrays. Would merging them be the more correct approach?

Huliiiiii avatar Dec 19 '25 02:12 Huliiiiii

I wasn't aware of ColumnType. We could use that instead of TypeRef if we add support for qualifying ColumnType::Custom with a schema and add all the same egronomic constructors (impl MaybeQualifiedTwice). From a quick look, I like ColumnType's structured u32 (etc) fields more than pre-packing everything into a raw SQL string and then handling that string very carefully (like we have to do for arrays, for example: https://github.com/SeaQL/sea-query/pull/1024)

Expurple avatar Dec 19 '25 07:12 Expurple

If we use ColumnType, that immediately solves the issue that I have with TypeRef: DynIden is a bad choice for unqualified type names, because those aren't limited to "simple" identifiers can be more complex things like VARCHAR(123)

Expurple avatar Dec 19 '25 08:12 Expurple