clickhouse-sqlalchemy icon indicating copy to clipboard operation
clickhouse-sqlalchemy copied to clipboard

select() clause with join creates a cross join

Open AviSarmani opened this issue 2 years ago • 6 comments

Describe the bug Hi, I'm in the process of upgrading our codebase to use SqlAlchemy 1.4. I have a problem with select clause(that SqlAlchemy would like us to move to instead of query) A simple .join statement produces a cross join as it adds the table twice Once for the table in the select and once for the join. I feared that I'm not familiar enough with the select clause but I tested the same code with Postgres and it works fine so I'm a bit puzzled.

To Reproduce The tables:


CREATE TABLE First
(

    `first_column` String
)
ENGINE = Memory;`
`
CREATE TABLE Second
(
    `second_column` String
)
ENGINE = Memory;`

The select:

stmt = select(
        first.c.first_column,
        second.c.second_column,
    ).join(second, first.c.first_column == second.c.second_column)

In Postgres engine(with tables created there as well), this produces:

SELECT "First".first_column, 
	"Second".second_column 
FROM "First" 
JOIN "Second" ON "First".first_column = "Second".second_column

Which runs fine. In CH though this produces.

SELECT "First".first_column, 
	"Second".second_column 
FROM "Second", "First" 
INNER JOIN "Second" ON "First".first_column = "Second".second_column

that ultimately crashes: DB::Exception: Ambiguous column 'Second.second_column': While processing SELECT first_column AS First.first_column, Second.second_column FROM Second CROSS JOIN First INNER JOIN Second ON First.first_column = Second.second_column. Stack trace:

Would love to have some help with this.

Versions clickhouse-driver==0.2.4 clickhouse-sqlalchemy==0.2.1 SQLAlchemy==1.4.39 python 3.10.5 CH version:22.7.1.2484

AviSarmani avatar Aug 02 '22 19:08 AviSarmani

Hi.

It seems that it's not pointed out in documentation. Use select from the package. from clickhouse_sqlalchemy import select

xzkostyan avatar Aug 04 '22 07:08 xzkostyan

Hi, Thanks for the quick response. When using the select from the package I get a different error.


from clickhouse_sqlalchemy import select as ch_select

    stmt = ch_select(
        first.c.first_column,
        second.c.second_column,
    ).join(second, first.c.first_column == second.c.second_column)

The resulting SQL is a bit different but still has the cross join and gives a different error:

(
SELECT "First".first_column AS first_column
	,"Second".second_column AS second_column
FROM "First"
	,"Second"
) AS anon_1
INNER JOIN "Second" ON "First".first_column = "Second".second_column

Executable SQL or text() construct expected, got <sqlalchemy.sql.selectable.Join at 0x7ffbe932ffd0; Join object on anon_1(140719925951552) and Second(140719925945744)>.

AviSarmani avatar Aug 04 '22 08:08 AviSarmani

Could you provide whole snippet for reproducing issue?

xzkostyan avatar Aug 04 '22 08:08 xzkostyan

Hi, We're using reflection to create the table objects if that matters.

from sqlalchemy import MetaData, create_engine
from clickhouse_sqlalchemy import Table, engines
from clickhouse_sqlalchemy import select as ch_select
    
    engine = create_engine(
        f"clickhouse+native://{user}:{password}@{host}:{port}/{default_db}?{args}",
        pool_pre_ping=True,
    )

    metadata = MetaData(bind=engine)

    first = Table(
        "First",
        metadata,
        engines.Memory(),
        autoload=True,
        extend_existing=True,
    )
    second = Table(
        "Second",
        metadata ,
        engines.Memory(),
        autoload=True,
        extend_existing=True,
    )

    stmt = ch_select(
        first.c.first_column,
        second.c.second_column,
    ).join(second, first.c.first_column == second.c.second_column)

AviSarmani avatar Aug 04 '22 09:08 AviSarmani

Hi @xzkostyan , Have you had a chance to look at it? Can you reproduce it on your end? I just want to know if it's a known issue or something wrong that I'm doing Thanks is advance.

AviSarmani avatar Aug 28 '22 18:08 AviSarmani

Looks like the ".join_from" method produces the correct result:

from clickhouse_sqlalchemy import select as ch_select

ch_select(first.c.first_column, second.c.second_column).join_from(
        first, second, first.c.first_column == second.c.second_column
    )

Produces:

SELECT "First".first_column,
        "Second".second_column 
FROM "First" JOIN "Second"  ON "First".first_column = "Second".second_column

but it's more cumbuersome. From the SA documentation the join_from(), looks like it's supposed to work the same as a join but let us indicate the left and right side of the JOIN explicitly.

AviSarmani avatar Aug 28 '22 19:08 AviSarmani