sql-metadata icon indicating copy to clipboard operation
sql-metadata copied to clipboard

Resolve subqueries in where clause

Open ReinierKoops opened this issue 3 years ago • 8 comments

Hi,

I think I have found a bug regarding subqueries, namely a subquery that seems intentionally left out in (https://github.com/macbre/sql-metadata/blob/master/sql_metadata/keywords_lists.py):

query = "SELECT a.model FROM CAR_NAMES a JOIN CARS_DATA b ON a.MakeId = b.Id WHERE b.Weight < (SELECT avg(Weight) FROM CARS_DATA)" print(query.columns, "\n", query.columns_aliases_names, "\n", query.tables, "\n", query.tables_aliases, "\n", query.subqueries)

I expected this to result in recognition of the subquery (SELECT avg(Weight) FROM CARS_DATA), however it is not recognized, the output is: ['car_names.model', 'car_names.makeid', 'cars_data.id', 'cars_data.weight', 'weight'] [] ['car_names', 'cars_data'] {'a': 'car_names', 'b': 'cars_data'} {}

Something else, which I think is a different issue; I would expect the column 'weight' of the subquery to be recognized as 'cars_data.weight' as well.

I think it would be useful when the complete query has more than one table found, to append the table name before any column found.

ReinierKoops avatar Jul 09 '21 09:07 ReinierKoops

For now, only column defining subqueries and join subqueries are supported -> so resolving subqueries in where clause it's an enhancement.

As for the column names right now we do not try to guess from which table the column comes if it's not explicitly provided, in your sample it's quite easy (one table) but in a lot of cases it's not:

Select col_a, col_b from table_a, table_b # (from which table is which column?)

Select col_a, col_b from (select * from table_a left join table_b on aa=bb)  sub # again which column from which table etc.

If you want explicit resolving you need to prefix the column with an alias, so a subquery like

SELECT avg(cd.Weight) FROM CARS_DATA cd

collerek avatar Jul 09 '21 10:07 collerek

Thank you, would be nice enhancement. Are you perhaps familiar with any tool that could do the alias prefix resolving you mentioned? Currently, I have to deal with user input, which often is prone to errors like in the stated example. Normally one would for example use prepared statements and the like, however, my usecase is for teaching/learning/sandbox purposes so its supposed to be free form sql input.

ReinierKoops avatar Jul 09 '21 10:07 ReinierKoops

What do you mean by alias resolving?

We have a bug that you reported (multiple aliases without as in from clause) Apart from this if you provide the prefix for a column it should be resolved by sql-metadata (also to subqueries and with queries).

If you mean to resolve the columns to tables without explicitly provided table names/aliases it's often impossible without access to the database (you would have to query system tables to check names in tables columns) -> .i.e. the samples I provided.

collerek avatar Jul 09 '21 10:07 collerek

Yes, I meant resolving the columns to tables without explicitly provided table names/aliases.

Makes sense, I guess I will have to find a way around it. Thanks for your help 😄

ReinierKoops avatar Jul 09 '21 10:07 ReinierKoops

If there is no prefix the column name has to be unique in the scope of the query -> which means that you should be able to resolve it with db access -> either by quering system tables or by auto-reflecting tables i.e. with sqlalchemy.

So you can extract all tables used with sql-metadata but then you need the actual db to fetch the list of columns per table.

collerek avatar Jul 09 '21 10:07 collerek

Hi,

I saw that in order to spot subqueries you rely on having an alias for the whole subquery. Problem with the WHERE clause is that usually subqueries inside doesn't have any alias. Any idea on how to find a way around for spotting subqueries inside WHERE clause? I was thinking about working with pairs of parentheses but subqueries are not the only one surrounded by parentheses.

Thanks!

valentingarnier avatar Oct 15 '21 13:10 valentingarnier

Yes, I meant resolving the columns to tables without explicitly provided table names/aliases.

Makes sense, I guess I will have to find a way around it. Thanks for your help 😄

Hi Reinier,

I am also facing similar problem. Which work around worked for you? Could you please help me on this.

Best Regards, Chandan

analyst009 avatar Jan 24 '22 08:01 analyst009

It would be nice if this feature to resolve subqueries in the where clause is implemented.

leonardomathon avatar Mar 25 '22 12:03 leonardomathon