prql icon indicating copy to clipboard operation
prql copied to clipboard

`USING` not available in `dialect:mssql`

Open max-sixty opened this issue 3 years ago • 3 comments

Discussed in https://github.com/prql/prql/discussions/729

Originally posted by spenserblack June 29, 2022 This is a really awesome tool!

Starting as a question as I don't know if this is intended behavior or not.

prql dialect:mssql
from employees
join jobs [employee_id]

has the output below (playground and prql-compiler 0.2.1)

SELECT
  employees.*,
  jobs.*,
  employee_id
FROM
  employees
  JOIN jobs USING(employee_id)

As far as I know (just based on usage and some Googling), USING(employee_id) isn't valid in SQL Server. So, when using dialect:mssql, should the compiler

  • have an error that this syntax isn't available for dialect:mssql (and suggest [employees.employee_id==jobs.employee_id])?
  • have a warning that this might not work with the given dialect?
  • change resulting SQL from USING(employee_id) to ON employees.employee_id = jobs.employee_id?

max-sixty avatar Jun 29 '22 19:06 max-sixty

Thanks @spenserblack — you're correct that SQL Server doesn't support USING.

So we should convert that to your 3rd option; ON employees.employee_id = jobs.employee_id.

One issue may be that the columns are then duplicated in the output. When we know the column names, we can deal with that. Otherwise, we'll have to fall back to having multiple columns. (I'm not sure whether this is allowed in SQL Server? I'm guessing it's inelegant but allowed; it's a bigger issue if it raises an error, and this sort of join isn't possible at all...)

max-sixty avatar Jun 29 '22 19:06 max-sixty

This would be a moderate difficulty issue for someone who wanted to start contributing. I & others would be around to show where to start / answer any questions.

max-sixty avatar Jun 29 '22 19:06 max-sixty

I agree, it should be the third option. This is exactly the issue we are trying to solve with PRQL: have a single language that works on multiple engines - no matter how old.

Regarding the "too many columns" issue: I think that there are many cases where we cannot produce SQL without knowing all the columns in the table (see #172, #696). Your suggestion applies to all these cases and I think it's a sound approach. We can detect that there may be too much columns and emit a warning:

warning: result may contain too many columns
3:  join jobs [employee_id]
               -----------
                 |_ not enough information about schema of tables `employees` and `jobs`

hint: you may want to add table definition:

    table employees = ( ... )
    table jobs = ( ... )

aljazerzen avatar Jul 01 '22 11:07 aljazerzen

Obsolete, we don't emit USING anymore.

aljazerzen avatar Dec 20 '22 11:12 aljazerzen