prql
prql copied to clipboard
`USING` not available in `dialect:mssql`
Discussed in https://github.com/prql/prql/discussions/729
Originally posted by spenserblack June 29, 2022 This is a really awesome tool!
Starting as a question as I don't know if this is intended behavior or not.
prql dialect:mssql
from employees
join jobs [employee_id]
has the output below (playground and prql-compiler 0.2.1)
SELECT
employees.*,
jobs.*,
employee_id
FROM
employees
JOIN jobs USING(employee_id)
As far as I know (just based on usage and some Googling), USING(employee_id) isn't valid in SQL Server. So, when using dialect:mssql, should the compiler
- have an error that this syntax isn't available for
dialect:mssql(and suggest[employees.employee_id==jobs.employee_id])? - have a warning that this might not work with the given dialect?
- change resulting SQL from
USING(employee_id)toON employees.employee_id = jobs.employee_id?
Thanks @spenserblack — you're correct that SQL Server doesn't support USING.
So we should convert that to your 3rd option; ON employees.employee_id = jobs.employee_id.
One issue may be that the columns are then duplicated in the output. When we know the column names, we can deal with that. Otherwise, we'll have to fall back to having multiple columns. (I'm not sure whether this is allowed in SQL Server? I'm guessing it's inelegant but allowed; it's a bigger issue if it raises an error, and this sort of join isn't possible at all...)
This would be a moderate difficulty issue for someone who wanted to start contributing. I & others would be around to show where to start / answer any questions.
I agree, it should be the third option. This is exactly the issue we are trying to solve with PRQL: have a single language that works on multiple engines - no matter how old.
Regarding the "too many columns" issue: I think that there are many cases where we cannot produce SQL without knowing all the columns in the table (see #172, #696). Your suggestion applies to all these cases and I think it's a sound approach. We can detect that there may be too much columns and emit a warning:
warning: result may contain too many columns
3: join jobs [employee_id]
-----------
|_ not enough information about schema of tables `employees` and `jobs`
hint: you may want to add table definition:
table employees = ( ... )
table jobs = ( ... )
Obsolete, we don't emit USING anymore.