explorer
explorer copied to clipboard
Support expressions in `on` option for complex Joins
Description:
I would like to propose a new feature in Elixir Explorer that allows the on
option in the join
function to support more complex expressions. Currently, Explorer provides functionality for join operations with a limited ability to specify joins using simple equality
checks (e.g., on: [{"column1", "column2"}]
). To enhance the flexibility of joining tables based on complex conditions that cannot be easily expressed with only column names, it would be beneficial to extend the join on
clause to accept Explorer expressions.
Example Use Case:
Consider the following SQL query, where a complex condition is used in the ON
clause to perform a left join:
SELECT
AP.table1,
AP.column1,
AP.table2,
AP.column2,
CASE WHEN AR.has_relationship THEN 1 ELSE 0 END AS 'has_match'
FROM
ALL_PAIRS AS AP
LEFT JOIN
ALL_RELATIONSHIPS AS AR
ON
((AR.table_name = AP.table1 AND AR.column_name = AP.column1) AND
(AR.referenced_table_name = AP.table2 AND AR.referenced_column_name = AP.column2)) OR
((AR.table_name = AP.table2 AND AR.column_name = AP.column2) AND
(AR.referenced_table_name = AP.table1 AND AR.referenced_column_name = AP.column1))
Currently, to achieve this behavior in Explorer, one has to perform two joins, as shown below:
result =
all_pairs
|> DataFrame.join(all_relationships_1, how: :left,
on: [
{"table_name_1", "table_name"},
{"column_name_1", "column_name"},
{"table_name_2", "referenced_table_name"},
{"column_name_2", "referenced_column_name"}
])
|> DataFrame.join(all_relationships_2, how: :left,
on: [
{"table_name_2", "table_name"},
{"column_name_2", "column_name"},
{"table_name_1", "referenced_table_name"},
{"column_name_1", "referenced_column_name"}
])
|> DataFrame.mutate(has_relationship: coalesce(has_relationship_1, has_relationship_2))
Proposed Enhancement:
I propose that the on
option be enhanced to allow expressions, making it possible to perform complex joins more succinctly. Ideally, the code would look something like this:
result =
all_pairs
|> DataFrame.join(all_relationships_1, how: :left,
on: ((table_name == table_name_1 and column_name = column_name_1) and
(referenced_table_name = table_name_2 and referenced_column_name = column_name_2)) or
((table_name = table_name_2 and column_name = column_name_2) and
(referenced_table_name = table_name_1 and referenced_column_name = column_name_1))
)