openai-cookbook
openai-cookbook copied to clipboard
[SUPPORT] Advice regarding SQL generation
Note: SQL generation use cases are high-risk in a production environment - models can be unreliable when generating consistent SQL syntax. A more reliable way to solve this problem may be to build a query generation API that takes the desired columns as input from the model.
I saw this quote in your new function calling cookbook example. The idea of a query generation API, is it based on a successful implementation or theory?
takes the desired columns as input from the model
The problem I've run into is SQL doesn't just retrieve columns from tables. Any non-trivial analysis would use aggregated metrics like avg, sum along with group by clauses. Advanced analysis would use things like row_number(), having, cross join, CTEs
As an example, here's a query to retrieve the best performing product in a category from the adventureworks sample database.
WITH ranked_products AS (
SELECT
pc.name AS category_name,
p.name AS product_name,
p.productnumber,
SUM(sod.orderqty * sod.unitprice) AS total_sales_amount,
ROW_NUMBER() OVER (PARTITION BY pc.name ORDER BY SUM(sod.orderqty * sod.unitprice) DESC) AS rank
FROM
production.product AS p
INNER JOIN production.productsubcategory AS psc ON p.productsubcategoryid = psc.productsubcategoryid
inner join production.productcategory as pc on pc.productcategoryid = psc.productcategoryid
INNER JOIN sales.salesorderdetail AS sod ON p.productid = sod.productid
GROUP BY
pc.name,
p.name,
p.productnumber
)
SELECT
category_name,
product_name,
productnumber,
total_sales_amount
FROM
ranked_products
WHERE
rank = 1;
If you have ideas on how to create a query generation API that can accept some input and output something like this I'd love to hear.