langchain icon indicating copy to clipboard operation
langchain copied to clipboard

sql: do not hard code the LIMIT clause in the table_info section

Open jzluo opened this issue 1 year ago • 2 comments

Seeing a lot of issues in Discord in which the LLM is not using the correct LIMIT clause for different SQL dialects. ie, it's using LIMIT for mssql instead of TOP, or instead of ROWNUM for Oracle, etc. I think this could be due to us specifying the LIMIT statement in the example rows portion of table_info. So the LLM is seeing the LIMIT statement used in the prompt. Since we can't specify each dialect's method here, I think it's fine to just replace the SELECT... LIMIT 3; statement with 3 rows from table_name table:, and wrap everything in a block comment directly following the CREATE statement. The Rajkumar et al paper wrapped the example rows and SELECT statement in a block comment as well anyway. Thoughts @fpingham?

jzluo avatar Mar 09 '23 18:03 jzluo

This makes a lot of sense to me Jon! I would probably add some context to tell the LLM where these rows are coming from. E.g.:

these are the first 3 rows from table_name table:

wdyt?

fpingham avatar Mar 10 '23 19:03 fpingham

Yeah so right now it looks like:


CREATE TABLE "Track" (
	blahblahblah
)
/*
2 rows from Track table:
TrackId	Name	AlbumId	MediaTypeId	GenreId	Composer	Milliseconds	Bytes	UnitPrice
1	For Those About To Rock (We Salute You)	1	1	1	Angus Young, Malcolm Young, Brian Johnson	343719	11170334	0.99
2	Balls to the Wall	2	2	1	None	342562	5510424	0.99
*/

I think that's enough, but do you think we should add more verbiage?

jzluo avatar Mar 13 '23 15:03 jzluo

yeah, I was wondering if adding that they are the 'first' rows would give any extra info to the agent but thinking it again it doesn't seem it would change much - I think it is ok as it is right now

fpingham avatar Mar 13 '23 20:03 fpingham

I'm still seeing this problem.

Is the fix in:

langchain 0.0.150

?

I have to add to the prompt: "Do not use the LIMIT keyword in SQL statements.", but then some cases where it would typically limit the returnset to 10, it's not. If I add to the prompt it should use TOP, I get it in every query, etc...

bobbydowling avatar Apr 26 '23 23:04 bobbydowling