langchain
Add Spark SQL support
- Add Spark SQL support. It connects to Spark by building a local or remote SparkSession.
- Include a notebook example.
I tried some complicated queries (window functions, table joins), and the tool works well. Unlike the Spark DataFrame agent, this tool can generate queries across multiple tables.
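For readers unfamiliar with the kind of query meant here, the following is a rough sketch in plain PySpark, not the code from this PR. It shows one way to build a local or remote `SparkSession` and run a query that combines a join with a window function. The table names (`customers`, `orders`), the sample rows, the helper names `build_session` and `run_demo`, and the remote URL in the usage note are all hypothetical and chosen only for illustration. The remote path assumes PySpark 3.4 or later with Spark Connect available.

```python
from typing import Optional

# Illustrative query over two hypothetical tables: a join plus a window function.
RANKED_ORDERS_SQL = """
SELECT c.name,
       o.amount,
       RANK() OVER (PARTITION BY c.id ORDER BY o.amount DESC) AS amount_rank
FROM orders o
JOIN customers c ON o.customer_id = c.id
"""


def build_session(remote_url: Optional[str] = None):
    """Return a SparkSession: remote (Spark Connect) if a URL is given, else local."""
    # Imported lazily so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    if remote_url:
        # Assumption: a Spark Connect server is reachable at remote_url.
        return builder.remote(remote_url).getOrCreate()
    return builder.master("local[*]").appName("spark-sql-sketch").getOrCreate()


def run_demo(spark):
    """Register two small temp views (made-up data) and run the query."""
    spark.createDataFrame(
        [(1, "alice"), (2, "bob")], ["id", "name"]
    ).createOrReplaceTempView("customers")
    spark.createDataFrame(
        [(1, 10.0), (1, 25.0), (2, 7.5)], ["customer_id", "amount"]
    ).createOrReplaceTempView("orders")
    return spark.sql(RANKED_ORDERS_SQL).collect()
```

With PySpark installed, `run_demo(build_session())` would run against a local session, while something like `build_session("sc://localhost:15002")` would target a remote one, assuming a server is listening at that address.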
Note: There was an earlier approach based on SQLDatabase, but @dev2049 suggested not inheriting from SQLDatabase.
https://github.com/hwchase17/langchain/pull/4381
@skcoirz Thanks for helping update this one!
yeah, sure thing! I tested this. The new query checker is really powerful! It solved the previous concern of AnalysisException. Thank you so much for adding this! During the test, I noticed a few more opportunities. I have added them to our spreadsheet. Happy to chat more when you have time! Have a good weekend! :D
Moved the remaining new features to a new PR on top of this branch. (https://github.com/hwchase17/langchain/pull/4672)
cc @vowelparrot @hwchase17 could you review this one? The new agent is helpful for the Apache Spark community.
I just did a final check before merging. There was a bug in the memory support, so I reverted it to keep this first version simple and robust. I discussed this with @skcoirz offline, and he will create another PR for general support for Agents. I also verified the change by rerunning the notebook. It works great.
@hwchase17 @dev2049 @skcoirz Thanks for reviewing this!