[Question]: Best practices for a data-warehouse knowledge base?
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
After reading this: Implementing Text2SQL with RAGFlow
I'm a data-warehouse manager. I want to upload all metadata (DDL, descriptions, business descriptions, the data dictionary, and table/column relationships) to a RAGFlow knowledge base, in order to achieve this:
- IT staff can retrieve tables (via table name, table description, business info, etc.) and ask the LLM to help generate SQL.
You can imagine there will be a large number of DDLs. What are the best practices?
- Put all DDLs in one file, or one DDL per file? (Some DDLs have more than 300 columns.)
- Why is the chunk token number "8"? Shouldn't it be a larger number, so it can hold all the columns?
- Comparing a DDL file with an Excel document, which one is better?
For DDL text files, set `;` as the delimiter and leave the chunk token number at its default. The number of files is not a concern.
In the article, the default chunk token number is 8 and it works, but how can 8 tokens hold a very large table?
The chunker splits on the delimiter first, and only then merges those pieces according to the chunk token number. A piece that is already larger than the token budget (such as a 300-column DDL statement) is kept whole, so the small default does not truncate it.
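To make that behavior concrete, here is a minimal Python sketch of split-then-merge chunking. This is an illustration of the idea, not RAGFlow's actual implementation; `chunk_ddl` is a hypothetical helper, and token counting is approximated by a whitespace word count.

```python
def chunk_ddl(text, delimiter=";", chunk_tokens=8):
    """Split `text` on `delimiter`, then greedily merge adjacent pieces
    while the running total stays within `chunk_tokens`. A single piece
    that already exceeds the budget is emitted whole, which is why a
    very wide DDL statement survives a small token setting."""
    pieces = [p.strip() for p in text.split(delimiter) if p.strip()]
    chunks, current, current_len = [], [], 0
    for piece in pieces:
        n = len(piece.split())  # crude token estimate for illustration
        if current and current_len + n > chunk_tokens:
            chunks.append(";\n".join(current) + ";")  # flush current chunk
            current, current_len = [], 0
        current.append(piece)
        current_len += n
    if current:
        chunks.append(";\n".join(current) + ";")
    return chunks

ddl = """CREATE TABLE t1 (id INT, name TEXT);
CREATE TABLE t2 (id INT, t1_id INT, amount DECIMAL(10,2));"""
for chunk in chunk_ddl(ddl):
    print(chunk)
    print("---")
```

With the small default budget, each `CREATE TABLE` statement ends up in its own chunk, so retrieval returns one complete table definition at a time regardless of column count.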