
[Question]: Best practices for a data warehouse knowledge base?

Open baicl123 opened this issue 10 months ago • 1 comments

Self Checks

  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (Language Policy).
  • [x] Non-English title submissions will be closed directly (Language Policy).
  • [x] Please do not modify this template :) and fill in all the required fields.

Describe your problem

After reading this: Implementing Text2SQL with RAGFlow

I'm a data warehouse manager. I want to upload all metadata (DDL, descriptions, business descriptions, dictionaries, and table and column relationships) to a RAGFlow knowledge base in order to achieve this:

  • IT staff can retrieve tables (via table name, table description, business information, etc.) and ask the LLM to help generate SQL.

As you can imagine, there will be a large number of DDLs. What are the best practices?

  1. Should I put all DDLs in one file, or one DDL per file? (Some DDLs have more than 300 columns.)
  2. Why is the chunk token number "8"? Shouldn't it be a larger number, so that a chunk can hold all the columns?
  3. Between DDL and an Excel document, which is better?

baicl123 avatar Mar 14 '25 11:03 baicl123

For DDL text files, set ";" as the delimiter and leave the chunk token number at its default. The number of files does not matter.

KevinHuSh avatar Mar 17 '25 03:03 KevinHuSh

> For DDL text files, set ";" as the delimiter and leave the chunk token number at its default. The number of files does not matter.

In the article, the default chunk token number is 8, and it works. But how can 8 tokens hold a very large table?

baicl123 avatar Apr 07 '25 09:04 baicl123

The text is first split by the delimiter, and the resulting pieces are then merged according to the chunk token number.

KevinHuSh avatar Apr 08 '25 11:04 KevinHuSh
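
The split-then-merge order described above can be illustrated with a minimal Python sketch. This is not RAGFlow's actual code: the function names, the whitespace-based token count, and the exact merge rule (keep appending to the current chunk until it exceeds the budget) are assumptions made for illustration. Under these assumptions a piece produced by the delimiter split is never cut in the middle, which would explain why a small chunk token number can still keep a large CREATE TABLE statement whole.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (assumption): count whitespace-separated words.
    return len(text.split())


def split_then_merge(text: str, delimiter: str = ";", chunk_token_num: int = 8) -> list[str]:
    # Step 1: split on the delimiter and drop empty pieces.
    pieces = [p.strip() for p in text.split(delimiter) if p.strip()]

    # Step 2: append each piece to the current chunk while that chunk is
    # still within the token budget; otherwise start a new chunk.
    # A single piece is never split, however long it is.
    chunks: list[str] = []
    for piece in pieces:
        if not chunks or count_tokens(chunks[-1]) > chunk_token_num:
            chunks.append(piece)
        else:
            chunks[-1] += delimiter + "\n" + piece
    return chunks


# Hypothetical DDL text: two wide tables followed by two short statements.
ddl = (
    "CREATE TABLE orders (id INT, customer_id INT, amount DECIMAL(10,2), "
    "created_at TIMESTAMP, status VARCHAR(20)); "
    "CREATE TABLE customers (id INT, name VARCHAR(100), email VARCHAR(100), "
    "country VARCHAR(50), signup_date DATE); "
    "DROP TABLE tmp; DROP TABLE tmp2;"
)
chunks = split_then_merge(ddl)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

In this sketch each long CREATE TABLE statement ends up as its own chunk, while the two short DROP statements are merged into one, so the token number acts as a merge threshold for small pieces rather than a cap on chunk size.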