Support export and import knowledge base

Open sykp241095 opened this issue 1 year ago • 5 comments

Description

Provide an import and export feature for knowledge bases so that users can reuse chunks and knowledge graphs across multiple Autoflow instances, avoiding the repeated cost of embedding and knowledge graph extraction.

Design

What kind of files are used to transmit the knowledge base data?

Export the KB data to CSV files?

Export the related uploaded files into a folder named uploads.

migration_kb_data
  - kb.{kb_id}.uploads.csv
  - kb.{kb_id}.documents.csv
  - kb.{kb_id}.chunks.csv
  - kb.{kb_id}.entities.csv
  - kb.{kb_id}.relationships.csv
  - uploads
    - xxxx.md
    - xxxx.pdf
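
As a rough illustration of the layout above, the export step could be sketched as follows. This is only a sketch: the function name `export_kb`, the shape of `tables` (a dict of table name to a list of dict rows), and the column names are assumptions, not Autoflow's actual code or schema.

```python
import csv
import shutil
from pathlib import Path

# Table names taken from the proposed layout; row contents are hypothetical.
KB_TABLES = ("uploads", "documents", "chunks", "entities", "relationships")


def export_kb(kb_id, tables, upload_files, out_dir):
    """Write one CSV per table plus a copy of the uploaded files.

    tables: dict mapping a table name to a list of dict rows (assumed shape).
    upload_files: iterable of paths to the original uploaded files.
    """
    root = Path(out_dir) / "migration_kb_data"
    uploads_dir = root / "uploads"
    uploads_dir.mkdir(parents=True, exist_ok=True)

    for name in KB_TABLES:
        rows = tables.get(name, [])
        # Use the first row's keys as the header; an empty table yields an empty file.
        fieldnames = list(rows[0].keys()) if rows else []
        path = root / f"kb.{kb_id}.{name}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            if fieldnames:
                writer.writeheader()
            writer.writerows(rows)

    # Copy the source files next to the CSVs so the dump is self-contained.
    for src in upload_files:
        shutil.copy2(src, uploads_dir / Path(src).name)

    return root
```

Embedding vectors and other non-scalar columns would need an explicit serialization (for example JSON strings) before being written to CSV; that is left out of this sketch.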

Consideration

  • Whether to support importing into an existing knowledge base.
  • The upload, document, and user IDs may change on import.
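
To make the ID concern concrete, here is a minimal sketch of remapping IDs at import time. It assumes document rows carry an `id` column and chunk rows carry a `document_id` column; both column names and the function `remap_ids` are illustrative, not taken from the Autoflow schema.

```python
def remap_ids(documents, chunks, first_free_document_id):
    """Assign fresh document IDs and rewrite the chunks that point at them.

    Assumes each document row has an "id" column and each chunk row has a
    "document_id" column; both names are hypothetical.
    """
    id_map = {}
    new_documents = []
    next_id = first_free_document_id
    for doc in documents:
        id_map[doc["id"]] = next_id
        # Copy the row instead of mutating the caller's data.
        new_documents.append({**doc, "id": next_id})
        next_id += 1

    new_chunks = []
    for chunk in chunks:
        old = chunk["document_id"]
        if old not in id_map:
            # A dangling reference means the dump is inconsistent.
            raise ValueError(f"chunk references unknown document id {old!r}")
        new_chunks.append({**chunk, "document_id": id_map[old]})

    return new_documents, new_chunks, id_map
```

The same old-to-new mapping idea would apply to uploads, entities, and relationships. User IDs are a different case, since the exporting user may not exist on the target instance at all.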

TODO

  • [ ] Support export and import knowledge base via CLI
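
One possible shape for such a CLI, sketched with Python's standard `argparse`. The program name `kb-migrate`, the subcommands, and every flag below are hypothetical and are not an existing Autoflow command.

```python
import argparse


def build_parser():
    # "kb-migrate" is a placeholder program name, not a real Autoflow tool.
    parser = argparse.ArgumentParser(prog="kb-migrate")
    sub = parser.add_subparsers(dest="command", required=True)

    exp = sub.add_parser("export", help="dump one knowledge base to a directory")
    exp.add_argument("--kb-id", type=int, required=True)
    exp.add_argument("--output", default="migration_kb_data")

    imp = sub.add_parser("import", help="load a dumped knowledge base")
    imp.add_argument("--input", default="migration_kb_data")
    # Leaving this unset would mean "create a new knowledge base".
    imp.add_argument("--into-kb-id", type=int, default=None)

    return parser
```

For example, `build_parser().parse_args(["export", "--kb-id", "3"])` yields a namespace with `command="export"` and `kb_id=3`. The optional `--into-kb-id` flag reflects the open question of whether importing into an existing knowledge base should be supported.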

sykp241095 avatar Dec 10 '24 10:12 sykp241095

Related https://github.com/pingcap/autoflow/issues/398

634750802 avatar Dec 10 '24 10:12 634750802

Do we really have such a scenario?

Mini256 avatar Jan 03 '25 07:01 Mini256

Do we really have such a scenario?

Yes. Sometimes users run in a local, private network environment where it is difficult for them to download docs.pingcap.com or other online docs. This feature can help them to download an existing knowledge base and import it to their own self-hosted autoflow easily.

sykp241095 avatar Jan 03 '25 07:01 sykp241095

help them to download an existing knowledge base

What would the existing knowledge base be, an internal website or a folder containing a lot of local files? Please provide a detailed description in the issue description.

If the data source is not common, we should use a custom script to implement it.

import it to their own self-hosted autoflow

Why not use the local file upload data source? Do we have to use CLI to upload?

Mini256 avatar Jan 03 '25 08:01 Mini256

What would the existing knowledge base

For example, a TiDB knowledge base, a Redis KB, or a MongoDB KB.

Why not use the local file upload data source

  • Cost: If we add TiDB knowledge by crawling docs.pingcap.com, users have to pay again for the LLM to extract the knowledge graph from about a thousand pages. If we instead upload a tidb-user-guide.pdf of about 100 MB, the LLM still has to extract the whole knowledge graph from that PDF, which may cost somewhere between $50 and $100.

  • LLM performance: Users may not have the strongest LLM available for knowledge graph extraction. For example, many users run llama3.* 32B or another self-hosted model, and these models do not perform well at extracting and building graphs.

Do we have to use CLI to upload?

The ultimate solution might be a UI-based export/import experience, I think.

sykp241095 avatar Jan 03 '25 08:01 sykp241095