Slow

Open thistleknot opened this issue 10 months ago • 8 comments

Search before asking

  • [x] I had searched in the issues and found no similar issues.

Operating system information

Linux

What happened

Been running this on a 300 page document for over 3 days

chunk size 2000 using ollama locally (gemma 2)

I definitely can't use this for anything serious running off of local LLMs.

I got past the splitter and into the extractor but jesus h christ, and I have gpu support

I have a custom graph algorithm I could contribute that doesn't use an LLM but relies on NLP (spaCy POS tagging). It might help speed up graph generation, if you are interested.
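To make the idea of LLM-free graph generation concrete, here is a minimal sketch. It is not the algorithm offered above, which is not shown in this thread. As an assumption for illustration, a regex that matches runs of capitalized words stands in for real POS tagging, and entities that appear in the same sentence are linked by a co-occurrence edge. The names `ENTITY` and `cooccurrence_edges` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for a POS tagger: runs of capitalized words are
# treated as entity candidates. This also picks up ordinary capitalized
# sentence-initial words, which a real tagger would filter out.
ENTITY = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def cooccurrence_edges(text):
    """Count how often two entity candidates share a sentence."""
    edges = Counter()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = sorted(set(ENTITY.findall(sentence)))
        for a, b in combinations(entities, 2):
            edges[(a, b)] += 1
    return edges


edges = cooccurrence_edges("Alice met Bob in Paris. Bob later wrote to Alice.")
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

A pass like this runs without any model calls, which is the appeal of the approach, at the cost of much cruder entities and untyped relations.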

How to reproduce

Drag in a 300-page markdown document and leave everything at default (2000-character split), set up with Ollama using a local model (I'm running gemma 2 and all-minilm off of Ollama on a P5200).
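For a rough sense of scale, the run time of LLM-based extraction grows with the number of chunks times the number of model calls per chunk. The sketch below is a back-of-envelope estimate only: the characters per page, the calls per chunk, and the seconds per call are all assumed values, not measurements from KAG or from this setup, and `estimate_extraction_hours` is a hypothetical helper.

```python
import math


def estimate_extraction_hours(pages, chars_per_page, chunk_chars,
                              seconds_per_call, calls_per_chunk=1):
    """Return (chunk_count, estimated_hours) for a sequential extraction run."""
    total_chars = pages * chars_per_page
    chunks = math.ceil(total_chars / chunk_chars)
    total_seconds = chunks * calls_per_chunk * seconds_per_call
    return chunks, total_seconds / 3600


# Assumed inputs: 3000 characters per page, 3 model calls per chunk,
# 60 seconds per call on a local GPU. Only the 300 pages and the
# 2000-character chunk size come from the report above.
chunks, hours = estimate_extraction_hours(
    pages=300, chars_per_page=3000, chunk_chars=2000,
    seconds_per_call=60, calls_per_chunk=3,
)
print(f"{chunks} chunks, about {hours:.1f} hours")  # 450 chunks, about 22.5 hours
```

Under these assumptions, halving the chunk size doubles the number of chunks and, if the time per call does not drop much, roughly doubles the total run time.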

Are you willing to submit PR?

  • [x] Yes I am willing to submit a PR!

thistleknot avatar Feb 23 '25 15:02 thistleknot

been stuck on extractor for a whole day

Reader

2025-02-21 22:32:37(172.23.0.5): Task scheduling completed. cost:705 ms !
2025-02-21 22:32:37(172.23.0.5): Lock released successfully!
2025-02-21 22:32:37(172.23.0.5): Store the results of the read operator. file:builder/builder/project_1/instance_9/49_kagReaderSyncTask.kag
2025-02-21 22:3...
Splitter

2025-02-21 22:33:07(172.23.0.5): Task scheduling completed. cost:6 ms !
2025-02-21 22:33:07(172.23.0.5): Lock released successfully!
2025-02-21 22:33:07(172.23.0.5): Splitter task trace log:
    >> 22:32:48: Store the results of the split operator. file:builder/builder/project_1/instance_9/50_kagSplitterAsyncTask.kag
    >> 22:32:48: Sp...
Extractor

2025-02-23 23:34:07(172.23.0.5): Task scheduling completed. cost:3 ms !
2025-02-23 23:34:07(172.23.0.5): Lock released successfully!
2025-02-23 23:34:07(172.23.0.5): Extractor task status is RUNNING
2025-02-23 23:34:07(172.23.0.5): The asynchronous task has been created! resource:builder/project_1/instance_9/51_kagExtractorAsyncTask.kag
2025-02-23 23:34:07(172.23.0.5): Lock preempted successfully!

...

2025-02-23 19:26:07(172.23.0.5): Task scheduling completed. cost:2 ms !
2025-02-23 19:26:07(172.23.0.5): Lock released successfully!
2025-02-23 19:26:07(172.23.0.5): Extractor task status is RUNNING
2025-02-23 19:26:07(172.23.0.5): The asynchronous task has been created! resource:builder/project_1/...
Vectorizer

thistleknot avatar Feb 23 '25 15:02 thistleknot

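When you expand the Reader section of your log, there is a keyword `trunk`. Please check whether trunk=0. If it is, you can try re-uploading with the file provided in the official manual.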

xiaozhou123-oos avatar Feb 24 '25 06:02 xiaozhou123-oos

Could you upload your markdown document to help us reproduce the problem?

caszkgui avatar Feb 24 '25 11:02 caszkgui

I'll try another document, specifically an arXiv paper, see how long that takes, and send that doc (if it takes longer than a night). I've noticed long run times with any document. I don't want to be sending scraped books I don't have permission to share.


thistleknot avatar Feb 24 '25 15:02 thistleknot

Model Config Image

Step 1 Image

Step 2 Image

Step 3 Image

prompt word { "biz_scene":"default", "language":"en" }

Image

idk I give up

thistleknot avatar Feb 25 '25 00:02 thistleknot

nm, I got this one started at least, with the chunk size set to 858

Image

I'll check on it when I get back later tonight, and again tomorrow morning.

thistleknot avatar Feb 25 '25 00:02 thistleknot

well looks like the arxiv paper finished overnight...

hrmm...

thistleknot avatar Feb 25 '25 14:02 thistleknot

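Same situation here, it is very slow. I checked the chunk size in the Reader section and it is not 0. The input is a txt file of about 2 MB, and it has been running for 24 hours.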

xxyyboy avatar Apr 23 '25 11:04 xxyyboy

KAG V0.8 improves knowledge extraction efficiency. You can try out the latest version:

First, we have upgraded the capabilities of the KAG knowledge base. We have expanded support for two modes: private domain knowledge bases (including structured and unstructured data) and public domain knowledge bases. This includes the ability to integrate public web data sources such as LBS and WebSearch via the MCP protocol. Additionally, we have improved the management of private domain knowledge base indexing, incorporating multiple foundational index types such as Outline, Summary, KnowledgeUnit, AtomicQuery, Chunk, and Table. This supports developers in customizing indexes and synchronizing them with product interfaces. Users can select the most appropriate index type based on their specific scenarios, achieving a balance between construction costs and business outcomes.

caszkgui avatar Aug 16 '25 08:08 caszkgui

Ooh I need to try this

Closed #367 https://github.com/OpenSPG/KAG/issues/367 as completed.

thistleknot avatar Aug 16 '25 17:08 thistleknot