Slow
Search before asking
- [x] I had searched in the issues and found no similar issues.
Operating system information
Linux
What happened
I've been running this on a 300-page document for over 3 days: chunk size 2000, using Ollama locally (gemma 2).
I definitely can't use this for anything serious running off of local LLMs.
I got past the splitter and into the extractor, but extraction is painfully slow even though I have GPU support.
I have a custom graph algorithm I could contribute that doesn't use an LLM, but instead uses NLP/spaCy/POS tagging; it might help speed up graph generation if you are interested.
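For reference, the LLM-free idea mentioned above could be sketched roughly like this. The `(token, tag)` pairs are hard-coded here so the sketch is self-contained; in practice spaCy's tagger would supply them (e.g. `[(t.text, t.pos_) for t in nlp(sentence)]`). The entity heuristic and weighting scheme are illustrative assumptions, not the actual contributed algorithm:

```python
from collections import defaultdict
from itertools import combinations

def extract_entities(tagged_sentence):
    """Collect contiguous PROPN/NOUN runs as candidate entities."""
    entities, current = [], []
    for token, tag in tagged_sentence:
        if tag in ("PROPN", "NOUN"):
            current.append(token)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

def build_cooccurrence_graph(tagged_sentences):
    """Edge weight = number of sentences in which both entities appear."""
    edges = defaultdict(int)
    for sent in tagged_sentences:
        for a, b in combinations(sorted(set(extract_entities(sent))), 2):
            edges[(a, b)] += 1
    return dict(edges)

# Toy input standing in for spaCy's POS output.
sentences = [
    [("Ollama", "PROPN"), ("runs", "VERB"), ("gemma", "PROPN")],
    [("Ollama", "PROPN"), ("is", "AUX"), ("slow", "ADJ")],
]
graph = build_cooccurrence_graph(sentences)
print(graph)  # {('Ollama', 'gemma'): 1}
```

Because this makes no LLM calls, it runs in milliseconds per sentence; the trade-off is that co-occurrence edges are much noisier than LLM-extracted relations.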
How to reproduce
Drag in a 300-page markdown document, leave everything at default (2000 split chars), and set up Ollama with a local model (I'm running gemma 2 and all-minilm off of Ollama on a P5200).
Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
It's been stuck on the extractor for a whole day.
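For scale, a rough back-of-envelope estimate shows why a 300-page document can take this long. Every number below is an assumption (page density, calls per chunk, and per-call latency of a local model), not a measurement:

```python
# Back-of-envelope: extractor runtime for a 300-page document.
# All constants here are assumptions, not measurements.
pages = 300
chars_per_page = 3_000   # assumed character density of a markdown page
chunk_size = 2_000       # the default split size from the issue
calls_per_chunk = 1      # assumed: one LLM extraction call per chunk
seconds_per_call = 60    # assumed local-LLM latency per call

chunks = (pages * chars_per_page) // chunk_size
hours = chunks * calls_per_chunk * seconds_per_call / 3600
print(f"{chunks} chunks -> ~{hours:.1f} hours")  # 450 chunks -> ~7.5 hours
```

If the extractor actually issues several calls per chunk (e.g. separate passes for entities, relations, and events), multiply accordingly; a 3-4x factor already pushes this into days, which matches what's reported here.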
Reader
2025-02-21 22:32:37(172.23.0.5): Task scheduling completed. cost:705 ms !
2025-02-21 22:32:37(172.23.0.5): Lock released successfully!
2025-02-21 22:32:37(172.23.0.5): Store the results of the read operator. file:builder/builder/project_1/instance_9/49_kagReaderSyncTask.kag
2025-02-21 22:3...
Splitter
2025-02-21 22:33:07(172.23.0.5): Task scheduling completed. cost:6 ms !
2025-02-21 22:33:07(172.23.0.5): Lock released successfully!
2025-02-21 22:33:07(172.23.0.5): Splitter task trace log:
>> 22:32:48: Store the results of the split operator. file:builder/builder/project_1/instance_9/50_kagSplitterAsyncTask.kag
>> 22:32:48: Sp...
Extractor
2025-02-23 23:34:07(172.23.0.5): Task scheduling completed. cost:3 ms !
2025-02-23 23:34:07(172.23.0.5): Lock released successfully!
2025-02-23 23:34:07(172.23.0.5): Extractor task status is RUNNING
2025-02-23 23:34:07(172.23.0.5): The asynchronous task has been created! resource:builder/project_1/instance_9/51_kagExtractorAsyncTask.kag
2025-02-23 23:34:07(172.23.0.5): Lock preempted successfully!
...
2025-02-23 19:26:07(172.23.0.5): Task scheduling completed. cost:2 ms !
2025-02-23 19:26:07(172.23.0.5): Lock released successfully!
2025-02-23 19:26:07(172.23.0.5): Extractor task status is RUNNING
2025-02-23 19:26:07(172.23.0.5): The asynchronous task has been created! resource:builder/project_1/...
Vectorizer
In the expanded reader section of your log there is a keyword `trunk`; check whether trunk=0. If it is, you can re-upload using the file provided in the official manual.
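To check for that without expanding the web UI logs, something like the following could scan the reader task log. The `trunk=N` format and the idea of reading it from a log file are assumptions based on the comment above, not a documented KAG interface:

```python
import re

def find_trunk_values(log_text: str) -> list[int]:
    """Return every integer that follows 'trunk=' in the log text."""
    return [int(m) for m in re.findall(r"trunk=(\d+)", log_text)]

# Hypothetical log excerpt; a real check would read the reader task
# output, e.g. open("reader.log").read().
sample = "22:32:48: Store the results of the read operator. trunk=0"
values = find_trunk_values(sample)
if any(v == 0 for v in values):
    print("trunk=0 found: re-upload the document per the official manual")
```

If `trunk` never equals 0, the reader stage produced content and the slowness lies in the extractor itself.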
Could you upload your markdown document to help us reproduce the issue?
I'll try another document, particularly an arXiv paper, and see how long that takes; I'll send that doc if it takes longer than a night. I've noticed extensive processing time with any document. I don't want to send scraped books I don't have permission to share.
Model Config (screenshots, Steps 1-3); prompt word: { "biz_scene": "default", "language": "en" }
idk I give up
nm, I got this one started at least, set to an 858 chunk size.
I'll check on it when I get back later tonight, and again tomorrow morning.
Well, looks like the arXiv paper finished overnight...
hrmm...
Same situation here, very slow. I checked the chunksize in the read section and it's not 0; a txt file of about 2 MB has been running for 24 hours now.
KAG V0.8 improved knowledge extraction efficiency; you can try out the latest version:
First, we have upgraded the capabilities of the KAG knowledge base:
- Expanded support for two modes: private domain knowledge bases (including structured and unstructured data) and public domain knowledge bases, including the ability to integrate public web data sources such as LBS and WebSearch via the MCP protocol.
- Improved management of private domain knowledge base indexing, incorporating multiple foundational index types such as Outline, Summary, KnowledgeUnit, AtomicQuery, Chunk, and Table. This supports developers in customizing indexes and synchronizing them with product interfaces.
- Users can select the most appropriate index type for their specific scenarios, achieving a balance between construction costs and business outcomes.
Ooh I need to try this
Closed #367 as completed.