graphrag
[Issue]: create_final_community_reports fails because of a JSON loading error
Is there an existing issue for this?
- [X] I have searched the existing issues
- [X] I have checked #657 to validate if my issue is covered by community support
Describe the issue
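With the chat LLM served by Ollama (llama3:70b) and the embedding model served by Xinference (bce-embedding-base_v1), indexing fails at the create_final_community_reports step. The detailed log suggests the cause is a JSON parsing failure: the model prefixes the community report with "Here is the output in JSON format:", wraps it in a Markdown code fence, and appends "Let me know if you need any further assistance!", so json.loads fails on the very first character.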
Steps to reproduce
No response
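The parse failure itself can be reproduced without running the pipeline. This sketch passes an abbreviated copy of the model reply from the detailed log below to `json.loads`; the `first_parse_error` helper is hypothetical and exists only for illustration.

```python
import json
from typing import Optional

# Abbreviated copy of the llama3:70b reply from the detailed log: the report
# JSON is preceded by prose and wrapped in a Markdown code fence.
raw_response = (
    "Here is the output in JSON format:```"
    '{ "title": "Baidu Community", "rating": 6.0, "findings": [] }'
    "```Let me know if you need any further assistance!"
)


def first_parse_error(text: str) -> Optional[str]:
    """Return json.loads' error message for `text`, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


print(first_parse_error(raw_response))  # Expecting value: line 1 column 1 (char 0)
print(first_parse_error('{"title": "Baidu Community"}'))  # None
```

The message matches the last line of the traceback in the log, which suggests the failure comes from the formatting of the model output rather than from the Ollama or Xinference connection.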
GraphRAG Config Used
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama # ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: llama3:70b
  model_supports_json: false # recommended if this is available for your model.
  api_base: http://10.110.0.25:11434/v1
  # max_tokens: 4000
  # request_timeout: 180.0
  # api_base: https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate
parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing
async_mode: threaded # or asyncio
embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: xinference
    type: openai_embedding # or azure_openai_embedding
    model: bce-embedding-base_v1
    api_base: http://10.110.0.25:9997/v1
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"
cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>
storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>
reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>
entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0
summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500
claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0
community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832
umap:
  enabled: false # if true, will generate UMAP embeddings for nodes
snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false
local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
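For reference, `chunks.size: 300` with `overlap: 100` describes a token window that advances 200 tokens at a time, which appears consistent with the create_base_text_units output in the log below (every chunk has n_tokens 300 except the last, with 142). The sketch is a simplified illustration of token-window chunking, not GraphRAG's own code: `split_on_tokens` is a hypothetical name, the 5,942-token input is a hypothetical length chosen to resemble the log, and the second example uses raw UTF-8 bytes as stand-in tokens, whereas real cl100k_base tokens are not single bytes.

```python
def split_on_tokens(token_ids: list[int], size: int, overlap: int) -> list[list[int]]:
    """Split a token sequence into windows of at most `size` tokens.

    Consecutive windows share `overlap` tokens, so each window starts
    `size - overlap` tokens after the previous one.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    return [token_ids[start:start + size] for start in range(0, len(token_ids), step)]


# Settings from the config above; the input length is hypothetical.
lengths = [len(c) for c in split_on_tokens(list(range(5942)), size=300, overlap=100)]
print(len(lengths), lengths[0], lengths[-1])  # 30 300 142

# Stand-in tokens: one "token" per UTF-8 byte (assumption for illustration).
byte_tokens = list("激烈竞争".encode("utf-8"))  # 4 characters x 3 bytes = 12 tokens
pieces = [bytes(c).decode("utf-8", errors="replace")
          for c in split_on_tokens(byte_tokens, size=6, overlap=2)]
# The 2nd and 3rd windows start mid-character, so they decode with
# leading U+FFFD replacement characters.
print(pieces)
```

Because a window boundary can fall inside a multi-byte character, decoding a chunk on its own can produce U+FFFD; this is presumably why chunks 8 and 25 in the log begin with "�".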
Logs and screenshots
$ poetry run poe index --root .
Poe => python -m graphrag.index --root .
🚀 Reading settings from settings.yaml
/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version.
Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
🚀 create_base_text_units
id chunk chunk_id document_ids n_tokens
0 e4f90b970ba24f1c441698ec39252396 上海近 40 度的高温,并没有阻止人们参会的热情——相反,7 月 4 日于上海举办的 202... e4f90b970ba24f1c441698ec39252396 [55274aaec21254cba99632471c0033d7] 300
1 05c2ac7be39bcf579b489d309ba23ebb 把手、还是展台前被 AI 裹挟的普通人,又都在说同一件事——AI 应用如何落地。\n这里没有... 05c2ac7be39bcf579b489d309ba23ebb [55274aaec21254cba99632471c0033d7] 300
2 3380e68249b1a8d8752f190053897e48 。大会的议程设置往往反映了行业的普遍趋势。除了大模型之外,具身智能、机器人、芯片等领域也延续... 3380e68249b1a8d8752f190053897e48 [55274aaec21254cba99632471c0033d7] 300
3 236037d63b053cc5561d77c3cf97d89a 的投入等方面来看,我们对 AI安全的投入远远落后于对 AI 性能的投入。现在,世界上只有 1... 236037d63b053cc5561d77c3cf97d89a [55274aaec21254cba99632471c0033d7] 300
4 89d5196a8d134db82c434ca2f8e0022d 风险,比如说 AI非常强大,而且是可以有很多方式去使用,所以颠覆现在社会结构在短时间内发生的... 89d5196a8d134db82c434ca2f8e0022d [55274aaec21254cba99632471c0033d7] 300
5 7fcbb470dd11fba1f05cea93c0b4a025 给破坏了,这样权衡是非常困难的。正如图灵所说,这是无法预测的,预测不了机器有了足够算力之后会... 7fcbb470dd11fba1f05cea93c0b4a025 [55274aaec21254cba99632471c0033d7] 300
6 34bfc36eaafebee552620a9da71711cc 的增益,整体价值就比移动互联网要大多了。\n智能体是最看好的 AI 应用方向,搜索是智能体分... 34bfc36eaafebee552620a9da71711cc [55274aaec21254cba99632471c0033d7] 300
7 45073e405f1ae1b24b50c25e1f6dab81 专有的问题。\n制作一个好的智能体通常并不需要编码,只要用人话把智能体的工作流说清楚,再配上... 45073e405f1ae1b24b50c25e1f6dab81 [55274aaec21254cba99632471c0033d7] 300
8 43a82f160b03cd9edb694538c7404bfc ��烈竞争的环境中,需要让业务效率比同行更高、成本比同行更低时,商业化的闭源模型是最能打的。... 43a82f160b03cd9edb694538c7404bfc [55274aaec21254cba99632471c0033d7] 300
9 ffdd22363ae3c156e72bcfe81764f2cc 不能够让你站在巨人的肩膀上去迭代和开发。\n\n傅盛- 猎豹移动董事长兼 CEO,\n猎户星... ffdd22363ae3c156e72bcfe81764f2cc [55274aaec21254cba99632471c0033d7] 300
10 a98c26291b2387897b27b505d502fccb 工智能,我们叫智能涌现,其实对中间的原理并不是特别清楚,是一个灰盒状态。\n智能的涌现,可能... a98c26291b2387897b27b505d502fccb [55274aaec21254cba99632471c0033d7] 300
11 7547fc209a7f053b453c16f9ce175bb0 人们发展的是什么?我们和 AI 不一样的地方在哪里呢?我有一个想法,那就是好奇心。\n我还想... 7547fc209a7f053b453c16f9ce175bb0 [55274aaec21254cba99632471c0033d7] 300
12 43cdc28708e6b851c748d5a2bebcb2f5 初,发明了人工智能这个词的十个人之一——赫本山姆(Herbert A. Simon)跟我们讲... 43cdc28708e6b851c748d5a2bebcb2f5 [55274aaec21254cba99632471c0033d7] 300
13 c0cff22a90d6fe2baad2f444c1152d19 ,我个人觉得有一点点混淆,翻译成普通人工智能会更加确切,它是一个最最基本的东西,而不是从通用... c0cff22a90d6fe2baad2f444c1152d19 [55274aaec21254cba99632471c0033d7] 300
14 d26a230cf29673de6aa58a557adbcbdc 企业也要意识到这是革命的工具,那这个变化就来了。\n\n张平安- 华为常务董事、华为云 CE... d26a230cf29673de6aa58a557adbcbdc [55274aaec21254cba99632471c0033d7] 300
15 b1aa1aa2280607243e7d5c79fdaf6d5b 的这张图片里)蚂蚁绒毛清晰可见。\n在云端,通过云网端芯架构上的协同创新,来构建可持续发展的... b1aa1aa2280607243e7d5c79fdaf6d5b [55274aaec21254cba99632471c0033d7] 300
16 c6b307e91937f72eb51fe1ec65d38108 杂决策难以胜任,以及对话交互不等于有效协同。\n通过专业智能体的深度连接,AI 会像互联网一... c6b307e91937f72eb51fe1ec65d38108 [55274aaec21254cba99632471c0033d7] 300
17 da8eab81e95c61cf1d5dd7137809c90d 业深度协作,需要很多的专业智能体共同参与、各司其职。蚂蚁坚持走开放道路,和行业共建专业智能体... da8eab81e95c61cf1d5dd7137809c90d [55274aaec21254cba99632471c0033d7] 300
18 aed528e5d60172692c78a9d02ebfbdd6 ,我忽然感觉有点变化的想法。因为我的中学的退休的老师不停的在群里面问我,怎么样用人工智能去写... aed528e5d60172692c78a9d02ebfbdd6 [55274aaec21254cba99632471c0033d7] 300
19 e7d446b66b5ba7e0567f0032005cb96b 点:高质量数据、流畅的交互、可控性\n如果要推动人工智能超级时刻的到来,需要大模型可以展现出... e7d446b66b5ba7e0567f0032005cb96b [55274aaec21254cba99632471c0033d7] 300
20 4e230251f89ebaeb16a495c1f737bc01 全自然的交互模式。\n第三,所有的生成都要可控,你不需要做得很好,但你需要知道你哪里做得不好... 4e230251f89ebaeb16a495c1f737bc01 [55274aaec21254cba99632471c0033d7] 300
21 54455c58f6a327d5c8e24d1bbdcdcda6 作用,但如果将 20% 的生成式 AI 工作负载转移到终端侧,预计到 2028 年将节省 1... 54455c58f6a327d5c8e24d1bbdcdcda6 [55274aaec21254cba99632471c0033d7] 300
22 bc6487902d0d4f6f2ed0db220e215a3a 使其体量越来越小,效率越来越高。\nIDC 预测,预计 2027 年中国新一代 AI 手机出... bc6487902d0d4f6f2ed0db220e215a3a [55274aaec21254cba99632471c0033d7] 300
23 ffd76bd503d085b6c431e04247900683 有 30%、40% 的错误率。国内的模型整体有 60% 到 70% 的错误率。\n为什么大模... ffd76bd503d085b6c431e04247900683 [55274aaec21254cba99632471c0033d7] 300
24 70927056007aaf7684bf4e3139965f64 会价值是至关重要的。\n提升模型正确率的关键路径\n比如为什么我们要做合成数据?比如为什么我... 70927056007aaf7684bf4e3139965f64 [55274aaec21254cba99632471c0033d7] 300
25 eeed1b1a6ec8ed0b9df92a77b91410bd �得大模型的价格持续走低整体来说,是一个非常正向的事。因为它本来就应该降低。同时它降低的同时... eeed1b1a6ec8ed0b9df92a77b91410bd [55274aaec21254cba99632471c0033d7] 300
26 4bf095eb3a669178cd032d4eac534df4 什么要多模态?是因为真正的人在现实世界中解决问题的时候,他需要的、输入的信息本身就是多模态的... 4bf095eb3a669178cd032d4eac534df4 [55274aaec21254cba99632471c0033d7] 300
27 5846e8d52f729f53892990e96d654be7 供更优质的服务,大家能够用这个服务创造更大的价值,然后我们创造这一部分价值应该反向再传递回来... 5846e8d52f729f53892990e96d654be7 [55274aaec21254cba99632471c0033d7] 300
28 410250a724b1650a30b0bc8c80dc44c1 知时代的 AI,能够产生实际的效能,但是它是受限的,泛用性不够、成本太高、需要垂直化去做很多... 410250a724b1650a30b0bc8c80dc44c1 [55274aaec21254cba99632471c0033d7] 300
29 58513fa20127169dc1e0d74a91a5804e 供泛用化的能力,解决一系列的场景和应用需求,从而来解决成本和收益平衡的问题,这是它本质的特点... 58513fa20127169dc1e0d74a91a5804e [55274aaec21254cba99632471c0033d7] 142
⠸ GraphRAG Indexer
🚀 create_base_extracted_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_base_entity_graph
level clustered_graph
0 0 <graphml xmlns="http://graphml.graphdrawing.or...
/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version.
Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version.
Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
🚀 create_final_entities
id name ... text_unit_ids description_embedding
0 b45241d70f0e43fca764df95b2b81f77 "上海" ... [7547fc209a7f053b453c16f9ce175bb0, a98c26291b2... [-0.008768483996391296, 0.0427890345454216, 0....
1 4119fd06010c494caa07f439b333f4c5 "2024年世界人工智能大会暨人工智能全球治理高级别会议" ... [-0.008538035675883293, 0.01028020866215229, 0...
2 d3835bf3dda84ead99deadbeac5d0d7d "图灵奖得主" ... [-0.009510371834039688, 0.04583802819252014, 0...
3 077d2820ae1845bcbb1803379a3d1eae "科技公司一把手" ... [-0.0219394713640213, 0.04319910705089569, -0....
4 3671ea0dd4e84c1a9b02c5ab2c8f4bac "AI 应用" ... [-0.0041207666508853436, 0.03910283371806145, ...
.. ... ... ... ... ...
127 aff21f1da1654e7babdcf3fb0e4a75fc "价格" ... [5846e8d52f729f53892990e96d654be7] [0.01759442873299122, 0.009148363955318928, -0...
128 dc2cc9016e3f49dbac7232f05cce794d "机器" ... [5846e8d52f729f53892990e96d654be7] [-0.00537463091313839, -0.007706150878220797, ...
129 6ea0cef05f694dcea455478f40674e45 "实体经济" ... [410250a724b1650a30b0bc8c80dc44c1, 58513fa2012... [0.014552868902683258, 0.01658778265118599, -0...
130 7ab5d53a872f4dfc98f3d386879f3c75 "让机器思考" ... [410250a724b1650a30b0bc8c80dc44c1] [0.034711189568042755, 0.01091317180544138, -0...
131 af1d0fec22114a3398b8016f5225f9ed "新一代生成式 AI" ... [58513fa20127169dc1e0d74a91a5804e] [0.016883114352822304, 0.006847569718956947, -...
[132 rows x 8 columns]
/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version.
Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/site-packages/datashaper/engine/verbs/convert.py:72: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
datetime_column = pd.to_datetime(column, errors="ignore")
/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/site-packages/datashaper/engine/verbs/convert.py:72: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
datetime_column = pd.to_datetime(column, errors="ignore")
🚀 create_final_nodes
level title type description ... entity_type top_level_node_id x y
0 0 "上海" "GEO" Here is the comprehensive summary:\n\n"上海 (Sha... ... NaN b45241d70f0e43fca764df95b2b81f77 0 0
1 0 "2024年世界人工智能大会暨人工智能全球治理高级别会议" "EVENT" "The conference is a significant event that br... ... NaN 4119fd06010c494caa07f439b333f4c5 0 0
2 0 "图灵奖得主" "PERSON" "Turing Award winners are among the attendees ... ... NaN d3835bf3dda84ead99deadbeac5d0d7d 0 0
3 0 "科技公司一把手" "PERSON" "Tech company executives are also present at t... ... NaN 077d2820ae1845bcbb1803379a3d1eae 0 0
4 0 "AI 应用" ... NaN 3671ea0dd4e84c1a9b02c5ab2c8f4bac 0 0
.. ... ... ... ... ... ... ... .. ..
127 0 "价格" "PRICE" "价格 refers to the amount of money paid for a g... ... NaN aff21f1da1654e7babdcf3fb0e4a75fc 0 0
128 0 "机器" "MACHINE" "机器 refers to devices that can perform tasks, ... ... NaN dc2cc9016e3f49dbac7232f05cce794d 0 0
129 0 "实体经济" "ORGANIZATION" Here is a comprehensive summary of the data:\n... ... NaN 6ea0cef05f694dcea455478f40674e45 0 0
130 0 "让机器思考" ... NaN 7ab5d53a872f4dfc98f3d386879f3c75 0 0
131 0 "新一代生成式 AI" "TECHNOLOGY" "New generation generative AI is a technology ... ... NaN af1d0fec22114a3398b8016f5225f9ed 0 0
[132 rows x 15 columns]
/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version.
Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version.
Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
🚀 create_final_communities
id title level raw_community relationship_ids text_unit_ids
0 0 Community 0 0 0 [a671bf7fea2f4514b6e96ba99127fafd, 525f41ea202... [05c2ac7be39bcf579b489d309ba23ebb,236037d63b05...
1 1 Community 1 0 1 [7ce637e4f35b42e3a9f8272cab69cd22, 4d999d7744b... [05c2ac7be39bcf579b489d309ba23ebb,410250a724b1...
2 2 Community 2 0 2 [3deb220d31f74103aa44870a36a63220, 525f41ea202... [7fcbb470dd11fba1f05cea93c0b4a025,89d5196a8d13...
3 4 Community 4 0 4 [af7a1584dd15492cb9a4940e285f57fc, 6e8d9029ce4... [34bfc36eaafebee552620a9da71711cc,45073e405f1a...
4 3 Community 3 0 3 [6731a665561840c2898ce8c9788e4c88, 4026806fa92...
🚀 join_text_units_to_entity_ids
text_unit_ids entity_ids id
0 7547fc209a7f053b453c16f9ce175bb0 [b45241d70f0e43fca764df95b2b81f77, d54956b79dd... 7547fc209a7f053b453c16f9ce175bb0
1 a98c26291b2387897b27b505d502fccb [b45241d70f0e43fca764df95b2b81f77, ef32c4b208d... a98c26291b2387897b27b505d502fccb
2 e4f90b970ba24f1c441698ec39252396 [b45241d70f0e43fca764df95b2b81f77, 4119fd06010... e4f90b970ba24f1c441698ec39252396
3 05c2ac7be39bcf579b489d309ba23ebb [19a7f254a5d64566ab5cc15472df02de, e7ffaee9d31... 05c2ac7be39bcf579b489d309ba23ebb
4 236037d63b053cc5561d77c3cf97d89a [19a7f254a5d64566ab5cc15472df02de, 254770028d7... 236037d63b053cc5561d77c3cf97d89a
5 34bfc36eaafebee552620a9da71711cc [19a7f254a5d64566ab5cc15472df02de, dde131ab575... 34bfc36eaafebee552620a9da71711cc
6 410250a724b1650a30b0bc8c80dc44c1 [19a7f254a5d64566ab5cc15472df02de, e1fd0e904a5... 410250a724b1650a30b0bc8c80dc44c1
7 4e230251f89ebaeb16a495c1f737bc01 [19a7f254a5d64566ab5cc15472df02de, 7e2c84548fb... 4e230251f89ebaeb16a495c1f737bc01
8 5846e8d52f729f53892990e96d654be7 [19a7f254a5d64566ab5cc15472df02de, e1fd0e904a5... 5846e8d52f729f53892990e96d654be7
9 89d5196a8d134db82c434ca2f8e0022d [19a7f254a5d64566ab5cc15472df02de, 68105770b52... 89d5196a8d134db82c434ca2f8e0022d
10 b1aa1aa2280607243e7d5c79fdaf6d5b [19a7f254a5d64566ab5cc15472df02de, c03ab3ce8cb... b1aa1aa2280607243e7d5c79fdaf6d5b
11 c6b307e91937f72eb51fe1ec65d38108 [19a7f254a5d64566ab5cc15472df02de, c6d1e4f56c2... c6b307e91937f72eb51fe1ec65d38108
12 ffd76bd503d085b6c431e04247900683 [19a7f254a5d64566ab5cc15472df02de, 7f65feab754... ffd76bd503d085b6c431e04247900683
13 70927056007aaf7684bf4e3139965f64 [e7ffaee9d31d4d3c96e04f911d0a8f9e, fd9cb733b28... 70927056007aaf7684bf4e3139965f64
14 58513fa20127169dc1e0d74a91a5804e [e1fd0e904a53409aada44442f23a51cb, 6ea0cef05f6... 58513fa20127169dc1e0d74a91a5804e
15 aed528e5d60172692c78a9d02ebfbdd6 [e1fd0e904a53409aada44442f23a51cb, 32e6ccab20d... aed528e5d60172692c78a9d02ebfbdd6
16 e7d446b66b5ba7e0567f0032005cb96b [e1fd0e904a53409aada44442f23a51cb, 7cc3356d38d... e7d446b66b5ba7e0567f0032005cb96b
17 3380e68249b1a8d8752f190053897e48 [bc0e3f075a4c4ebbb7c7b152b65a5625, 254770028d7... 3380e68249b1a8d8752f190053897e48
18 7fcbb470dd11fba1f05cea93c0b4a025 [68105770b523412388424d984e711917, 85c79fd84f5... 7fcbb470dd11fba1f05cea93c0b4a025
19 45073e405f1ae1b24b50c25e1f6dab81 [dde131ab575d44dbb55289a6972be18f, 32ee140946e... 45073e405f1ae1b24b50c25e1f6dab81
20 43a82f160b03cd9edb694538c7404bfc [de6fa24480894518ab3cbcb66f739266, 6fae5ee1a83... 43a82f160b03cd9edb694538c7404bfc
21 ffdd22363ae3c156e72bcfe81764f2cc [de6fa24480894518ab3cbcb66f739266, 6fae5ee1a83... ffdd22363ae3c156e72bcfe81764f2cc
22 43cdc28708e6b851c748d5a2bebcb2f5 [94a964c6992945ebb3833dfdfdc8d655, 1eb829d0ace... 43cdc28708e6b851c748d5a2bebcb2f5
23 c0cff22a90d6fe2baad2f444c1152d19 [26f88ab3e2e04c33a459ad6270ade565, babe97e1d97... c0cff22a90d6fe2baad2f444c1152d19
24 d26a230cf29673de6aa58a557adbcbdc [26f88ab3e2e04c33a459ad6270ade565, c9b8ce91fc2... d26a230cf29673de6aa58a557adbcbdc
25 bc6487902d0d4f6f2ed0db220e215a3a [1033a18c45aa4584b2aef6ab96890351, 91ff849d12b... bc6487902d0d4f6f2ed0db220e215a3a
26 54455c58f6a327d5c8e24d1bbdcdcda6 [53af055f068244d0ac861b2e89376495, 7ea4afbf8a2... 54455c58f6a327d5c8e24d1bbdcdcda6
27 da8eab81e95c61cf1d5dd7137809c90d [c6d1e4f56c2843e89cf0b91c10bb6de2, 0adb2d9941f... da8eab81e95c61cf1d5dd7137809c90d
28 eeed1b1a6ec8ed0b9df92a77b91410bd [de837ff3d626451282ff6ac77a82216d, 460295fed3a... eeed1b1a6ec8ed0b9df92a77b91410bd
29 4bf095eb3a669178cd032d4eac534df4 [6ea81acaf232485e94fff638e03336e1, d136b08d586... 4bf095eb3a669178cd032d4eac534df4
/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version.
Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version.
Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/site-packages/datashaper/engine/verbs/convert.py:65: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_numeric without passing `errors` and catch exceptions explicitly instead
column_numeric = cast(pd.Series, pd.to_numeric(column, errors="ignore"))
🚀 create_final_relationships
source target weight description ... human_readable_id source_degree target_degree rank
0 "上海" "2024年世界人工智能大会暨人工智能全球治理高级别会议" 1.0 "The conference was held in Shanghai, attracti... ... 0 2 1 3
1 "上海" "爱乐乐团" 2.0 Here is the comprehensive summary:\n\n"\u7231\... ... 1 2 1 3
2 "图灵奖得主" "AI 应用" 1.0 "Turing Award winners are discussing the pract... ... 2 1 2 3
3 "科技公司一把手" "AI 应用" 1.0 "Tech company executives are sharing their exp... ... 3 1 2 3
4 "AI" "AI 行业大咖" 1.0 "AI industry leaders are discussing the develo... ... 4 10 1 11
.. ... ... ... ... ... ... ... ... ...
65 "模型" "大模型企业" 1.0 "大模型企业 is related to the 模型, potentially using... ... 65 2 2 4
66 "大模型企业" "价格降低" 1.0 "The 事件 of prices decreasing benefits 大模型企业 by... ... 66 2 1 3
67 "LARGE MODELS" "智譜 AI" 1.0 "智譜 AI is a company focused on developing and ... ... 67 1 1 2
68 "技术" "降价" 1.0 "The event of 降价 is driven by the advancement ... ... 68 1 1 2
69 "实体经济" "让机器思考" 1.0 "实体经济 is influenced by the direction of develo... ... 69 2 1 3
[70 rows x 10 columns]
🚀 join_text_units_to_relationship_ids
id relationship_ids
0 e4f90b970ba24f1c441698ec39252396 [b07a7f088364459098cd8511ff27a4c8, cd130938a28...
1 7547fc209a7f053b453c16f9ce175bb0 [8870cf2b5df64d2cab5820f67e29b9f1, b1f6164116d...
2 a98c26291b2387897b27b505d502fccb [8870cf2b5df64d2cab5820f67e29b9f1, 896d2a51e8d...
3 05c2ac7be39bcf579b489d309ba23ebb
4 89d5196a8d134db82c434ca2f8e0022d [525f41ea20274a05af4e52b625b473f3, 071a416efbe...
5 34bfc36eaafebee552620a9da71711cc [6d8473ef3b1042bf87178a611e3dbcc6, af7a1584dd1...
6 b1aa1aa2280607243e7d5c79fdaf6d5b [30c9641543c24773938bd8ec57ea98ab, 6731a665561...
7 c6b307e91937f72eb51fe1ec65d38108 [18b839da898e4026b81727d759d95c6a, 68e0c60d2e8...
8 4e230251f89ebaeb16a495c1f737bc01 [eeef6ae5c464400c8755900b4f1ac37a, 422433aa458...
9 410250a724b1650a30b0bc8c80dc44c1 [86505bca739d4bccaaa1a8e0f3baffdc, 9a6f414210e...
10 5846e8d52f729f53892990e96d654be7 [86505bca739d4bccaaa1a8e0f3baffdc, 1af9faf341e...
11 70927056007aaf7684bf4e3139965f64 [353d91abc68648639d65a549e59b5cf3, 4465efb7f6e...
12 aed528e5d60172692c78a9d02ebfbdd6 [7ce637e4f35b42e3a9f8272cab69cd22, 735d19aea07...
13 e7d446b66b5ba7e0567f0032005cb96b [4d999d7744b04a998475f8f8531589f0, a0047221896...
14 3380e68249b1a8d8752f190053897e48 [db541b7260974db8bac94e953009f60e, f2ff8044718...
15 236037d63b053cc5561d77c3cf97d89a [87915637da3e474c9349bd0ae604bd95, 8f1eba29f39...
16 7fcbb470dd11fba1f05cea93c0b4a025 [3deb220d31f74103aa44870a36a63220]
17 45073e405f1ae1b24b50c25e1f6dab81 [6e8d9029ce4e4ea182367173ab2c7bbf, 0e8d921ccd8...
18 43a82f160b03cd9edb694538c7404bfc [4f2c665decf242b0bfcaf7350b0e02ed, 66cdf168f36...
19 ffdd22363ae3c156e72bcfe81764f2cc [4f2c665decf242b0bfcaf7350b0e02ed, 66cdf168f36...
20 43cdc28708e6b851c748d5a2bebcb2f5 [5a28b94bc63b44edb30c54748fd14f15, f97011b2a99...
21 c0cff22a90d6fe2baad2f444c1152d19 [35489ca6a63b47d6a8913cf333818bc1, 6fb57f83bae...
22 d26a230cf29673de6aa58a557adbcbdc [5d3344f45e654d2c808481672f2f08dd, 70634e10a5e...
23 54455c58f6a327d5c8e24d1bbdcdcda6 [d203efdbfb2f4b2a899abfb31cf72e82, 31a7e680c4d...
24 da8eab81e95c61cf1d5dd7137809c90d [68e0c60d2e8845d89d9d0ad397833648, 60c58026b27...
25 bc6487902d0d4f6f2ed0db220e215a3a [351abba16e5c448994c6daf48121b14d, 50ea7d3b696...
26 eeed1b1a6ec8ed0b9df92a77b91410bd
27 4bf095eb3a669178cd032d4eac534df4 [5dabc4cd05da425cb194a04482bf0c29]
❌ create_final_community_reports
None
⠸ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
└── create_final_community_reports
❌ Errors occurred during the pipeline run, see logs for more details.
...
The detailed log shows:
13:23:36,521 graphrag.index.run INFO dependencies for create_final_community_reports: ['create_final_nodes', 'create_final_relationships']
13:23:36,522 graphrag.index.run INFO read table from storage: create_final_nodes.parquet
13:23:36,527 graphrag.index.run INFO read table from storage: create_final_relationships.parquet
13:23:36,570 datashaper.workflow.workflow INFO executing verb prepare_community_reports_nodes
13:23:36,589 datashaper.workflow.workflow INFO executing verb prepare_community_reports_edges
13:23:36,607 datashaper.workflow.workflow INFO executing verb restore_community_hierarchy
13:23:36,628 datashaper.workflow.workflow INFO executing verb prepare_community_reports
13:23:36,628 graphrag.index.verbs.graph.report.prepare_community_reports INFO Number of nodes at level=0 => 132
13:23:36,668 datashaper.workflow.workflow INFO executing verb create_community_reports
13:23:58,336 httpx INFO HTTP Request: POST http://10.110.0.25:11434/v1/chat/completions "HTTP/1.1 200 OK"
13:23:58,339 graphrag.llm.openai.utils ERROR error loading json, json=Here is the output in JSON format:```{ "title": "Baidu Community", "summary": "The Baidu community revolves around Robin Li, the founder and CEO of Baidu, a Chinese technology company focused on AI applications and research. The community's dynamics are shaped by Robin Li's concerns about AI development risks and his leadership role in Baidu.", "rating": 6.0, "rating_explanation": "The impact severity rating is moderate due to the potential influence of Robin Li's concerns about AI development risks on the tech industry.", "findings": [ { "summary": "Robin Li's leadership role in Baidu", "explanation": "Robin Li is the founder, chairman, and CEO of Baidu, a Chinese technology company. This leadership role suggests his significant influence on the company's direction and decisions [Data: Entities (31); Relationships (24)]." }, { "summary": "Baidu's focus on AI applications and research", "explanation": "Baidu is a Chinese technology company focused on both AI applications and research. This focus suggests the company's potential impact on the tech industry [Data: Entities (32)]." }, { "summary": "Robin Li's concerns about AI development risks", "explanation": "Robin Li is concerned about the risks associated with Artificial Intelligence (AI) development. This concern could influence his leadership decisions in Baidu and shape the company's approach to AI research [Data: Entities (31); Relationships (5)]." } ]}```Let me know if you need any further assistance!
Traceback (most recent call last):
  File "/data/galen_guo/workspace/LLM-Research/graphrag/graphrag/llm/openai/utils.py", line 93, in try_parse_json_object
    result = json.loads(input)
             ^^^^^^^^^^^^^^^^^
  File "/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/galen.guo/miniforge3/envs/rag_env/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
13:23:58,340 graphrag.llm.openai.openai_chat_llm WARNING error parsing llm json, retrying
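One possible workaround, sketched under assumptions: recover the JSON object from the reply before parsing it, for example by patching `try_parse_json_object` in graphrag/llm/openai/utils.py, which is where the traceback shows the failure. The helpers `extract_json_object` and `missing_report_keys` are hypothetical, not GraphRAG APIs, and the required keys are simply the ones present in the logged response (title, summary, rating, rating_explanation, findings), assumed here to be what the community report prompt asks for.

```python
import json
import re
from typing import Any

# First Markdown code fence (optionally tagged "json"); captures its body.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)

# Keys seen in the logged community report (assumed to be the required ones).
REPORT_KEYS = ("title", "summary", "rating", "rating_explanation", "findings")


def extract_json_object(text: str) -> dict[str, Any]:
    """Best-effort recovery of a JSON object from a chatty LLM reply.

    Tries, in order: the raw text, the body of the first code fence, and the
    span from the first "{" to the last "}".
    """
    candidates = [text]
    fence = _FENCE.search(text)
    if fence:
        candidates.append(fence.group(1))
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    raise ValueError("no JSON object found in the LLM response")


def missing_report_keys(report: dict[str, Any]) -> list[str]:
    """List the expected community-report keys absent from `report`."""
    return [key for key in REPORT_KEYS if key not in report]


reply = (
    "Here is the output in JSON format:```"
    '{ "title": "Baidu Community", "summary": "...", "rating": 6.0, '
    '"rating_explanation": "...", "findings": [] }'
    "```Let me know if you need any further assistance!"
)
report = extract_json_object(reply)
print(report["title"], report["rating"], missing_report_keys(report))  # Baidu Community 6.0 []
```

Trying the raw text first keeps well-formed replies on the normal path; the first-"{"-to-last-"}" fallback is deliberately crude and will still fail if the surrounding prose itself contains braces.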
### Additional Information
- GraphRAG Version:
- Operating System:
- Python Version:
- Related Issues: