[Bug]: I'm getting an error with the create_final_community_reports step on Chinese text
Describe the bug
python -m graphrag.index --root ./microsoft_graphrag
ERROR:
❌ create_final_community_reports
None
⠋ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ----- 100% 0:00:… 0:00:…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
└── create_final_community_reports
❌ Errors occurred during the pipeline run, see logs for more details.
Steps to reproduce
TEST DATA - news.txt
来源: 新浪财经
新闻发布时间: 2024-07-16 23:59:42
详情: 亚特兰大联储GDPNow模型预计美国第二季度GDP增速为2.5%,此前预计为2.0%。
短讯类型: 数据,观点
来源: 华尔街见闻
新闻发布时间: 2024-07-16 23:59:03
标题: 王毅同匈牙利外长西雅尔多通电话
详情: 7月16日,中共中央政治局委员、外交部长王毅应约同匈牙利外长西雅尔多通电话。西雅尔多介绍了对当前局势特别是乌克兰危机的看法以及匈方近期所作努力,表示中国是支持促进和平的重要力量,匈方愿同中方携手,防止冲突扩大升级,积累政治解决的条件。王毅说,欧尔班总理日前专程来华,彰显了两国领导人的互信和友谊,也体现了中匈新时代全天候全面战略伙伴关系的高水平。匈牙利为寻求和平奔走斡旋,发挥了建设性作用。中方认为,当前最紧迫的事项、也是最现实的目标就是推动乌克兰局势尽快降温。各方尽快就“战场不外溢、战事不升级、各方不拱火”达成共识,进而为实现停火、恢复和谈创造条件。(新华社)
短讯类型: 要闻,A股,外汇
是否重要: 0.0
来源: 凤凰
新闻发布时间: 2024-07-16 23:58:54
详情: 俄罗斯计划针对欧佩克+协议实施补偿性石油减产。
来源: 新浪财经
新闻发布时间: 2024-07-16 23:58:43
详情: 俄罗斯计划针对OPEC+协议实施补偿性石油减产。
关联股票: 石油,石油,石油,石油,石油,石油
短讯类型: 国际
来源: 凤凰
新闻发布时间: 2024-07-16 23:58:37
详情: 【美国伊利诺伊州一大坝发生溃坝 当地居民开始疏散】当地时间7月16日上午,美国伊利诺伊州华盛顿县应急管理局宣布,受强降雨影响,该州纳什维尔大坝发生溃坝。当地官员已发出疏散警告,当地居民已开始疏散。
来源: 华尔街见闻
新闻发布时间: 2024-07-16 23:58:23
标题: 报道:俄罗斯计划额外减产石油
详情: 据知情人士透露, 俄罗斯计划在2024-25年的暖和季节针对OPEC+协议实施补偿性石油减产。由于技术性缘故,俄罗斯今明两年夏季和秋季早期的石油减产幅度极可能会超过OPEC+石油减产协议规定的水平。在寒冷季节,产油国俄罗斯国内的消费需求也会增长。(彭博)
短讯类型: 要闻,外汇,黄金,石油,美股,港股
是否重要: 0.0
来源: 新浪财经
新闻发布时间: 2024-07-16 23:57:45
详情: 【美国伊利诺伊州一大坝发生溃坝 当地居民开始疏散】当地时间7月16日上午,美国伊利诺伊州华盛顿县应急管理局宣布,受强降雨影响,该州纳什维尔大坝发生溃坝。当地官员已发出疏散警告,当地居民已开始疏散。(央视新闻)
短讯类型: 国际
来源: 同花顺
新闻发布时间: 2024-07-16 23:57:14
标题: 王毅同匈牙利外长西雅尔多通电话
详情: 7月16日,中共中央政治局委员、外交部长王毅应约同匈牙利外长西雅尔多通电话。
西雅尔多介绍了对当前局势特别是乌克兰危机的看法以及匈方近期所作努力,表示中国是支持促进和平的重要力量,匈方愿同中方携手,防止冲突扩大升级,积累政治解决的条件。
王毅说,欧尔班总理日前专程来华,习近平主席同他就事关和平的重要议题进行战略沟通,彰显了两国领导人的互信和友谊,也体现了中匈新时代全天候全面战略伙伴关系的高水平。匈牙利为寻求和平奔走斡旋,发挥了建设性作用。中方认为,当前最紧迫的事项、也是最现实的目标就是推动乌克兰局势尽快降温。各方尽快就“战场不外溢、战事不升级、各方不拱火”达成共识,进而为实现停火、恢复和谈创造条件。中方愿同匈方一道,汇聚更多支持和平的力量,发出更多理性的声音,推动局势朝着政治解决的方向发展。(新华社)
短讯类型: 7*24小时全球直播
是否重要: 0.0
来源: 华尔街见闻
新闻发布时间: 2024-07-16 23:56:08
标题: 科大讯飞:今年1-5月AI学习机销量增长超过100%
详情: 科大讯飞披露投资者关系活动记录表显示,科大讯飞AI学习机今年1-5月份销量增长超过100%,用户净推荐值持续保持行业第一。科大讯飞2023年率先将大模型技术落地到学习机上,推出了九大AI 1对1辅导功能。此外,讯飞星火APP自去年9月正式全民开放后,目前在安卓端统计到已经累计下载了1.31亿次。在医疗领域,讯飞晓医APP可以诊断1600种常见疾病、2000多种症状,能识别有2800多种常见药品,也可以理解26万个药品相互作用。近日,科大讯飞与交通银行、人保集团、招商银行、国元证券等100多家金融机构展开了更为紧密的合作交流。
短讯类型: 要闻,A股,
是否重要: 0.0
来源: 新浪财经
新闻发布时间: 2024-07-16 23:55:46
详情: 【王毅同匈牙利外长西雅尔多通电话】7月16日,中共中央政治局委员、外交部长王毅应约同匈牙利外长西雅尔多通电话。
西雅尔多介绍了对当前局势特别是乌克兰危机的看法以及匈方近期所作努力,表示中国是支持促进和平的重要力量,匈方愿同中方携手,防止冲突扩大升级,积累政治解决的条件。
王毅说,欧尔班总理日前专程来华,习近平主席同他就事关和平的重要议题进行战略沟通,彰显了两国领导人的互信和友谊,也体现了中匈新时代全天候全面战略伙伴关系的高水平。匈牙利为寻求和平奔走斡旋,发挥了建设性作用。中方认为,当前最紧迫的事项、也是最现实的目标就是推动乌克兰局势尽快降温。各方尽快就“战场不外溢、战事不升级、各方不拱火”达成共识,进而为实现停火、恢复和谈创造条件。中方愿同匈方一道,汇聚更多支持和平的力量,发出更多理性的声音,推动局势朝着政治解决的方向发展。(新华社)
短讯类型: 国际
Expected Behavior
No response
GraphRAG Config Used
No response
Logs and screenshots
D:\workspace\datalab\jsrag\venv\Scripts\python.exe D:\workspace\datalab\jsrag\tests\microsoft_graphrag\test_index.py
🚀 Reading settings from settings.yaml
D:\workspace\datalab\jsrag\venv\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
🚀 create_base_text_units
id ... n_tokens
0 663affd4de3a71cf9b612eae20d761d3 ... 300
1 52bd1651a5d8d56d81c1801482965a6d ... 300
2 4d0bb660afb52d68caab05f6bf0b4350 ... 300
3 7f6fd5708e48481bb673873f9ed15c30 ... 300
4 8204398b499f96829990a722591a9b83 ... 300
5 71f0b35fc170eb480295254536496a1d ... 300
6 04eeac4ad722a860496c0937e3eb1856 ... 300
7 a0f38ad84274b55756cc22177409ff46 ... 300
8 4f58b1869ff3695c8bd4e994ef8c84de ... 300
9 f170ecad55e15cfe417d0302b691ca4b ... 300
10 60d03a6c901c38de26ffd7df96aa5d18 ... 300
11 a1215748205916f7b5e0adccc9c22795 ... 300
12 b1fd07ccd1a54a8ed9b309f2d01607a9 ... 285
13 ca60e805b3f302a56091e5f3c8db2ab8 ... 85
[14 rows x 5 columns]
🚀 create_base_extracted_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_base_entity_graph
level clustered_graph
0 0 <graphml xmlns="http://graphml.graphdrawing.or...
D:\workspace\datalab\jsrag\venv\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
D:\workspace\datalab\jsrag\venv\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
🚀 create_final_entities
id ...
description_embedding
0 b45241d70f0e43fca764df95b2b81f77 ... [-0.0014526282, -0.022810562,
-0.012244933, 0....
1 4119fd06010c494caa07f439b333f4c5 ... [-0.031926226, -0.02635819,
0.05546864, 0.0466...
2 d3835bf3dda84ead99deadbeac5d0d7d ... [-0.012088798, -0.01461496,
0.028455619, 0.096...
3 077d2820ae1845bcbb1803379a3d1eae ... [-0.016000712, -0.045031797,
-0.019178852, 0.0...
4 3671ea0dd4e84c1a9b02c5ab2c8f4bac ... [-0.047121223, -0.003207534,
-0.042235475, 0.0...
5 19a7f254a5d64566ab5cc15472df02de ... [-0.09580274, -0.018634425,
-0.015189313, 0.03...
6 e7ffaee9d31d4d3c96e04f911d0a8f9e ... [-0.068246044, -0.032368124,
-0.013941692, 0.0...
7 f7e11b0e297a44a896dc67928368f600 ... [-0.08585356, 0.008408449,
0.011477692, 0.0618...
8 1fd3fa8bb5a2408790042ab9573779ee ... [0.028590905, -0.035342693,
0.027007151, 0.101...
9 27f9fbe6ad8c4a8b9acee0d3596ed57c ... [-0.025687896, -0.03244348,
0.0051344517, 0.07...
10 e1fd0e904a53409aada44442f23a51cb ... [-0.050529823, -0.0054584057,
-0.025121942, 0....
11 de988724cfdf45cebfba3b13c43ceede ... [-0.07442019, 0.016910737,
-0.014245165, 0.060...
12 96aad7cb4b7d40e9b7e13b94a67af206 ... [-0.04028788, -0.0081702685,
-0.033586647, 0.0...
13 c9632a35146940c2a86167c7726d35e9 ... [-0.04757148, -0.01928787,
-0.011983054, 0.014...
14 9646481f66ce4fd2b08c2eddda42fc82 ... [-0.06481394, -0.003877871,
-0.00751088, 0.048...
15 d91a266f766b4737a06b0fda588ba40b ... [-0.06696816, 0.008392776,
-0.0054419786, 0.03...
16 bc0e3f075a4c4ebbb7c7b152b65a5625 ... [-0.011605713, -0.030790666,
-0.0008902109, 0....
17 254770028d7a4fa9877da4ba0ad5ad21 ... [-0.045195457, -0.02490488,
-0.024781283, -0.0...
18 4a67211867e5464ba45126315a122a8a ... [-0.05511852, 0.0013663026,
0.043397218, 0.002...
19 04dbbb2283b845baaeac0eaf0c34c9da ... [-0.058006745, 0.009991474,
0.04448104, 0.0205...
20 1943f245ee4243bdbfbd2fd619ae824a ... [-0.03958215, -7.2115225e-05,
0.08936498, 0.02...
21 273daeec8cad41e6b3e450447db58ee7 ... [-0.032186434, 0.015104188,
0.03962565, 0.0865...
22 e69dc259edb944ea9ea41264b9fcfe59 ... [-0.04429078, 0.018155806,
0.03003922, 0.09310...
23 e2f5735c7d714423a2c4f61ca2644626 ... [-0.055190273, 0.017616868,
0.016507143, 0.086...
24 deece7e64b2a4628850d4bb6e394a9c3 ... [-0.031237053, 0.037238844,
0.02804798, 0.0785...
25 e657b5121ff8456b9a610cfaead8e0cb ... [0.027616562, 0.035711832,
0.03694708, 0.02657...
26 bf4e255cdac94ccc83a56435a5e4b075 ... [0.044378877, 0.03286549,
0.053403515, 0.07247...
27 3b040bcc19f14e04880ae52881a89c1c ... [-0.03962782, 0.010889069,
0.045615856, -0.003...
28 3d6b216c14354332b1bf1927ba168986 ... [-0.018539483, 0.015377053,
0.006324861, 0.004...
29 1c109cfdc370463eb6d537e5b7b382fb ... [-0.065681376, -0.004882022,
0.03561782, 0.017...
30 3d0dcbc8971b415ea18065edc4d8c8ef ... [0.029657133, -0.023170393,
-0.0046286224, 0.0...
31 68105770b523412388424d984e711917 ... [-0.030532323, 0.037991133,
-0.0011697958, 0.0...
32 85c79fd84f5e4f918471c386852204c5 ... [-0.04545134, 0.03209481,
0.028792372, 0.08772...
33 eae4259b19a741ab9f9f6af18c4a0470 ... [-0.031185862, 0.057401102,
0.026549801, 0.042...
34 3138f39f2bcd43a69e0697cd3b05bc4d ... [0.05237491, 0.0035916413,
-0.01448442, 0.0371...
35 dde131ab575d44dbb55289a6972be18f ... [0.029156113, 0.05686912,
0.027205246, 0.05200...
36 de9e343f2e334d88a8ac7f8813a915e5 ... [-0.056767624, 0.006740077,
0.035738584, 0.068...
37 e2bf260115514fb3b252fd879fb3e7be ... [-0.024709905, -0.07275798,
-0.033372466, 0.01...
38 b462b94ce47a4b8c8fffa33f7242acec ... [-0.05806274, -0.036697976,
0.010909473, -0.01...
39 17ed1d92075643579a712cc6c29e8ddb ... [-0.010521335, -0.017530346,
-0.0148518095, 0....
40 3ce7c210a21b4deebad7cc9308148d86 ... [-0.0099690715, -0.021275608,
0.015752371, 0.0...
41 d64ed762ea924caa95c8d06f072a9a96 ... [-0.006371636, -0.031781916,
-0.043901198, 0.0...
42 adf4ee3fbe9b4d0381044838c4f889c8 ... [0.03008028, -0.054739267,
0.03621213, 0.04168...
43 32ee140946e5461f9275db664dc541a5 ... [0.0067304475, -0.036523268,
-0.04421857, -0.0...
44 c160b9cb27d6408ba6ab20214a2f3f81 ... [-0.018944053, -0.02489056,
0.0071473666, 0.02...
45 23527cd679ff4d5a988d52e7cd056078 ... [0.027823413, -0.06449872,
0.010141867, 0.0303...
46 f1c6eed066f24cbdb376b910fce29ed4 ... [0.028539598, -0.06675614,
0.042784836, 0.0615...
47 83a6cb03df6b41d8ad6ee5f6fef5f024 ... [0.0039935475, -0.035595946,
0.0060721235, 0.0...
48 147c038aef3e4422acbbc5f7938c4ab8 ... [0.0077742436, -0.036676265,
-0.0007366069, 0....
49 b7702b90c7f24190b864e8c6e64612a5 ... [-0.06719572, -0.02191515,
-0.06679287, 0.0249...
50 de6fa24480894518ab3cbcb66f739266 ... [-0.03515449, -0.020157838,
-0.03369378, 0.027...
51 6fae5ee1a831468aa585a1ea09095998 ... [-0.051609986, 0.010323131,
-0.036757376, 0.02...
52 ef32c4b208d041cc856f6837915dc1b0 ... [0.008965356, -0.037247375,
0.03903756, 0.0624...
53 07b2425216bd4f0aa4e079827cb48ef5 ... [-0.032199547, -0.027251536,
0.040896047, 0.03...
[54 rows x 8 columns]
D:\workspace\datalab\jsrag\venv\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
D:\workspace\datalab\jsrag\venv\lib\site-packages\datashaper\engine\verbs\convert.py:72: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing errors and catch exceptions explicitly instead
datetime_column = pd.to_datetime(column, errors="ignore")
D:\workspace\datalab\jsrag\venv\lib\site-packages\datashaper\engine\verbs\convert.py:72: UserWarning: Could not infer format, so each element will be parsed individually, falling back to dateutil. To ensure parsing is consistent and as-expected, please specify a format.
datetime_column = pd.to_datetime(column, errors="ignore")
🚀 create_final_nodes
level title ... x y
0 0 "新浪财经" ... 0 0
1 0 "亚特兰大联储" ... 0 0
2 0 "美国" ... 0 0
3 0 "华尔街见闻" ... 0 0
4 0 "王毅" ... 0 0
5 0 "匈牙利" ... 0 0
6 0 "西雅尔多" ... 0 0
7 0 "乌克兰危机" ... 0 0
8 0 "中国" ... 0 0
9 0 "CHINA" ... 0 0
10 0 "HUNGARY" ... 0 0
11 0 "UKRAINE" ... 0 0
12 0 "WANG YI" ... 0 0
13 0 "ORBAN" ... 0 0
14 0 "UKRAINE CRISIS" ... 0 0
15 0 "乌克兰" ... 0 0
16 0 "新华社" ... 0 0
17 0 "凤凰" ... 0 0
18 0 "俄罗斯" ... 0 0
19 0 "欧佩克+协议" ... 0 0
20 0 "OPEC+" ... 0 0
21 0 "美国伊利诺伊州" ... 0 0
22 0 "华盛顿县应急管理局" ... 0 0
23 0 "纳什维尔大坝" ... 0 0
24 0 "溃坝事件" ... 0 0
25 0 "当地官员" ... 0 0
26 0 "当地居民" ... 0 0
27 0 "OPEC+协议" ... 0 0
28 0 "彭博" ... 0 0
29 0 "RUSSIA" ... 0 0
30 0 "SINA FINANCE" ... 0 0
31 0 "ILLINOIS" ... 0 0
32 0 "WASHINGTON COUNTY EMERGENCY MANAGEMENT BUREAU" ... 0 0
33 0 "NASHVILLE DAM" ... 0 0
34 0 "CCTV NEWS" ... 0 0
35 0 "DAM COLLAPSE" ... 0 0
36 0 "美国伊利诺伊州华盛顿县应急管理局" ... 0 0
37 0 "习近平" ... 0 0
38 0 "欧尔班" ... 0 0
39 0 "XINHUA NEWS AGENCY" ... 0 0
40 0 "IFLYTEK" ... 0 0
41 0 "WALL STREET SEEN" ... 0 0
42 0 "科大讯飞" ... 0 0
43 0 "讯飞星火APP" ... 0 0
44 0 "讯飞晓医APP" ... 0 0
45 0 "交通银行" ... 0 0
46 0 "人保集团" ... 0 0
47 0 "招商银行" ... 0 0
48 0 "国元证券" ... 0 0
49 0 "SZIJJARTO" ... 0 0
50 0 "CENTRAL POLITICAL BUREAU OF THE COMMUNIST PAR... ... 0 0
51 0 "PHONE CALL" ... 0 0
52 0 "中方" ... 0 0
53 0 "匈方" ... 0 0
[54 rows x 14 columns]
D:\workspace\datalab\jsrag\venv\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
D:\workspace\datalab\jsrag\venv\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
🚀 create_final_communities
id ... text_unit_ids
0 1 ... [663affd4de3a71cf9b612eae20d761d3,8204398b499f...
1 2 ... [663affd4de3a71cf9b612eae20d761d3,a0f38ad84274...
2 0 ... [4d0bb660afb52d68caab05f6bf0b4350,b1fd07ccd1a5...
[3 rows x 6 columns]
🚀 join_text_units_to_entity_ids
text_unit_ids ... id
0 4d0bb660afb52d68caab05f6bf0b4350 ... 4d0bb660afb52d68caab05f6bf0b4350
1 663affd4de3a71cf9b612eae20d761d3 ... 663affd4de3a71cf9b612eae20d761d3
2 8204398b499f96829990a722591a9b83 ... 8204398b499f96829990a722591a9b83
3 04eeac4ad722a860496c0937e3eb1856 ... 04eeac4ad722a860496c0937e3eb1856
4 60d03a6c901c38de26ffd7df96aa5d18 ... 60d03a6c901c38de26ffd7df96aa5d18
5 a0f38ad84274b55756cc22177409ff46 ... a0f38ad84274b55756cc22177409ff46
6 b1fd07ccd1a54a8ed9b309f2d01607a9 ... b1fd07ccd1a54a8ed9b309f2d01607a9
7 4f58b1869ff3695c8bd4e994ef8c84de ... 4f58b1869ff3695c8bd4e994ef8c84de
8 52bd1651a5d8d56d81c1801482965a6d ... 52bd1651a5d8d56d81c1801482965a6d
9 a1215748205916f7b5e0adccc9c22795 ... a1215748205916f7b5e0adccc9c22795
10 ca60e805b3f302a56091e5f3c8db2ab8 ... ca60e805b3f302a56091e5f3c8db2ab8
11 7f6fd5708e48481bb673873f9ed15c30 ... 7f6fd5708e48481bb673873f9ed15c30
12 71f0b35fc170eb480295254536496a1d ... 71f0b35fc170eb480295254536496a1d
13 f170ecad55e15cfe417d0302b691ca4b ... f170ecad55e15cfe417d0302b691ca4b
[14 rows x 3 columns]
D:\workspace\datalab\jsrag\venv\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
D:\workspace\datalab\jsrag\venv\lib\site-packages\numpy\core\fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
D:\workspace\datalab\jsrag\venv\lib\site-packages\datashaper\engine\verbs\convert.py:65: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_numeric without passing errors and catch exceptions explicitly instead
column_numeric = cast(pd.Series, pd.to_numeric(column, errors="ignore"))
🚀 create_final_relationships
source ... rank
0 "新浪财经" ... 4
1 "新浪财经" ... 7
2 "亚特兰大联储" ... 3
3 "华尔街见闻" ... 4
4 "王毅" ... 5
5 "王毅" ... 7
6 "匈牙利" ... 4
7 "匈牙利" ... 7
8 "匈牙利" ... 6
9 "西雅尔多" ... 3
10 "CHINA" ... 6
11 "CHINA" ... 6
12 "CHINA" ... 7
13 "HUNGARY" ... 6
14 "HUNGARY" ... 7
15 "UKRAINE" ... 7
16 "WANG YI" ... 6
17 "WANG YI" ... 6
18 "WANG YI" ... 8
19 "WANG YI" ... 5
20 "ORBAN" ... 4
21 "乌克兰" ... 6
22 "乌克兰" ... 7
23 "新华社" ... 5
24 "新华社" ... 5
25 "凤凰" ... 6
26 "俄罗斯" ... 6
27 "俄罗斯" ... 7
28 "俄罗斯" ... 6
29 "OPEC+" ... 3
30 "美国伊利诺伊州" ... 3
31 "华盛顿县应急管理局" ... 6
32 "纳什维尔大坝" ... 5
33 "纳什维尔大坝" ... 6
34 "纳什维尔大坝" ... 5
35 "当地官员" ... 3
36 "ILLINOIS" ... 4
37 "ILLINOIS" ... 4
38 "WASHINGTON COUNTY EMERGENCY MANAGEMENT BUREAU" ... 4
39 "NASHVILLE DAM" ... 4
40 "习近平" ... 5
41 "IFLYTEK" ... 2
42 "科大讯飞" ... 7
43 "科大讯飞" ... 7
44 "科大讯飞" ... 7
45 "科大讯飞" ... 7
46 "科大讯飞" ... 7
47 "科大讯飞" ... 7
48 "中方" ... 4
[49 rows x 10 columns]
🚀 join_text_units_to_relationship_ids
id
relationship_ids
0 663affd4de3a71cf9b612eae20d761d3 [2670deebfa3f4d69bb82c28ab250a209,
b785a902506...
1 4d0bb660afb52d68caab05f6bf0b4350 [404309e89a5241d6bff42c05a45df206,
ed6d2eee9d7...
2 04eeac4ad722a860496c0937e3eb1856 [d54956b79dd147f894b67a8b97dcbef0,
1745a2485a9...
3 60d03a6c901c38de26ffd7df96aa5d18 [d54956b79dd147f894b67a8b97dcbef0,
3c063eea52e...
4 a0f38ad84274b55756cc22177409ff46 [958beecdb5bb4060948415ffd75d2b03,
b999ed77e19...
5 b1fd07ccd1a54a8ed9b309f2d01607a9 [48c0c4d72da74ff5bb926fa0c856d1a7,
4f3c97517f7...
6 4f58b1869ff3695c8bd4e994ef8c84de [32e6ccab20d94029811127dbbe424c64,
94a964c6992...
7 52bd1651a5d8d56d81c1801482965a6d [32e6ccab20d94029811127dbbe424c64,
94a964c6992...
8 a1215748205916f7b5e0adccc9c22795 [1eb829d0ace042089f0746f78729696c,
26f88ab3e2e...
9 ca60e805b3f302a56091e5f3c8db2ab8 [56d0e5ebe79e4814bd1463cf6ca21394,
7c49f2710e8...
10 7f6fd5708e48481bb673873f9ed15c30 [6b02373137fd438ba96af28f735cdbdb,
d2b629c0396...
11 8204398b499f96829990a722591a9b83 [36a4fcd8efc144e6b8af9a1c7ab8b2ce,
e22d1d1cd8d...
12 71f0b35fc170eb480295254536496a1d [fbeef791d19b413a9c93c6608286ab63,
89c08e79329...
13 f170ecad55e15cfe417d0302b691ca4b [bb9e01bc171d4326a29afda59ece8d17,
3c063eea52e...
❌ create_final_community_reports
None
⠋ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ----- 100% 0:00:… 0:00:…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
└── create_final_community_reports
❌ Errors occurred during the pipeline run, see logs for more details.
Process finished with exit code 1
Additional Information
- GraphRAG Version: 0.1.1
- Operating System: Windows
- Python Version: 3.10.9
In addition, the Chinese text in the cache folder is not displayed properly: it is stored as \uXXXX escape sequences, as in the file below.
- graphrag\cache\summarize_descriptions\summarize-chat-v2-0a51e37418831e8ba9bc4fc845b00f56
{"result": "\"\\u8baf\\u98de\\u6653\\u533bAPP\" is a medical application developed by \\u79d1\\u5927\\u8baf\\u98de. This application is capable of diagnosing 1600 common diseases and symptoms, recognizing over 2800 common medications, and understanding 260,000 drug interactions. Additionally, it has the ability to comprehend a vast number of medical terms, making it a comprehensive tool in the medical field.", "input": "\nYou are a helpful assistant responsible for generating a comprehensive summary of the data provided below.\nGiven one or two entities, and a list of descriptions, all related to the same entity or group of entities.\nPlease concatenate all of these into a single, comprehensive description. Make sure to include information collected from all the descriptions.\nIf the provided descriptions are contradictory, please resolve the contradictions and provide a single, coherent summary.\nMake sure it is written in third person, and include the entity names so we the have full context.\n\n#######\n-Data-\nEntities: \"\\\"\\u8baf\\u98de\\u6653\\u533bAPP\\\"\"\nDescription List: [\"\\\"\\u8baf\\u98de\\u6653\\u533bAPP is a medical application by \\u79d1\\u5927\\u8baf\\u98de that can diagnose numerous diseases and symptoms, recognize a wide range of medications, and understand a vast number of medical terms.\\\"\", \"\\\"\\u8baf\\u98de\\u6653\\u533bAPP is an application in the medical field capable of diagnosing 1600 common diseases, recognizing over 2800 common drugs, and understanding 260,000 drug interactions.\\\"\"]\n#######\nOutput:\n", "parameters": {"model": "gpt-4o", "temperature": 0.0, "frequency_penalty": 0.0, "presence_penalty": 0.0, "top_p": 1.0, "max_tokens": 500, "n": null}}
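The `\uXXXX` sequences in the cache entry are standard JSON escapes. This is what Python's `json.dumps` produces with its default `ensure_ascii=True`. Whether GraphRAG's cache writer relies on that default is an assumption here, not something confirmed in this thread. The sketch below uses a shortened, hypothetical record to show that the escaped form decodes back to the original Chinese, so the cache contents themselves are probably intact even though they are hard to read in a text editor.

```python
import json

# Shortened, hypothetical record modelled on the cache entry above.
record = {"result": "讯飞晓医APP is a medical application developed by 科大讯飞."}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are written as-is
# (the file must then be saved with a UTF-8 encoding).
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms decode to the same Python object.
assert json.loads(escaped) == json.loads(readable) == record
```

To inspect an existing cache file, loading it with `json.load` and printing the fields should show the readable text.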
I am getting the same error. See my post, where I drilled down to some extent: there is a JSONDecodeError caused by the {{ in the system message of the community reports prompt (prompts/community_reports.txt). I changed it to a single { and then found that the community_reports.txt prompt is null in the reports, even though it is present in system.yaml.
More info: https://github.com/microsoft/graphrag/discussions/573
Can you share your indexing-engine.log from output/reports?
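For anyone collecting that log, here is a minimal sketch for locating the most recent indexing-engine.log and pulling out its error lines. It assumes the `output/<timestamp>/reports` layout shown in the configuration later in this thread and that the log level appears as a space-separated token, as in the log excerpt. The helper names are hypothetical and not part of GraphRAG.

```python
from pathlib import Path
from typing import List, Optional


def latest_engine_log(root: Path) -> Optional[Path]:
    """Return the newest output/<timestamp>/reports/indexing-engine.log under root."""
    # Timestamp directories such as 20240717-190647 sort chronologically as strings.
    logs = sorted(root.glob("output/*/reports/indexing-engine.log"))
    return logs[-1] if logs else None


def error_lines(log_path: Path) -> List[str]:
    """Return the lines of the log whose level token is ERROR."""
    with log_path.open(encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if " ERROR " in line]
```

Calling `latest_engine_log(Path("./microsoft_graphrag"))` and passing the result to `error_lines` should give a short excerpt that is easier to paste into an issue than the full log.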
I think it is because .format() is used to fill the information into the prompt. When the text already contains literal brace characters ({ }), they have to be written as double braces ({{ }}) instead.
https://docs.python.org/3/library/string.html#formatstrings
But I also hit the same error: the model's output contains double brace characters, which should never appear in the filled prompt given to the model...
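A small sketch of the `str.format` behaviour described above. The template text is hypothetical and is not the actual contents of prompts/community_reports.txt. It shows that doubled braces collapse to single braces only when the template goes through `.format()`. If a template is sent to the model without being formatted, the `{{ }}` would remain in the prompt, and a model could plausibly echo them back.

```python
# Hypothetical templates, for illustration only.
ESCAPED = 'Return JSON like {{"title": "<title>"}} for this text: {input_text}'
UNESCAPED = 'Return JSON like {"title": "<title>"} for this text: {input_text}'


def fill_prompt(template: str, **values: str) -> str:
    """Fill a prompt template with str.format."""
    return template.format(**values)


def format_fails(template: str, **values: str) -> bool:
    """Return True if str.format rejects the template (e.g. unescaped JSON braces)."""
    try:
        template.format(**values)
    except (KeyError, IndexError, ValueError):
        return True
    return False


filled = fill_prompt(ESCAPED, input_text="news")
print(filled)   # the doubled braces become single braces
print(ESCAPED)  # the raw template still contains {{ and }}
print(format_fails(UNESCAPED, input_text="news"))
```

So replacing `{{` with `{` in the prompt file is only safe if the prompt is never passed through `.format()`. Otherwise the single braces are parsed as replacement fields and formatting fails.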
@Amitabh-Priyadarshi-Bayer I added an extract_json_dict function to the 'graphrag/llm/openai/utils.py' file to solve the DICT error.
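    # Imports this snippet relies on (re is needed by extract_json_dict).
    import json
    import logging
    import re

    log = logging.getLogger(__name__)


    def try_parse_json_object(input: str) -> dict:
        """Generate JSON-string output using best-attempt prompting & parsing techniques."""
        try:
            # result = json.loads(input)
            result = extract_json_dict(input)
        except json.JSONDecodeError:
            log.exception("error loading json, json=%s", input)
            raise
        else:
            # extract_json_dict returns None on failure, so a failed parse
            # surfaces here as a TypeError rather than a JSONDecodeError.
            if not isinstance(result, dict):
                raise TypeError
            return result


    def extract_json_dict(text: str):
        """Parse dict from text."""
        # Matches the first {...} span that contains no nested braces.
        pattern = r'\{[^{}]*\}'
        match = re.search(pattern, text)
        if match:
            json_str = match.group()
            try:
                json_dict = json.loads(json_str)
                return json_dict
            except json.JSONDecodeError:
                return None
        else:
            return None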
After that, I got an error from graphrag.index.graph.extractors.community_reports.community_reports_extractor.
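One possible contributor to that follow-up error is the regex in extract_json_dict, which only matches a brace pair with no braces inside it. If the model's reply nests objects, it would return an inner object rather than the whole report. This is a guess and not confirmed in the thread. The sketch below is an alternative using `json.JSONDecoder.raw_decode`, which parses a complete JSON value starting at a given index. The function name and sample string are hypothetical, and this is not GraphRAG's own parsing code.

```python
import json
import re
from typing import Optional


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first complete JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at index `start`.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON from this brace (e.g. the outer brace of "{{"); try the next.
        start = text.find("{", start + 1)
    return None


# Hypothetical model reply with doubled outer braces and a nested object.
reply = '{{"title": "t", "findings": [{"summary": "s"}]}}'

nested_aware = extract_first_json_object(reply)
regex_only = json.loads(re.search(r"\{[^{}]*\}", reply).group())

print(nested_aware)  # the full report object
print(regex_only)    # only the innermost object
```

On this sample the regex approach keeps only the inner `{"summary": "s"}` object, while the decoder-based version recovers the outer object with its nested list.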
- indexing-engine.log
19:06:48,21 graphrag.config.read_dotenv INFO Loading pipeline .env file
19:06:48,27 graphrag.index.cli INFO using default configuration: {
"llm": {
"api_key": "REDACTED, length 67",
"type": "openai_chat",
"model": "gpt-4",
"max_tokens": 4000,
"request_timeout": 180.0,
"api_base": "http://api.test.cn/dataapi/harvest/v1",
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"root_dir": "D:\\workspace\\datalab\\jsrag\\tests\\microsoft_graphrag",
"reporting": {
"type": "file",
"base_dir": "output/${timestamp}/reports",
"storage_account_blob_url": null
},
"storage": {
"type": "file",
"base_dir": "output/${timestamp}/artifacts",
"storage_account_blob_url": null
},
"cache": {
"type": "file",
"base_dir": "cache",
"storage_account_blob_url": null
},
"input": {
"type": "file",
"file_type": "text",
"base_dir": "input",
"storage_account_blob_url": null,
"encoding": "utf-8",
"file_pattern": ".*\\.txt$",
"file_filter": null,
"source_column": null,
"timestamp_column": null,
"timestamp_format": null,
"text_column": "text",
"title_column": null,
"document_attribute_columns": []
},
"embed_graph": {
"enabled": false,
"num_walks": 10,
"walk_length": 40,
"window_size": 2,
"iterations": 3,
"random_seed": 597832,
"strategy": null
},
"embeddings": {
"llm": {
"api_key": "REDACTED, length 67",
"type": "openai_embedding",
"model": "text-embedding-3-small",
"max_tokens": 4000,
"request_timeout": 180.0,
"api_base": "http://api.test.cn/dataapi/harvest/v1",
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": null,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"batch_size": 16,
"batch_max_tokens": 8191,
"target": "required",
"skip": [],
"vector_store": null,
"strategy": null
},
"chunks": {
"size": 300,
"overlap": 100,
"group_by_columns": [
"id"
],
"strategy": null
},
"snapshots": {
"graphml": false,
"raw_entities": false,
"top_level_nodes": false
},
"entity_extraction": {
"llm": {
"api_key": "REDACTED, length 67",
"type": "openai_chat",
"model": "gpt-4",
"max_tokens": 4000,
"request_timeout": 180.0,
"api_base": "http://api.test.cn/dataapi/harvest/v1",
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": "prompts/entity_extraction.txt",
"entity_types": [
"organization",
"person",
"geo",
"event"
],
"max_gleanings": 0,
"strategy": null
},
"summarize_descriptions": {
"llm": {
"api_key": "REDACTED, length 67",
"type": "openai_chat",
"model": "gpt-4",
"max_tokens": 4000,
"request_timeout": 180.0,
"api_base": "http://api.test.cn/dataapi/harvest/v1",
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": "prompts/summarize_descriptions.txt",
"max_length": 500,
"strategy": null
},
"community_reports": {
"llm": {
"api_key": "REDACTED, length 67",
"type": "openai_chat",
"model": "gpt-4",
"max_tokens": 4000,
"request_timeout": 180.0,
"api_base": "http://api.test.cn/dataapi/harvest/v1",
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"prompt": null,
"max_length": 2000,
"max_input_length": 8000,
"strategy": null
},
"claim_extraction": {
"llm": {
"api_key": "REDACTED, length 67",
"type": "openai_chat",
"model": "gpt-4",
"max_tokens": 4000,
"request_timeout": 180.0,
"api_base": "http://api.test.cn/dataapi/harvest/v1",
"api_version": null,
"proxy": null,
"cognitive_services_endpoint": null,
"deployment_name": null,
"model_supports_json": true,
"tokens_per_minute": 0,
"requests_per_minute": 0,
"max_retries": 10,
"max_retry_wait": 10.0,
"sleep_on_rate_limit_recommendation": true,
"concurrent_requests": 25
},
"parallelization": {
"stagger": 0.3,
"num_threads": 50
},
"async_mode": "threaded",
"enabled": false,
"prompt": "prompts/claim_extraction.txt",
"description": "Any claims or facts that could be relevant to information discovery.",
"max_gleanings": 0,
"strategy": null
},
"cluster_graph": {
"max_cluster_size": 10,
"strategy": null
},
"umap": {
"enabled": false
},
"local_search": {
"text_unit_prop": 0.5,
"community_prop": 0.1,
"conversation_history_max_turns": 5,
"top_k_entities": 10,
"top_k_relationships": 10,
"max_tokens": 12000,
"llm_max_tokens": 2000
},
"global_search": {
"max_tokens": 12000,
"data_max_tokens": 12000,
"map_max_tokens": 1000,
"reduce_max_tokens": 2000,
"concurrency": 32
},
"encoding_model": "cl100k_base",
"skip_workflows": []
}
19:06:48,29 graphrag.index.create_pipeline_config INFO skipping workflows
19:06:48,31 graphrag.index.run INFO Running pipeline
19:06:48,31 graphrag.index.storage.file_pipeline_storage INFO Creating file storage at D:\workspace\datalab\jsrag\tests\microsoft_graphrag\output\20240717-190647\artifacts
19:06:48,33 graphrag.index.input.load_input INFO loading input from root_dir=input
19:06:48,33 graphrag.index.input.load_input INFO using file storage for input
19:06:48,34 graphrag.index.storage.file_pipeline_storage INFO search D:\workspace\datalab\jsrag\tests\microsoft_graphrag\input for files matching .*\.txt$
19:06:48,34 graphrag.index.input.text INFO found text files from input, found [('news.txt', {})]
19:06:48,39 graphrag.index.workflows.load INFO Workflow Run Order: ['create_base_text_units', 'create_base_extracted_entities', 'create_summarized_entities', 'create_base_entity_graph', 'create_final_entities', 'create_final_nodes', 'create_final_communities', 'join_text_units_to_entity_ids', 'create_final_relationships', 'join_text_units_to_relationship_ids', 'create_final_community_reports', 'create_final_text_units', 'create_base_documents', 'create_final_documents']
19:06:48,39 graphrag.index.run INFO Final # of rows loaded: 1
19:06:48,137 graphrag.index.run INFO Running workflow: create_base_text_units...
19:06:48,137 graphrag.index.run INFO dependencies for create_base_text_units: []
19:06:48,138 datashaper.workflow.workflow INFO executing verb orderby
19:06:48,138 datashaper.workflow.workflow INFO executing verb zip
19:06:48,139 datashaper.workflow.workflow INFO executing verb aggregate_override
19:06:48,142 datashaper.workflow.workflow INFO executing verb chunk
19:06:48,321 datashaper.workflow.workflow INFO executing verb select
19:06:48,321 datashaper.workflow.workflow INFO executing verb unroll
19:06:48,323 datashaper.workflow.workflow INFO executing verb rename
19:06:48,324 datashaper.workflow.workflow INFO executing verb genid
19:06:48,325 datashaper.workflow.workflow INFO executing verb unzip
19:06:48,326 datashaper.workflow.workflow INFO executing verb copy
19:06:48,326 datashaper.workflow.workflow INFO executing verb filter
19:06:48,333 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_base_text_units.parquet
19:06:48,446 graphrag.index.run INFO Running workflow: create_base_extracted_entities...
19:06:48,446 graphrag.index.run INFO dependencies for create_base_extracted_entities: ['create_base_text_units']
19:06:48,447 graphrag.index.run INFO read table from storage: create_base_text_units.parquet
19:06:48,455 datashaper.workflow.workflow INFO executing verb entity_extract
19:06:48,462 graphrag.llm.openai.create_openai_client INFO Creating OpenAI client base_url=http://api.test.cn/dataapi/harvest/v1
19:06:48,493 graphrag.index.llm.load_llm INFO create TPM/RPM limiter for gpt-4: TPM=0, RPM=0
19:06:48,493 graphrag.index.llm.load_llm INFO create concurrency limiter for gpt-4: 25
19:06:57,423 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:06:57,426 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 8.829000000001543. input_tokens=2055, output_tokens=276
19:07:03,221 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:03,222 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 14.625. input_tokens=2254, output_tokens=294
19:07:06,175 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:06,176 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 17.610000000000582. input_tokens=2268, output_tokens=282
19:07:06,438 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:06,440 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 17.92199999999866. input_tokens=2270, output_tokens=613
19:07:06,702 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:06,704 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 18.17199999999866. input_tokens=2267, output_tokens=624
19:07:06,734 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:06,735 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 18.187000000001717. input_tokens=2266, output_tokens=627
19:07:08,339 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:08,341 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 19.78099999999904. input_tokens=2267, output_tokens=324
19:07:11,781 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:11,782 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 23.20300000000134. input_tokens=2267, output_tokens=528
19:07:14,165 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:14,166 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 25.625. input_tokens=2268, output_tokens=418
19:07:15,95 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:15,97 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 26.5779999999977. input_tokens=2268, output_tokens=431
19:07:15,251 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:15,253 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 26.75. input_tokens=2270, output_tokens=628
19:07:16,474 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:16,476 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 27.906000000002678. input_tokens=2269, output_tokens=666
19:07:22,982 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:22,983 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 34.4220000000023. input_tokens=2269, output_tokens=816
19:07:30,237 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:30,238 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "Process" with 0 retries took 41.70300000000134. input_tokens=2268, output_tokens=735
19:07:30,242 datashaper.workflow.workflow INFO executing verb merge_graphs
19:07:30,254 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_base_extracted_entities.parquet
19:07:30,378 graphrag.index.run INFO Running workflow: create_summarized_entities...
19:07:30,378 graphrag.index.run INFO dependencies for create_summarized_entities: ['create_base_extracted_entities']
19:07:30,379 graphrag.index.run INFO read table from storage: create_base_extracted_entities.parquet
19:07:30,383 datashaper.workflow.workflow INFO executing verb summarize_descriptions
19:07:32,345 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:32,346 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 1.9059999999990396. input_tokens=259, output_tokens=40
19:07:32,378 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:32,379 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 1.9380000000019209. input_tokens=233, output_tokens=31
19:07:32,599 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:32,600 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 2.0780000000013388. input_tokens=279, output_tokens=37
19:07:32,743 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:32,744 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 2.2030000000013388. input_tokens=162, output_tokens=44
19:07:32,806 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:32,807 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 2.3440000000009604. input_tokens=269, output_tokens=44
19:07:32,838 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:32,839 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 2.3429999999971187. input_tokens=296, output_tokens=48
19:07:32,997 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:32,998 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 2.5320000000028813. input_tokens=189, output_tokens=67
19:07:33,37 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:33,38 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 2.5. input_tokens=164, output_tokens=32
19:07:33,297 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:33,299 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 2.8440000000009604. input_tokens=317, output_tokens=66
19:07:33,349 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:33,350 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 2.875. input_tokens=254, output_tokens=64
19:07:33,490 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:33,492 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 3.0159999999996217. input_tokens=187, output_tokens=72
19:07:33,569 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:33,570 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 3.062999999998283. input_tokens=222, output_tokens=84
19:07:33,624 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:33,626 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 3.0940000000009604. input_tokens=329, output_tokens=77
19:07:33,661 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:33,662 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 3.1569999999992433. input_tokens=246, output_tokens=75
19:07:33,749 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:33,750 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 3.2030000000013388. input_tokens=159, output_tokens=34
19:07:34,59 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:34,60 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 3.577999999997701. input_tokens=176, output_tokens=50
19:07:34,150 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:34,151 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 3.6409999999996217. input_tokens=320, output_tokens=42
19:07:34,306 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:34,307 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 3.8590000000003783. input_tokens=454, output_tokens=100
19:07:34,393 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:34,394 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 3.9530000000013388. input_tokens=600, output_tokens=104
19:07:34,673 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:34,674 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 4.2040000000015425. input_tokens=189, output_tokens=62
19:07:34,688 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:34,689 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 2.3440000000009604. input_tokens=225, output_tokens=22
19:07:34,691 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:34,692 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 4.186999999998079. input_tokens=317, output_tokens=52
19:07:34,807 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:34,808 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 4.311999999998079. input_tokens=474, output_tokens=101
19:07:35,255 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:35,256 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 4.764999999999418. input_tokens=504, output_tokens=74
19:07:36,281 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:36,282 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 5.75. input_tokens=476, output_tokens=102
19:07:36,571 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:36,572 graphrag.llm.base.rate_limiting_llm INFO perf - llm.chat "summarize" with 0 retries took 6.125. input_tokens=572, output_tokens=93
19:07:36,580 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_summarized_entities.parquet
19:07:36,683 graphrag.index.run INFO Running workflow: create_base_entity_graph...
19:07:36,683 graphrag.index.run INFO dependencies for create_base_entity_graph: ['create_summarized_entities']
19:07:36,684 graphrag.index.run INFO read table from storage: create_summarized_entities.parquet
19:07:36,688 datashaper.workflow.workflow INFO executing verb cluster_graph
19:07:36,706 datashaper.workflow.workflow INFO executing verb select
19:07:36,709 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_base_entity_graph.parquet
19:07:36,817 graphrag.index.run INFO Running workflow: create_final_entities...
19:07:36,817 graphrag.index.run INFO dependencies for create_final_entities: ['create_base_entity_graph']
19:07:36,818 graphrag.index.run INFO read table from storage: create_base_entity_graph.parquet
19:07:36,822 datashaper.workflow.workflow INFO executing verb unpack_graph
19:07:36,826 datashaper.workflow.workflow INFO executing verb rename
19:07:36,826 datashaper.workflow.workflow INFO executing verb select
19:07:36,827 datashaper.workflow.workflow INFO executing verb dedupe
19:07:36,828 datashaper.workflow.workflow INFO executing verb rename
19:07:36,828 datashaper.workflow.workflow INFO executing verb filter
19:07:36,832 datashaper.workflow.workflow INFO executing verb text_split
19:07:36,833 datashaper.workflow.workflow INFO executing verb drop
19:07:36,834 datashaper.workflow.workflow INFO executing verb merge
19:07:36,847 datashaper.workflow.workflow INFO executing verb text_embed
19:07:36,849 graphrag.llm.openai.create_openai_client INFO Creating OpenAI client base_url=http://api.test.cn/dataapi/harvest/v1
19:07:36,880 graphrag.index.llm.load_llm INFO create TPM/RPM limiter for text-embedding-3-small: TPM=0, RPM=0
19:07:36,880 graphrag.index.llm.load_llm INFO create concurrency limiter for text-embedding-3-small: 25
19:07:36,885 graphrag.index.verbs.text.embed.strategies.openai INFO embedding 54 inputs via 54 snippets using 4 batches. max_batch_size=16, max_tokens=8191
19:07:38,91 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/embeddings "HTTP/1.1 200 OK"
19:07:38,143 graphrag.llm.base.rate_limiting_llm INFO perf - llm.embedding "Process" with 0 retries took 1.25. input_tokens=221, output_tokens=0
19:07:39,971 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/embeddings "HTTP/1.1 200 OK"
19:07:40,125 graphrag.llm.base.rate_limiting_llm INFO perf - llm.embedding "Process" with 0 retries took 3.235000000000582. input_tokens=833, output_tokens=0
19:07:40,420 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/embeddings "HTTP/1.1 200 OK"
19:07:40,567 graphrag.llm.base.rate_limiting_llm INFO perf - llm.embedding "Process" with 0 retries took 3.6719999999986612. input_tokens=669, output_tokens=0
19:07:40,593 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/embeddings "HTTP/1.1 200 OK"
19:07:40,746 graphrag.llm.base.rate_limiting_llm INFO perf - llm.embedding "Process" with 0 retries took 3.860000000000582. input_tokens=986, output_tokens=0
19:07:40,759 datashaper.workflow.workflow INFO executing verb drop
19:07:40,760 datashaper.workflow.workflow INFO executing verb filter
19:07:40,765 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_final_entities.parquet
19:07:40,947 graphrag.index.run INFO Running workflow: create_final_nodes...
19:07:40,947 graphrag.index.run INFO dependencies for create_final_nodes: ['create_base_entity_graph']
19:07:40,947 graphrag.index.run INFO read table from storage: create_base_entity_graph.parquet
19:07:40,951 datashaper.workflow.workflow INFO executing verb layout_graph
19:07:40,966 datashaper.workflow.workflow INFO executing verb unpack_graph
19:07:40,971 datashaper.workflow.workflow INFO executing verb unpack_graph
19:07:40,975 datashaper.workflow.workflow INFO executing verb drop
19:07:40,976 datashaper.workflow.workflow INFO executing verb filter
19:07:40,979 datashaper.workflow.workflow INFO executing verb select
19:07:40,980 datashaper.workflow.workflow INFO executing verb rename
19:07:40,981 datashaper.workflow.workflow INFO executing verb join
19:07:40,987 datashaper.workflow.workflow INFO executing verb convert
19:07:40,989 datashaper.workflow.workflow INFO executing verb rename
19:07:40,992 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_final_nodes.parquet
19:07:41,124 graphrag.index.run INFO Running workflow: create_final_communities...
19:07:41,124 graphrag.index.run INFO dependencies for create_final_communities: ['create_base_entity_graph']
19:07:41,124 graphrag.index.run INFO read table from storage: create_base_entity_graph.parquet
19:07:41,128 datashaper.workflow.workflow INFO executing verb unpack_graph
19:07:41,132 datashaper.workflow.workflow INFO executing verb unpack_graph
19:07:41,136 datashaper.workflow.workflow INFO executing verb aggregate_override
19:07:41,139 datashaper.workflow.workflow INFO executing verb join
19:07:41,145 datashaper.workflow.workflow INFO executing verb join
19:07:41,152 datashaper.workflow.workflow INFO executing verb concat
19:07:41,152 datashaper.workflow.workflow INFO executing verb filter
19:07:41,157 datashaper.workflow.workflow INFO executing verb aggregate_override
19:07:41,161 datashaper.workflow.workflow INFO executing verb join
19:07:41,168 datashaper.workflow.workflow INFO executing verb filter
19:07:41,173 datashaper.workflow.workflow INFO executing verb fill
19:07:41,174 datashaper.workflow.workflow INFO executing verb merge
19:07:41,176 datashaper.workflow.workflow INFO executing verb copy
19:07:41,177 datashaper.workflow.workflow INFO executing verb select
19:07:41,180 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_final_communities.parquet
19:07:41,290 graphrag.index.run INFO Running workflow: join_text_units_to_entity_ids...
19:07:41,290 graphrag.index.run INFO dependencies for join_text_units_to_entity_ids: ['create_final_entities']
19:07:41,290 graphrag.index.run INFO read table from storage: create_final_entities.parquet
19:07:41,297 datashaper.workflow.workflow INFO executing verb select
19:07:41,298 datashaper.workflow.workflow INFO executing verb unroll
19:07:41,300 datashaper.workflow.workflow INFO executing verb aggregate_override
19:07:41,304 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table join_text_units_to_entity_ids.parquet
19:07:41,413 graphrag.index.run INFO Running workflow: create_final_relationships...
19:07:41,413 graphrag.index.run INFO dependencies for create_final_relationships: ['create_final_nodes', 'create_base_entity_graph']
19:07:41,414 graphrag.index.run INFO read table from storage: create_final_nodes.parquet
19:07:41,419 graphrag.index.run INFO read table from storage: create_base_entity_graph.parquet
19:07:41,423 datashaper.workflow.workflow INFO executing verb unpack_graph
19:07:41,427 datashaper.workflow.workflow INFO executing verb filter
19:07:41,431 datashaper.workflow.workflow INFO executing verb rename
19:07:41,431 datashaper.workflow.workflow INFO executing verb filter
19:07:41,435 datashaper.workflow.workflow INFO executing verb drop
19:07:41,435 datashaper.workflow.workflow INFO executing verb compute_edge_combined_degree
19:07:41,439 datashaper.workflow.workflow INFO executing verb convert
19:07:41,440 datashaper.workflow.workflow INFO executing verb convert
19:07:41,442 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table create_final_relationships.parquet
19:07:41,566 graphrag.index.run INFO Running workflow: join_text_units_to_relationship_ids...
19:07:41,567 graphrag.index.run INFO dependencies for join_text_units_to_relationship_ids: ['create_final_relationships']
19:07:41,567 graphrag.index.run INFO read table from storage: create_final_relationships.parquet
19:07:41,572 datashaper.workflow.workflow INFO executing verb select
19:07:41,573 datashaper.workflow.workflow INFO executing verb unroll
19:07:41,574 datashaper.workflow.workflow INFO executing verb aggregate_override
19:07:41,577 datashaper.workflow.workflow INFO executing verb select
19:07:41,579 graphrag.index.emit.parquet_table_emitter INFO emitting parquet table join_text_units_to_relationship_ids.parquet
19:07:41,690 graphrag.index.run INFO Running workflow: create_final_community_reports...
19:07:41,690 graphrag.index.run INFO dependencies for create_final_community_reports: ['create_final_relationships', 'create_final_nodes']
19:07:41,690 graphrag.index.run INFO read table from storage: create_final_relationships.parquet
19:07:41,696 graphrag.index.run INFO read table from storage: create_final_nodes.parquet
19:07:41,701 datashaper.workflow.workflow INFO executing verb prepare_community_reports_nodes
19:07:41,703 datashaper.workflow.workflow INFO executing verb prepare_community_reports_edges
19:07:41,705 datashaper.workflow.workflow INFO executing verb restore_community_hierarchy
19:07:41,709 datashaper.workflow.workflow INFO executing verb prepare_community_reports
19:07:41,709 graphrag.index.verbs.graph.report.prepare_community_reports INFO Number of nodes at level=0 => 54
19:07:41,735 datashaper.workflow.workflow INFO executing verb create_community_reports
19:07:56,915 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:57,92 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:07:59,619 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:08:10,645 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:08:13,598 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:08:14,321 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:08:24,845 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:08:30,789 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:08:37,523 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:08:37,524 graphrag.index.graph.extractors.community_reports.community_reports_extractor ERROR error generating community report
Traceback (most recent call last):
File "D:\workspace\datalab\jsrag\graphrag\index\graph\extractors\community_reports\community_reports_extractor.py", line 58, in __call__
await self._llm(
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\json_parsing_llm.py", line 34, in __call__
result = await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\openai_token_replacing_llm.py", line 37, in __call__
return await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\openai_history_tracking_llm.py", line 33, in __call__
output = await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\base\caching_llm.py", line 104, in __call__
result = await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 177, in __call__
result, start = await execute_with_retry()
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 159, in execute_with_retry
async for attempt in retryer:
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\asyncio\__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\asyncio\__init__.py", line 153, in iter
result = await action(retry_state)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\_utils.py", line 99, in inner
return call(*args, **kwargs)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\__init__.py", line 398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
File "D:\software\python310\lib\concurrent\futures\_base.py", line 451, in result
return self.__get_result()
File "D:\software\python310\lib\concurrent\futures\_base.py", line 403, in __get_result
raise self._exception
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 165, in execute_with_retry
return await do_attempt(), start
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 147, in do_attempt
return await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\base\base_llm.py", line 48, in __call__
return await self._invoke_json(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\openai_chat_llm.py", line 90, in _invoke_json
raise RuntimeError(FAILED_TO_CREATE_JSON_ERROR)
RuntimeError: Failed to generate valid JSON output
19:08:37,529 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
19:08:37,530 graphrag.index.verbs.graph.report.strategies.graph_intelligence.run_graph_intelligence WARNING No report found for community: 0
19:08:45,375 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:08:45,376 graphrag.index.graph.extractors.community_reports.community_reports_extractor ERROR error generating community report
Traceback (most recent call last):
File "D:\workspace\datalab\jsrag\graphrag\index\graph\extractors\community_reports\community_reports_extractor.py", line 58, in __call__
await self._llm(
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\json_parsing_llm.py", line 34, in __call__
result = await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\openai_token_replacing_llm.py", line 37, in __call__
return await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\openai_history_tracking_llm.py", line 33, in __call__
output = await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\base\caching_llm.py", line 104, in __call__
result = await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 177, in __call__
result, start = await execute_with_retry()
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 159, in execute_with_retry
async for attempt in retryer:
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\asyncio\__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\asyncio\__init__.py", line 153, in iter
result = await action(retry_state)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\_utils.py", line 99, in inner
return call(*args, **kwargs)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\__init__.py", line 398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
File "D:\software\python310\lib\concurrent\futures\_base.py", line 451, in result
return self.__get_result()
File "D:\software\python310\lib\concurrent\futures\_base.py", line 403, in __get_result
raise self._exception
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 165, in execute_with_retry
return await do_attempt(), start
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 147, in do_attempt
return await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\base\base_llm.py", line 48, in __call__
return await self._invoke_json(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\openai_chat_llm.py", line 90, in _invoke_json
raise RuntimeError(FAILED_TO_CREATE_JSON_ERROR)
RuntimeError: Failed to generate valid JSON output
19:08:45,380 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
19:08:45,380 graphrag.index.verbs.graph.report.strategies.graph_intelligence.run_graph_intelligence WARNING No report found for community: 2
19:08:50,566 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:09:38,746 httpx INFO HTTP Request: POST http://api.test.cn/dataapi/harvest/v1/chat/completions "HTTP/1.1 200 OK"
19:09:38,747 graphrag.index.graph.extractors.community_reports.community_reports_extractor ERROR error generating community report
Traceback (most recent call last):
File "D:\workspace\datalab\jsrag\graphrag\index\graph\extractors\community_reports\community_reports_extractor.py", line 58, in __call__
await self._llm(
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\json_parsing_llm.py", line 34, in __call__
result = await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\openai_token_replacing_llm.py", line 37, in __call__
return await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\openai_history_tracking_llm.py", line 33, in __call__
output = await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\base\caching_llm.py", line 104, in __call__
result = await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 177, in __call__
result, start = await execute_with_retry()
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 159, in execute_with_retry
async for attempt in retryer:
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\asyncio\__init__.py", line 166, in __anext__
do = await self.iter(retry_state=self._retry_state)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\asyncio\__init__.py", line 153, in iter
result = await action(retry_state)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\_utils.py", line 99, in inner
return call(*args, **kwargs)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\tenacity\__init__.py", line 398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
File "D:\software\python310\lib\concurrent\futures\_base.py", line 451, in result
return self.__get_result()
File "D:\software\python310\lib\concurrent\futures\_base.py", line 403, in __get_result
raise self._exception
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 165, in execute_with_retry
return await do_attempt(), start
File "D:\workspace\datalab\jsrag\graphrag\llm\base\rate_limiting_llm.py", line 147, in do_attempt
return await self._delegate(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\base\base_llm.py", line 48, in __call__
return await self._invoke_json(input, **kwargs)
File "D:\workspace\datalab\jsrag\graphrag\llm\openai\openai_chat_llm.py", line 90, in _invoke_json
raise RuntimeError(FAILED_TO_CREATE_JSON_ERROR)
RuntimeError: Failed to generate valid JSON output
19:09:38,750 graphrag.index.reporting.file_workflow_callbacks INFO Community Report Extraction Error details=None
19:09:38,750 graphrag.index.verbs.graph.report.strategies.graph_intelligence.run_graph_intelligence WARNING No report found for community: 1
19:09:38,751 datashaper.workflow.workflow INFO executing verb window
19:09:38,751 datashaper.workflow.workflow ERROR Error executing verb "window" in create_final_community_reports: 'community'
Traceback (most recent call last):
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
result = node.verb.func(**verb_args)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\datashaper\engine\verbs\window.py", line 73, in window
window = __window_function_map[window_operation](input_table[column])
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
indexer = self.columns.get_loc(key)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\pandas\core\indexes\range.py", line 417, in get_loc
raise KeyError(key)
KeyError: 'community'
19:09:38,755 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "window" in create_final_community_reports: 'community' details=None
19:09:38,755 graphrag.index.run ERROR error running workflow create_final_community_reports
Traceback (most recent call last):
File "D:\workspace\datalab\jsrag\graphrag\index\run.py", line 323, in run_pipeline
result = await workflow.run(context, callbacks)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\datashaper\workflow\workflow.py", line 369, in run
timing = await self._execute_verb(node, context, callbacks)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
result = node.verb.func(**verb_args)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\datashaper\engine\verbs\window.py", line 73, in window
window = __window_function_map[window_operation](input_table[column])
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
indexer = self.columns.get_loc(key)
File "D:\workspace\datalab\jsrag\venv\lib\site-packages\pandas\core\indexes\range.py", line 417, in get_loc
raise KeyError(key)
KeyError: 'community'
19:09:38,757 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None
I ran into the same problem yesterday, but today, without making any changes, I ran it again and it miraculously succeeded.
I've fixed the issue, mostly due to unstable parse functions. The 'try_parse_json_object' function in the graphrag/llm/openai/utils.py code has been modified as follows:
def try_parse_json_object(input: str) -> dict:
    """Generate JSON-string output using best-attempt prompting & parsing techniques."""
    try:
        clean_json = clean_up_json(input)
        result = json.loads(clean_json)
    except json.JSONDecodeError:
        log.exception("error loading json, json=%s", input)
        raise
    else:
        if not isinstance(result, dict):
            raise TypeError
        return result


def clean_up_json(json_str: str) -> str:
    """Clean up json string."""
    json_str = (
        json_str.replace("\\n", "")
        .replace("\n", "")
        .replace("\r", "")
        .replace('"[{', "[{")
        .replace('}]"', "}]")
        .replace("\\", "")
        # Refer: graphrag\llm\openai\_json.py,graphrag\index\utils\json.py
        .replace("{{", "{")
        .replace("}}", "}")
        .strip()
    )

    # Remove JSON Markdown Frame
    if json_str.startswith("```json"):
        json_str = json_str[len("```json"):]
    if json_str.endswith("```"):
        json_str = json_str[: len(json_str) - len("```")]

    return json_str
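To see what the cleanup above does in practice, here is a small self-contained sketch. It copies `clean_up_json` from the comment and runs it on two made-up model replies (`fenced_reply` and `nested_reply` are illustrative strings, not real GraphRAG output). The first shows the case the fix targets: doubled braces inside a Markdown JSON frame. The second shows a limitation worth knowing about: because every `}}` is collapsed, a reply whose nested object legitimately ends in `}}` no longer parses.

```python
import json


def clean_up_json(json_str: str) -> str:
    """Copy of the cleanup function posted above."""
    json_str = (
        json_str.replace("\\n", "")
        .replace("\n", "")
        .replace("\r", "")
        .replace('"[{', "[{")
        .replace('}]"', "}]")
        .replace("\\", "")
        .replace("{{", "{")
        .replace("}}", "}")
        .strip()
    )
    if json_str.startswith("```json"):
        json_str = json_str[len("```json"):]
    if json_str.endswith("```"):
        json_str = json_str[: len(json_str) - len("```")]
    return json_str


# Illustrative reply: doubled braces wrapped in a Markdown JSON frame.
fenced_reply = '```json\n{{"title": "Report", "rating": 5.0}}\n```'
parsed = json.loads(clean_up_json(fenced_reply))
print(parsed)

# Illustrative reply: valid JSON whose nested object ends in "}}".
nested_reply = '{"report": {"rating": 5.0}}'
try:
    json.loads(clean_up_json(nested_reply))
    nested_ok = True
except json.JSONDecodeError:
    nested_ok = False
print(f"nested reply still parses: {nested_ok}")
```

The second case is why this cleanup is best treated as a workaround for the doubled-brace output rather than a general-purpose JSON repair step.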
@crazyyanchao
@AlonsoGuevara
There is an easier way than changing the codebase: edit the system message for community_report. Change every {{ in community_report.txt to { so that GPT generates the JSON in the correct format.
The problem, however, is that GraphRAG is not reading the system message file defined for community reports in settings.yaml. In your log file, the prompt value in the community report section is null; it should be the system message filename specified in settings.yaml.
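As background for the {{ suggestion: in Python, doubled braces in a template only collapse to single braces when the template goes through `str.format`. If placeholders are filled in by plain string replacement instead, the doubled braces reach the model unchanged, and the model may copy them into its reply. The sketch below illustrates only this Python behavior; the template text is made up, and it is an assumption, not something confirmed in this thread, that this is how the community report prompt is filled in.

```python
# Made-up prompt template in the style of a JSON-output instruction.
template = (
    'Return JSON like {{"title": <title>, "rating": <rating>}}\n\n'
    "Text: {input_text}"
)

# str.format treats "{{" and "}}" as escapes for literal braces.
via_format = template.format(input_text="example")

# Plain replacement leaves the doubled braces in the prompt.
via_replace = template.replace("{input_text}", "example")

print(via_format)
print(via_replace)
```

Under that assumption, editing the prompt file so that it contains single braces has the same effect on the prompt text as running it through `str.format`.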
Thank you for your reply; I now understand the issue in depth. I would add that the current parsing function is indeed unstable, and I suggest following LangChain's practice of letting users customize the parser.
I found that with the same article, if the text is too long, an error is reported in create_final_entities; if I delete some of the text, the run succeeds. Which parameter does this length limit relate to: the embedding model, or something in settings.yaml?
Wow, I could cry. I had been trying to fix this for ages without success. Thank you so much, bless you!
In addition, Chinese text in the cache folder is not displayed properly. For example:
* graphrag\cache\summarize_descriptions\summarize-chat-v2-0a51e37418831e8ba9bc4fc845b00f56{"result": "\"\\u8baf\\u98de\\u6653\\u533bAPP\" is a medical application developed by \\u79d1\\u5927\\u8baf\\u98de. This application is capable of diagnosing 1600 common diseases and symptoms, recognizing over 2800 common medications, and understanding 260,000 drug interactions. Additionally, it has the ability to comprehend a vast number of medical terms, making it a comprehensive tool in the medical field.", "input": "\nYou are a helpful assistant responsible for generating a comprehensive summary of the data provided below.\nGiven one or two entities, and a list of descriptions, all related to the same entity or group of entities.\nPlease concatenate all of these into a single, comprehensive description. Make sure to include information collected from all the descriptions.\nIf the provided descriptions are contradictory, please resolve the contradictions and provide a single, coherent summary.\nMake sure it is written in third person, and include the entity names so we the have full context.\n\n#######\n-Data-\nEntities: \"\\\"\\u8baf\\u98de\\u6653\\u533bAPP\\\"\"\nDescription List: [\"\\\"\\u8baf\\u98de\\u6653\\u533bAPP is a medical application by \\u79d1\\u5927\\u8baf\\u98de that can diagnose numerous diseases and symptoms, recognize a wide range of medications, and understand a vast number of medical terms.\\\"\", \"\\\"\\u8baf\\u98de\\u6653\\u533bAPP is an application in the medical field capable of diagnosing 1600 common diseases, recognizing over 2800 common drugs, and understanding 260,000 drug interactions.\\\"\"]\n#######\nOutput:\n", "parameters": {"model": "gpt-4o", "temperature": 0.0, "frequency_penalty": 0.0, "presence_penalty": 0.0, "top_p": 1.0, "max_tokens": 500, "n": null}}
Changing the default value of the ensure_ascii parameter of the json library's dump and dumps methods to False seems to work around this problem for now.
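For reference, a minimal sketch of what `ensure_ascii` changes, using the entity name from the cache entry above. It passes `ensure_ascii=False` explicitly rather than patching the library default, and it is only an illustration of the standard `json` module, not of how GraphRAG writes its cache.

```python
import json

# Entity name taken from the cache entry above, written with escapes
# so the source file itself stays ASCII-only.
name = "\u8baf\u98de\u6653\u533bAPP"
record = {"result": name}

# Default behavior: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms decode to the same data, so the escapes are a display
# issue in the cache files rather than data corruption.
same_data = json.loads(escaped) == json.loads(readable)
print(same_data)
```

When writing the readable form to a file, the file should be opened with an encoding that can represent the characters, for example `open(path, "w", encoding="utf-8")`.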
We have resolved several issues related to text encoding and JSON parsing that are rolled up into version 0.2.2. Please try again with that version and re-open if this is still an issue.