WEB_KG
Crawls Chinese Baidu Baike pages, extracts triple information, and builds a Chinese knowledge graph.
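To make the last step of that description concrete, here is a hedged illustration (not the repo's actual code) of how one extracted (entity, attribute, value) triple might be written into Neo4j. The URI, credentials, the `Entity` label, and the sample triple are all placeholders.

```python
# Hedged sketch: store one extracted triple as two nodes and a relationship.
# URI, credentials, label, and the sample triple are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "your_password"))

with driver.session() as session:
    # MERGE keeps entity nodes unique; the predicate is stored as a property
    # so no Cypher strings have to be built dynamically.
    session.run(
        "MERGE (s:Entity {name: $s}) "
        "MERGE (o:Entity {name: $o}) "
        "MERGE (s)-[:REL {name: $p}]->(o)",
        s="刘德华", p="国籍", o="中国",  # placeholder triple
    )

driver.close()
```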
Hello, I'd like to ask: once the data volume grows large enough, the visualization becomes very laggy and takes a long time to render. How did you optimize this part?
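Not the author's answer, but one common mitigation, assuming the graph is viewed through the Neo4j Browser or a similar front end: render only a bounded subgraph around an entity of interest instead of the whole graph. The URI, credentials, entity name, and LIMIT value below are placeholders.

```python
# Hedged sketch: query a bounded neighborhood instead of the entire graph,
# so the front end only has to lay out a few hundred elements.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "your_password"))

with driver.session() as session:
    result = session.run(
        "MATCH (n {name: $name})-[r]-(m) RETURN n, r, m LIMIT 200",  # cap result size
        name="刘德华",  # placeholder entity
    )
    for record in result:
        print(record["n"]["name"], record["m"].get("name"))

driver.close()
```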
Traceback (most recent call last): File "html_parser.py", line 55, in new_urls, _ = parser.parse(content) File "html_parser.py", line 44, in parse is_saved = self._save_new_data( soup,html_cont) File "html_parser.py", line 34, in _save_new_data...
python extract-table.py: it seems this step already extracts the required information from the HTML. Thanks.
The crawled result is empty.
For example, the crawled text is empty, and when adding triples the attrs and values are also empty, so nothing gets added to the triples.
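A minimal diagnostic sketch, not the repo's code: empty text, attrs, and values usually mean either the request was blocked (a tiny or redirected response body) or the page structure no longer matches the parser. The URL, headers, and CSS selectors below are assumptions for illustration and may not match the current Baidu Baike markup.

```python
# Hedged debugging sketch: check what the crawler actually received.
import requests
from bs4 import BeautifulSoup

url = "https://baike.baidu.com/item/%E5%88%98%E5%BE%B7%E5%8D%8E"  # placeholder page
headers = {"User-Agent": "Mozilla/5.0"}  # some sites return empty pages without a browser-like UA

resp = requests.get(url, headers=headers, timeout=10)
print(resp.status_code, len(resp.text))  # a very short body or a 3xx/4xx status suggests blocking

soup = BeautifulSoup(resp.text, "html.parser")
# Infobox selectors below are assumptions about Baidu Baike's markup:
attrs = [dt.get_text(strip=True) for dt in soup.select("dt.basicInfo-item.name")]
values = [dd.get_text(strip=True) for dd in soup.select("dd.basicInfo-item.value")]
print(attrs[:5], values[:5])  # empty lists here point at a selector/parsing mismatch
```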
from neo4j.v1 import GraphDatabase
ModuleNotFoundError: No module named 'neo4j.v1'
Why does this happen?
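If a recent neo4j Python driver (4.x or later) is installed, the old neo4j.v1 module no longer exists, which is why this import fails. A hedged sketch of the updated import follows; the alternative of pinning the legacy 1.x neo4j-driver package would keep the original code unchanged. The URI and credentials are placeholders.

```python
# Hedged sketch assuming the newer neo4j driver, which dropped neo4j.v1.
from neo4j import GraphDatabase  # replaces: from neo4j.v1 import GraphDatabase

driver = GraphDatabase.driver(
    "bolt://localhost:7687",          # placeholder URI
    auth=("neo4j", "your_password"),  # placeholder credentials
)
driver.verify_connectivity()          # raises if the Neo4j server is unreachable
driver.close()
```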
The result is this:
(base) D:\bandzip\WEB_KG-master\baike>d:/ProgramData/Anaconda3/python.exe d:/bandzip/WEB_KG-master/baike/spiders/baike.py
It feels like nothing was crawled. Did I make a mistake somewhere?
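One hedged possibility: if baike.py has no launcher at the bottom, running it directly with python only defines the spider class and Scrapy never starts crawling, which would explain the empty result. A sketch of starting the crawl programmatically follows; the spider name "baike" is an assumption, and the equivalent command-line form would be `scrapy crawl <name>` run from the project directory.

```python
# Hedged sketch: launch the spider through Scrapy instead of importing the module.
# Run this from inside the Scrapy project so get_project_settings() finds scrapy.cfg.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    process.crawl("baike")  # assumed spider name; use the name declared in the spider class
    process.start()         # blocks until the crawl finishes
```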
D:\bandzip\WEB_KG-master\baike\spiders>python baike.py
Traceback (most recent call last):
  File "baike.py", line 20, in
    class BaikeSpider(scrapy.Spider):
  File "baike.py", line 30, in BaikeSpider
    driver = GraphDatabase.driver(
  File "d:\ProgramData\Anaconda3\lib\site-packages\neo4j\__init__.py", line 120, in driver
    return...
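The truncated traceback shows GraphDatabase.driver() failing while the BaikeSpider class body is being executed, i.e. at import time. A hedged restructuring sketch, not the repo's actual code: create the driver in the spider's __init__ instead of the class body, so a Neo4j configuration or connection problem surfaces with a clear message when the spider starts. The URI, credentials, and spider name are placeholders.

```python
# Hedged sketch: defer Neo4j driver creation out of the class body.
import scrapy
from neo4j import GraphDatabase  # newer driver; neo4j.v1 no longer exists


class BaikeSpider(scrapy.Spider):
    name = "baike"  # assumed spider name

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        try:
            self.driver = GraphDatabase.driver(
                "bolt://localhost:7687",         # placeholder URI
                auth=("neo4j", "your_password"),  # placeholder credentials
            )
        except Exception as exc:
            raise RuntimeError(f"Could not create Neo4j driver: {exc}") from exc
```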
Is a fully built, ready-made knowledge graph still available anywhere now?