BaiduPinyinCrawler
A crawler for the Baidu Hanyu (百度汉语) dictionary that collects pinyin data: roughly 350,000 entries of Baidu dictionary data.
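Baidu Hanyu Dictionary Crawler
Uses a crawler to scrape the words and phrases for every Chinese character from Baidu, then stores all of the collected words in a MySQL database.
Based on Baidu Hanyu data (about 350,000 word and pinyin entries crawled in total).
Usage
Installation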
$ git clone https://github.com/SivanLaai/BaiduPinyinCrawler.git
$ cd BaiduPinyinCrawler
$ mv setting_sample.ini setting.ini
$ pip install -r requirements.txt
Install MySQL
Create the table
CREATE TABLE `single_character_info` (
`pinyin` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
`word` varchar(255) NOT NULL,
`plainPinyin` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
`definition` varchar(4096) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`pronunciation` varchar(255) DEFAULT NULL,
`wordID` int DEFAULT NULL,
PRIMARY KEY (`word`,`pinyin`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
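As an illustration of how a crawled entry maps onto this table, the following minimal sketch builds a parameterized INSERT statement for `single_character_info`. The `CharacterRow` type, the `build_insert` helper, and the sample values are hypothetical and not taken from the repository; the `%s` placeholders assume a MySQL driver such as PyMySQL, which accepts that parameter style.

```python
from typing import NamedTuple, Optional, Tuple


class CharacterRow(NamedTuple):
    """Hypothetical container mirroring the columns of single_character_info."""
    pinyin: str                          # part of the primary key
    word: str                            # part of the primary key
    plainPinyin: str
    definition: Optional[str] = None     # nullable in the schema
    pronunciation: Optional[str] = None  # nullable in the schema
    wordID: Optional[int] = None         # nullable in the schema


def build_insert(row: CharacterRow) -> Tuple[str, tuple]:
    """Return (sql, params) suitable for cursor.execute(sql, params)."""
    columns = ", ".join(f"`{name}`" for name in CharacterRow._fields)
    marks = ", ".join(["%s"] * len(CharacterRow._fields))
    sql = f"INSERT INTO `single_character_info` ({columns}) VALUES ({marks})"
    return sql, tuple(row)


if __name__ == "__main__":
    # Sample values are illustrative only.
    sql, params = build_insert(CharacterRow("hǎo", "好", "hao", "good"))
    print(sql)
    print(params)
```

Because the primary key is (`word`, `pinyin`), inserting the same pair twice raises a duplicate-key error, so a real crawler would need to decide whether to skip or update existing rows.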
Configure setting.ini
[LOG]
LEVEL = INFO //log level
LOG_PATH = ./FundCrawler/logs //log directory
[MYSQL]
host = 127.0.0.1 //MySQL server IP
PORT = 20137 //MySQL server port
USERNAME = username
PASSWORD = password
DATA_BASE_NAME = Fund
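The settings above can be read with Python's standard `configparser` module. The sketch below is an assumption about how such a file might be loaded, not the crawler's actual code; `load_settings` and the dictionary keys it returns are hypothetical names. Note that `configparser` does not treat `//` as a comment marker by default, so the trailing annotations shown above would otherwise become part of the values.

```python
import configparser

# Sample text mirroring the layout of setting.ini shown above.
SAMPLE = """\
[LOG]
LEVEL = INFO //log level
LOG_PATH = ./FundCrawler/logs //log directory
[MYSQL]
host = 127.0.0.1 //MySQL server IP
PORT = 20137 //MySQL server port
USERNAME = username
PASSWORD = password
DATA_BASE_NAME = Fund
"""


def load_settings(text: str) -> dict:
    """Parse the INI text and return the values as a plain dict."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("//",),  # strip the trailing "//..." notes
        interpolation=None,               # keep "%" in passwords literal
    )
    parser.read_string(text)
    mysql = parser["MYSQL"]
    return {
        "log_level": parser["LOG"]["LEVEL"],
        "log_path": parser["LOG"]["LOG_PATH"],
        "host": mysql["host"],            # option names are case-insensitive
        "port": mysql.getint("PORT"),
        "user": mysql["USERNAME"],
        "password": mysql["PASSWORD"],
        "database": mysql["DATA_BASE_NAME"],
    }


if __name__ == "__main__":
    print(load_settings(SAMPLE))
```

To load the real file, `parser.read("setting.ini", encoding="utf-8")` can replace `read_string`.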
Run the crawler
# Starts crawling all words on Baidu, along with their pinyin and common meanings.
$ python PinyinDataCrawler.py
Notes
- Because the data set is very large, crawling may take 1 to 2 days; make sure the program keeps running normally for the whole period.
- Set up MySQL before starting the crawler.