
Garbled Chinese characters

Open: lixinyiabc123 opened this issue 7 years ago • 1 comment

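While scraping the latest 2017 data, I found that some region names come out garbled. The National Bureau of Statistics page source declares its encoding as gb2312, but the content is actually GBK, so the encoding has to be set manually:

```python
import time

import requests
from fake_useragent import UserAgent  # assumed source of UserAgent


def getUrl(url, num_retries=5):
    ua = UserAgent()
    headers = {'User-Agent': ua.random}
    try:
        response = requests.get(url, headers=headers)
        # The page declares gb2312, but the content is really GBK
        response.encoding = "GBK"
        data = response.text
        print(url)
        return data
    except Exception as e:
        if num_retries > 0:
            time.sleep(10)
            print(url)
            print("requests fail, retry!")
            return getUrl(url, num_retries - 1)  # recursive call
        else:
            print("retry fail!")
            print("error: %s" % e + " " + url)
            return  # returns None, so the calling code will fail
```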
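A minimal sketch of why the encoding label matters, using only Python's built-in codecs. GBK is a superset of GB2312, so a character that exists only in GBK cannot be recovered when the same bytes are decoded as gb2312. The sample character `镕` and the variable names are illustrative assumptions, not taken from the scraped data.

```python
# Hypothetical sample: "镕" is encodable in GBK but not in GB2312.
sample = "镕"

# Bytes as a GBK-encoded page would deliver them.
raw = sample.encode("gbk")

# Decoding with the real encoding round-trips cleanly.
as_gbk = raw.decode("gbk")

# Decoding with the declared (narrower) encoding cannot reproduce the
# character; errors="replace" substitutes placeholders instead of raising.
as_gb2312 = raw.decode("gb2312", errors="replace")

# Check whether the character is representable in GB2312 at all.
try:
    sample.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False

print("decoded as GBK:   ", as_gbk)
print("decoded as GB2312:", as_gb2312)
print("representable in GB2312:", in_gb2312)
```

Characters that both encodings share decode identically either way, which would be consistent with only some region names being affected.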

lixinyiabc123, Nov 29 '18 04:11

Thanks! I've updated the code!

dta0502, Dec 02 '18 02:12