NBSPRC-spider
Garbled Chinese characters
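While scraping the latest 2017 data, I found that some region names come out garbled. The National Bureau of Statistics page source declares its encoding as gb2312, but the content is actually GBK, so the encoding has to be set manually:

    import time

    import requests
    from fake_useragent import UserAgent


    def getUrl(url, num_retries=5):
        ua = UserAgent()
        headers = {'User-Agent': ua.random}
        try:
            response = requests.get(url, headers=headers)
            # The page declares gb2312 but is really GBK, so override it
            response.encoding = "GBK"
            data = response.text
            print(url)
            return data
        except Exception as e:
            if num_retries > 0:
                time.sleep(10)
                print(url)
                print("requests fail, retry!")
                return getUrl(url, num_retries - 1)  # recursive call
            else:
                print("retry fail!")
                print("error: %s" % e + " " + url)
                return  # returns None, so the program fails downstream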
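To illustrate why the declared encoding matters, here is a minimal sketch using only Python's built-in codecs. The sample character "镕" and the helper `try_decode` are my own assumptions for illustration and do not come from the scraped pages. "镕" exists in GBK but not in GB2312, so bytes that are valid GBK cannot be decoded under the narrower gb2312 codec.

```python
# Hypothetical sample: a character that GBK can encode but GB2312 cannot.
sample = "镕"

# Bytes as a GBK-encoded page would carry this character.
raw = sample.encode("gbk")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_gb2312 = try_decode(raw, "gb2312")  # the declared (narrower) encoding
as_gbk = try_decode(raw, "gbk")        # the encoding actually in use

print("gb2312:", as_gb2312)
print("gbk:", as_gbk)
```

Since GBK is a superset of GB2312, forcing GBK should not harm pages that only use GB2312 characters.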
Thanks! The code has been updated.