AutomatedDataCollectionWithR

Teacher, I ran into a problem scraping data from the 365淘房 (house365) website; could you please take a look?

Open dayushan opened this issue 8 years ago • 1 comments

.libPaths("D:/R/library")
library(RCurl)
library(bitops)
library(XML)
library(stringr)
library(plyr)
library(rvest)
## An error is raised when i is 2010 or 2011
##Error in eval(substitute(expr), envir, enclos) : 
#  input conversion failed due to input error, bytes 0xA9 0x4F 0xC6 0xF0 [6003]
for(i in 2012:2014){
	for(j in 1:12){
		mac_url<-paste("http://news.nj.house365.com/newslist/esfpd/esfbb/date=",i,"-",j,"-11/",sep="")
	          #paste("http://news.nj.house365.com/newslist/esfpd/esfbb/date=",paste(i,j,11,sep="-"),"/",sep="")
		url<-getHTMLLinks(mac_url)[4]
		if(url=="javascript:void(0);"){
			mac_url<-paste("http://news.nj.house365.com/newslist/esfpd/esfbb/date=",i,"-",j,"-21/",sep="")
	            #paste("http://news.nj.house365.com/newslist/esfpd/esfbb/date=",paste(i,j,21,sep="-"),"/",sep="")
		      url<-getHTMLLinks(mac_url)[4]
		}
		#wp<-getURL(url,.encoding="gb2312") # use the page's own encoding
		#wp2=iconv(wp,"gb2312","UTF-8") # convert the encoding
		#Encoding(wp2) #UTF-8
		#doc <- htmlParse(wp2,asText=T,encoding="UTF-8")
		web<-read_html(url,encoding="gb2312")
               # .......... code omitted here ........
			}
}

The code above is mine, but it fails when collecting the market data for April 2012. The error message is as follows: ##Error in eval(substitute(expr), envir, enclos) :

input conversion failed due to input error, bytes 0xA9 0x4F 0xC6 0xF0 [6003]

Professor Wu, could you please take a look?

dayushan, Mar 02 '17 06:03

web<-read_html(url,encoding="gb2312")
# After debugging, I found that this is the statement that fails; the error message says the encoding is wrong.
# Looking closely at the page's <head> section, it contains:
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
# The encoding the page declares is indeed gb2312, but sometimes a page's actual encoding does not match its declaration.
# In the statement above, I tried changing the encoding to gbk:
web<-read_html(url,encoding="gbk")
# This runs normally with no error. So the problem is caused by the page's non-standard encoding.
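
The mismatch can be checked directly on the four bytes quoted in the error message. The sketch below is only an illustration: it assumes the local `iconv` build knows the names `GB2312` and `GBK`, and the variable names are made up for this example. GB2312 (EUC-CN) only allows second bytes in the range 0xA1 to 0xFE, so the pair 0xA9 0x4F cannot be valid GB2312, whereas GBK accepts second bytes from 0x40 upward.

```r
# Bytes copied from the reported error: 0xA9 0x4F 0xC6 0xF0
bad_bytes <- as.raw(c(0xA9, 0x4F, 0xC6, 0xF0))
snippet <- rawToChar(bad_bytes)

# GB2312 (EUC-CN) only allows trail bytes in 0xA1-0xFE, so 0x4F is out of range;
# GBK widens the trail-byte range to 0x40-0xFE (except 0x7F).
as_gb2312 <- iconv(snippet, from = "GB2312", to = "UTF-8")
as_gbk    <- iconv(snippet, from = "GBK",    to = "UTF-8")

# iconv() returns NA for input it cannot convert
print(is.na(as_gb2312))
print(is.na(as_gbk))
```

If the first check reports a failed conversion and the second does not, the page is most likely GBK content served under a `gb2312` declaration, which is consistent with `encoding="gbk"` fixing the `read_html` call.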

coderLMN, Mar 02 '17 12:03