node-lessons Lesson 3 crawler question

I've noticed that many web pages are rendered on the client side. How do I crawl those?

Aug 30 '15 00:08 cycgit

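If you want the rendered front-end page itself, I think that is quite hard. But if you only need the data, look at the page's ajax requests and work out from them how to fetch the data directly.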

Aug 30 '15 18:08 XGHeaven

Yep, if it's the data you're after, look at the corresponding ajax request.
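A minimal sketch of that approach, assuming the page loads its data from a JSON endpoint you have spotted in the browser's Network tab. The endpoint URL, the `{ data: [...] }` response shape, and the function names are hypothetical and only illustrate the idea. It relies on the global `fetch` available in Node 18+.

```javascript
// Sketch: read the JSON endpoint a client-rendered page calls,
// instead of parsing the rendered HTML.
// ASSUMPTION: the URL and the { data: [{ id, title }] } shape are illustrative;
// look up the real ones in the browser dev tools.
const API_URL = 'https://example.com/api/topics'; // hypothetical endpoint

// Pure helper: keep only the fields we care about from the parsed payload.
function extractTopics(payload) {
  const items = Array.isArray(payload && payload.data) ? payload.data : [];
  return items.map((item) => ({ id: item.id, title: item.title }));
}

// Request the endpoint and return the extracted topics (Node 18+ global fetch).
async function fetchTopics(url) {
  const res = await fetch(url, { headers: { Accept: 'application/json' } });
  if (!res.ok) {
    throw new Error('Request failed with status ' + res.status);
  }
  return extractTopics(await res.json());
}

// Usage (performs a real network request):
// fetchTopics(API_URL).then(console.log).catch(console.error);
```

Keeping the parsing in a separate pure function makes it easy to check against a saved response before pointing the script at a live site.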

Aug 31 '15 02:08 alsotang

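Looks like client-side rendering is getting more and more popular, which is good news for us data scrapers... Scraping the API directly is the best!!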

Sep 09 '15 07:09 hugojing

@alsotang I've recently been trying the screen capture feature of phantomjs, but when I capture a tmall.com page, large areas come out blank. Is there a good solution?

Dec 14 '15 17:12 byr-gdp

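I don't know this area well, so take this as a guess.

Could it be that tmall's images are lazy-loaded? If so, see whether phantomjs can simulate scrolling the mouse wheel up and down to trigger those images to display.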
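A rough sketch of the scrolling idea, assuming `page` is a PhantomJS webpage object and that moving its `scrollPosition` is enough to wake up the lazy loader, which is not guaranteed for every site. The helper name `scrollThrough`, the step size, and the delay are hypothetical choices.

```javascript
// Sketch: scroll a page down in steps, pausing between steps so lazy-loaded
// images get a chance to load, then call `done`.
// Only the `scrollPosition` property of the PhantomJS page object is used.
// ASSUMPTION: step size and delay are illustrative values, not tuned ones.
function scrollThrough(page, totalHeight, step, delayMs, done) {
  var top = 0;
  function next() {
    page.scrollPosition = { top: top, left: 0 };
    if (top >= totalHeight) {
      done();
      return;
    }
    top = Math.min(top + step, totalHeight);
    setTimeout(next, delayMs);
  }
  next();
}

// Usage inside a PhantomJS script (not run here):
// var page = require('webpage').create();
// page.viewportSize = { width: 1280, height: 800 };
// page.open('https://www.tmall.com/', function (status) {
//   var height = page.evaluate(function () {
//     return document.body.scrollHeight;
//   });
//   scrollThrough(page, height, 800, 500, function () {
//     page.scrollPosition = { top: 0, left: 0 }; // back to the top before rendering
//     page.render('tmall.png');
//     phantom.exit();
//   });
// });
```

A single `setTimeout` before rendering only waits; it does not scroll, which may be why it appears to have no effect on images that load when they enter the viewport.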

Dec 15 '15 03:12 alsotang

@alsotang Yes, exactly, tmall lazy-loads its images. I tried setTimeout earlier and found it didn't help. I'll go back and read the docs more carefully. thx~

Dec 15 '15 03:12 byr-gdp

It would be great if a version using koa were added.

Apr 26 '16 06:04 zeroone001

A question about the example: in var $ = cheerio.load(sres.text); is sres.text a fresh copy of the cnode.org content? Could I pass the URL to crawl directly instead?

Aug 15 '18 03:08 a2774206

As for whether you can pass a URL, go check the cheerio API.

Aug 16 '18 02:08 alsotang
