dataharvest
How do I use XiaoHongShuSpider?
As the title says. I switched to the Mafengwo spider and it doesn't seem to crawl anything either. How is this supposed to be used?
2024-09-01 23:47:53,010 - INFO - HTTP Request: GET https://www.mafengwo.cn/mdd "HTTP/1.1 301 Moved Permanently"
[ERROR][2024-09-01 23:47:53][main.py:439] - Error occurred while crawling: '__jsluid_s'
INFO: 127.0.0.1:53710 - "POST /fetch_mfw HTTP/1.1" 200 OK
@app.post("/fetch_mfw")
async def crawl_mafengwo_mdd():
    url = "https://www.mafengwo.cn/mdd"
    # proxy_gene_func = MyProxy()
    # config = SpiderConfig(proxy_gene_func=proxy_gene_func)
    config = SpiderConfig()
    # Use MaFengWoSpider (swapped in for XiaoHongShuSpider)
    spider = MaFengWoSpider(config)
    try:
        # Fetch the page content with the async method
        doc = await spider.a_crawl(url)
        logger.info(f"Successfully crawled content: {doc.page_content}")
        return doc.page_content
    except Exception as e:
        logger.error(f"Error occurred while crawling: {str(e)}")
        return {"error": str(e)}
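A note on the logged error: the bare message `'__jsluid_s'` is what `str(e)` produces for a `KeyError`, so it is consistent with (though it does not prove) a lookup for a `__jsluid_s` key that was not present. The sketch below is not dataharvest code; `read_cookie` and `describe_error` are hypothetical helpers used only to show how logging the exception type and traceback makes this kind of failure easier to diagnose.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def describe_error(exc: Exception) -> str:
    """Return the exception type together with its message."""
    return f"{type(exc).__name__}: {exc}"


def read_cookie(cookies: dict, name: str) -> str:
    # Hypothetical stand-in for code that assumes a cookie is always present.
    return cookies[name]


try:
    read_cookie({}, "__jsluid_s")
except Exception as e:
    # str(e) alone would be "'__jsluid_s'", which hides that this is a KeyError.
    message = describe_error(e)
    # logger.exception also records the traceback, pointing at the failing line.
    logger.exception("Error occurred while crawling: %s", message)
```

Applied to the handler above, replacing `str(e)` with the type-qualified message (or calling `logger.exception`) would show where the lookup fails.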
I've added a Xiaohongshu demo; you can find it in tests.