dataharvest

Not sure how to use XiaoHongShuSpider?

Open bobkingdom opened this issue 1 year ago • 1 comment

As the title says. I also switched to the Mafengwo spider, and it doesn't seem to crawl anything either. How is this supposed to be used?

2024-09-01 23:47:53,010 - INFO - HTTP Request: GET https://www.mafengwo.cn/mdd "HTTP/1.1 301 Moved Permanently"
[ERROR][2024-09-01 23:47:53][main.py:439] - Error occurred while crawling: '__jsluid_s'
INFO: 127.0.0.1:53710 - "POST /fetch_mfw HTTP/1.1" 200 OK



@app.post("/fetch_mfw")
async def crawl_mafengwo_mdd():
    url = "https://www.mafengwo.cn/mdd"
    # proxy_gene_func = MyProxy()
    # config = SpiderConfig(proxy_gene_func=proxy_gene_func)
    config = SpiderConfig()
    # Use MaFengWoSpider (swapped in for XiaoHongShuSpider)
    spider = MaFengWoSpider(config)

    try:
        # Fetch the page content asynchronously
        doc = await spider.a_crawl(url)
        logger.info(f"Successfully crawled content: {doc.page_content}")
        return doc.page_content
    except Exception as e:
        logger.error(f"Error occurred while crawling: {str(e)}")
        return {"error": str(e)}
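
A note on reading the log above: the message `'__jsluid_s'` is only a quoted key name, which is what `str(e)` produces for a `KeyError`. Assuming that is what happened here (for example, a cookie lookup failing after the 301 response, which is a guess and not confirmed by the log), including the exception type and traceback makes the failure easier to diagnose. Because the handler returns the error dict normally, the request still logs `200 OK`. The sketch below uses only the standard library; `read_cookie`, `describe_error`, and `crawl_step` are hypothetical names for illustration, not part of dataharvest.

```python
import logging

logger = logging.getLogger("crawl_demo")


def read_cookie(cookies: dict, name: str) -> str:
    # Hypothetical stand-in for a spider step that expects a cookie to exist.
    return cookies[name]


def describe_error(exc: Exception) -> str:
    # Prefix the exception type so a bare key name is not mistaken for a message.
    return f"{type(exc).__name__}: {exc}"


def crawl_step(cookies: dict) -> dict:
    try:
        return {"cookie": read_cookie(cookies, "__jsluid_s")}
    except Exception as exc:
        # logger.exception logs at ERROR level and appends the full traceback.
        logger.exception("Error occurred while crawling")
        return {"error": describe_error(exc)}
```

With an empty cookie dict, `crawl_step({})` returns an error entry that names the `KeyError` explicitly, and the traceback in the log points at the failing lookup.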

bobkingdom • Sep 01 '24 15:09

I've added a Xiaohongshu demo; you can take a look at it in tests.

yuvenhol • Sep 20 '24 03:09