MediaCrawler issues

Decouple the Playwright logic from the crawler business logic

5

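The current implementation depends heavily on Playwright, which makes deployment on Linux inconvenient. The plan is to extract Playwright into a separate component that provides the browser environment on its own.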

NanmiCoder

enhancement
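One way to read this plan is to hide the browser behind a small interface, so the crawler logic only asks for "something that can return a rendered page" and never imports Playwright directly. The sketch below is an assumption, not MediaCrawler's actual design: `BrowserProvider`, `StaticProvider` and `crawl_lengths` are hypothetical names, and a Playwright-backed or remote-browser provider (not shown) would implement the same protocol.

```python
import asyncio
from typing import Protocol


class BrowserProvider(Protocol):
    """Hypothetical interface: anything that can return rendered HTML for a URL."""

    async def get_html(self, url: str) -> str: ...


class StaticProvider:
    """Hypothetical stand-in provider serving canned pages (handy for tests)."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages

    async def get_html(self, url: str) -> str:
        # Unknown URLs return an empty page instead of raising.
        return self._pages.get(url, "")


async def crawl_lengths(provider: BrowserProvider, urls: list[str]) -> dict[str, int]:
    """Business logic: depends only on the protocol, not on Playwright."""
    results: dict[str, int] = {}
    for url in urls:
        html = await provider.get_html(url)
        results[url] = len(html)
    return results


if __name__ == "__main__":
    provider = StaticProvider({"https://example.com/a": "<p>hello</p>"})
    urls = ["https://example.com/a", "https://example.com/b"]
    print(asyncio.run(crawl_lengths(provider, urls)))
```

Swapping the browser backend then means passing a different provider object, while the crawler code and its tests stay unchanged.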

Xiaohongshu creator page: error after crawling more than a few dozen of a user's notes

1

MediaCrawler ERROR [XiaoHongShuCrawler.get_note_detail] Get note detail error: required param: source_note_id not found. It looks like an additional parameter is now required.

insomniai

Xiaohongshu creator page: with type=creator, how do I crawl only the content?

2

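As the title says: when the Xiaohongshu type is creator, three kinds of content are crawled by default.
1. BUG: only 30 posts can be crawled from a creator page; everything from the 30th onward is a duplicate. The default page size is 30; I tried changing it to 60, but it had no effect. Is 30 the maximum?
2. Help: there are too many comments, so crawling a single creator's page takes more than 20 minutes... I don't need comments. How should I set the startup parameters to skip them?
![image](https://github.com/NanmiCoder/MediaCrawler/assets/13759936/4e64148b-e78e-4776-8961-9eb02e0ab518) ![image](https://github.com/NanmiCoder/MediaCrawler/assets/13759936/d728ef78-cfba-46ea-8f77-2b67d8c6abd3)
Startup parameters: I changed these two values in config and then started with `python main.py`
![image](https://github.com/NanmiCoder/MediaCrawler/assets/13759936/4403dc37-efe0-4b6d-9dc4-3ca99f9919ee)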

ifredom
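The symptom in item 1 (results repeating after the first page) is typical of a pagination loop that keeps requesting the same page instead of advancing a cursor; the issue does not confirm that this is the cause here. The sketch below shows a generic cursor-based loop that advances the cursor from each response, stops when `has_more` is false or the cursor stops moving, and de-duplicates by note ID. The `fetch_page` callable and the response fields `notes`, `cursor`, `has_more` and `note_id` are hypothetical; this is not Xiaohongshu's API or MediaCrawler's code.

```python
from typing import Any, Callable

# Hypothetical fetcher: takes a cursor and returns a dict with "notes"
# (a list of {"note_id": ...}), "cursor" (the next cursor) and "has_more".
FetchPage = Callable[[str], dict[str, Any]]


def collect_all_notes(fetch_page: FetchPage, max_pages: int = 100) -> list[dict[str, Any]]:
    seen: set[str] = set()
    notes: list[dict[str, Any]] = []
    cursor = ""  # assumption: an empty cursor requests the first page
    for _ in range(max_pages):
        page = fetch_page(cursor)
        for note in page.get("notes", []):
            if note["note_id"] not in seen:  # drop duplicates across pages
                seen.add(note["note_id"])
                notes.append(note)
        next_cursor = page.get("cursor", "")
        # Stop when the server reports no more data, or when the cursor does not
        # advance (otherwise the same page would be fetched again and again).
        if not page.get("has_more") or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
    return notes


if __name__ == "__main__":
    # Fake server: 75 notes, 30 per page; the cursor is the index of the next note.
    all_ids = [f"n{i}" for i in range(75)]

    def fake_fetch(cursor: str) -> dict[str, Any]:
        start = int(cursor or 0)
        chunk = all_ids[start:start + 30]
        end = start + len(chunk)
        return {
            "notes": [{"note_id": i} for i in chunk],
            "cursor": str(end),
            "has_more": end < len(all_ids),
        }

    print(len(collect_all_notes(fake_fetch)))  # 75
```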

Tried it once and my account got banned...

1

yanhaishixian

Why does the async function save_data_to_json in MediaCrawler\store\douyin\douyin_store_impl.py use a lock?

2

The async function save_data_to_csv in the same file does not use a lock and works fine, so why is a lock needed when saving as JSON? I noticed that crawling slows down sharply as the JSON file grows, so I looked at this part of the code:
```python
async with self.lock:
    if os.path.exists(save_file_name):
        async with aiofiles.open(save_file_name, 'r', encoding='utf-8') as file:
            save_data = json.loads(await file.read())
    save_data.append(save_item)
    async with aiofiles.open(save_file_name, 'w', encoding='utf-8') as file:
        await file.write(json.dumps(save_data, ensure_ascii=False))
```
...

shimada-hanzo
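A likely explanation for both observations: the JSON path quoted in this issue does a read-modify-write of the whole file (read, parse, append one item, re-serialize, rewrite). Without the lock, two concurrent coroutines could both read the old list, and the second write would silently drop the first item (a lost update). And because every save rewrites the entire file, each save gets slower as the file grows, so total work grows roughly quadratically. CSV avoids both problems because each row is simply appended. A common alternative is JSON Lines (one JSON object per line, append-only). The sketch below is a hypothetical `JsonLinesStore` using only the standard library (`asyncio.to_thread` needs Python 3.9+); it is not MediaCrawler's code.

```python
import asyncio
import json
import os
import tempfile


class JsonLinesStore:
    """Hypothetical append-only store: one JSON object per line (JSON Lines)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.lock = asyncio.Lock()

    def _append_line(self, line: str) -> None:
        # Append mode: the cost of one save does not depend on the file size.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")

    async def save(self, item: dict) -> None:
        line = json.dumps(item, ensure_ascii=False)
        async with self.lock:  # keep concurrent appends from interleaving
            await asyncio.to_thread(self._append_line, line)

    def load_all(self) -> list[dict]:
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


async def demo(path: str) -> list[dict]:
    store = JsonLinesStore(path)
    # 100 concurrent saves; none are lost because no save rewrites earlier data.
    await asyncio.gather(*(store.save({"id": i}) for i in range(100)))
    return store.load_all()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(len(asyncio.run(demo(os.path.join(d, "douyin.jsonl")))))  # 100
```

If a single JSON array is still needed at the end, the `.jsonl` file can be converted once after crawling instead of on every save.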

[bug] Douyin doesn't work

1

![image](https://github.com/NanmiCoder/MediaCrawler/assets/38212718/f74b4825-fe34-4066-9b1e-27d739e6d594)
`python main.py --platform dy --lt qrcode --type detail`

fadeawaylove

ValueError: Set of coroutines/Futures is empty.

1

(thor) E:\MediaCrawler-main>python main.py --platform dy --lt qrcode --type search
2024-03-12 22:06:09 MediaCrawler INFO [DouYinCrawler.search] Begin search douyin keywords
2024-03-12 22:06:09 MediaCrawler INFO [DouYinCrawler.search] Current keyword: 心灵抚慰
2024-03-12 22:06:11 httpx INFO...

qwertyuiopasdfghjklzxcn
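In CPython this message comes from `asyncio.wait()`, which raises `ValueError` when given an empty collection (newer Python versions word it "Set of Tasks/Futures is empty."). In a search crawler that can happen when, for example, a search returns no results and the list of detail-fetch tasks ends up empty; the truncated log does not confirm that this is the cause here. A minimal guard, with a hypothetical `fetch_detail` coroutine standing in for the real request:

```python
import asyncio


async def fetch_detail(item_id: str) -> str:
    """Hypothetical detail fetcher standing in for a real network call."""
    await asyncio.sleep(0)
    return f"detail:{item_id}"


async def fetch_all_details(item_ids: list[str]) -> list[str]:
    tasks = [asyncio.create_task(fetch_detail(i)) for i in item_ids]
    if not tasks:
        # asyncio.wait() raises ValueError on an empty collection, so return early.
        return []
    await asyncio.wait(tasks)  # default return_when=ALL_COMPLETED
    # Read results from the task list (not the unordered `done` set) to keep input order.
    return [t.result() for t in tasks]


if __name__ == "__main__":
    print(asyncio.run(fetch_all_details([])))          # []
    print(asyncio.run(fetch_all_details(["a", "b"])))  # ['detail:a', 'detail:b']
```

`asyncio.gather()` is another way to sidestep the error: called with no awaitables, it simply returns an empty list.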

Xiaohongshu keyword search

1

Hi, how do I configure the crawler to search by keyword?

Rollines

Can it crawl Bilibili video links for a search query and download the videos?

3

tianguang2525

Weibo comment crawling fails

4

Crawling comment data fails. What is the problem? Here is the log output:
MediaCrawler ERROR [WeiboCrawler.get_note_comments] may be been blocked, err:Expecting value: line 1 column

ly1327
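`Expecting value: line 1 column ...` is the message of Python's `json.JSONDecodeError`, which usually means the response body was not JSON at all (for example an empty body or an HTML verification/login page). That fits the crawler's own "may be been blocked" guess, though the log alone does not prove it. A small helper that surfaces the start of the raw body makes this easier to diagnose; `parse_json_response` and `NonJsonResponseError` are hypothetical names, not MediaCrawler functions.

```python
import json
from typing import Any


class NonJsonResponseError(ValueError):
    """Hypothetical error raised when a response body is not valid JSON."""


def parse_json_response(body: str, preview_chars: int = 200) -> Any:
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        preview = body[:preview_chars].strip() or "<empty body>"
        # Include the start of the body so a block page can be told apart from an empty reply.
        raise NonJsonResponseError(
            f"response is not JSON ({exc.msg}); body starts with: {preview!r}"
        ) from exc


if __name__ == "__main__":
    print(parse_json_response('{"ok": 1}'))
    try:
        parse_json_response("<html>verify you are human</html>")
    except NonJsonResponseError as e:
        print(e)
```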
