scrapydweb: Hard to open the Timer Tasks page when there is a lot of history

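Crawler setup: 1. I have 28 timer-scheduled spiders, some of them set to start every 10 seconds. So far, 272,071 historical task records have accumulated. 2. Some of the tasks are redis_scrapy distributed tasks that have been running for two months in total and have crawled 10 million items.

Problem: 1. Clicking "timer tasks" in the left-hand menu is very slow to open and is quite likely to crash. The only remedy is to restart the scrapydweb service.

Suspected cause: too many accumulated historical tasks may be making the SQLite table slow to open.

Suggestion: add a feature to clear the history in one click.

Right now all I can do is reinstall the program every time the history reaches a certain size and then set up all the spiders again, which is very time-consuming.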

Mar 05 '20 02:03 devxiaosong

Thanks for your suggestion!

Before this feature is supported, you can edit the SQLite db file directly, or set up the DATABASE_URL option to use other backends like MySQL. https://github.com/my8100/scrapydweb/blob/fbb0b42f94a0d52308ba27e97d1d04f1662c52aa/scrapydweb/default_settings.py#L348-L358
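As a rough illustration of editing the SQLite file directly, the sketch below keeps only the newest rows of a table named task_result and deletes the rest. The table name, its integer `id` primary key, the `prune_task_results` helper, and the database path in the usage comment are assumptions for illustration, not confirmed details of scrapydweb's schema. Stop scrapydweb and back up the db file before trying anything like this.

```python
import sqlite3


def prune_task_results(conn: sqlite3.Connection, keep_latest: int) -> int:
    """Delete all but the newest `keep_latest` rows; return the number deleted.

    Assumes a table called task_result with an integer primary key `id`
    (hypothetical schema, check your own db file first).
    """
    cur = conn.execute(
        "DELETE FROM task_result WHERE id NOT IN ("
        "SELECT id FROM task_result ORDER BY id DESC LIMIT ?)",
        (keep_latest,),
    )
    conn.commit()
    return cur.rowcount


# Usage (hypothetical path):
# conn = sqlite3.connect("path/to/your/scrapydweb.db")
# print(prune_task_results(conn, keep_latest=1000))
# conn.close()
```

If other tables reference the deleted rows, they would need the same treatment; that part is not shown here.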

Mar 05 '20 04:03 my8100

👌 I'll delete the log data manually.

One more thing: where is the data behind the crawl statistics charts (progress visualization) stored? I couldn't find it.

Mar 05 '20 08:03 devxiaosong

They are saved in scrapydweb/data/stats. You can also set BACKUP_STATS_JSON_FILE to False.

https://github.com/my8100/scrapydweb/blob/fbb0b42f94a0d52308ba27e97d1d04f1662c52aa/scrapydweb/default_settings.py#L107-L112
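Since the settings file is plain Python, turning the option off would be a one-line change along these lines. Which settings file your deployment actually loads is an assumption left to you.

```python
# In your scrapydweb settings file (the exact file name depends on your setup).
# Option name taken from default_settings.py as linked above.
BACKUP_STATS_JSON_FILE = False
```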

Mar 05 '20 13:03 my8100

After setting BACKUP_STATS_JSON_FILE to False, are no stats kept at all, or is there some periodic cleanup mechanism?

Mar 06 '20 07:03 devxiaosong

No stats files are saved when BACKUP_STATS_JSON_FILE = False.

Mar 06 '20 13:03 my8100

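To be honest, switching databases doesn't solve the problem at all; it still lags. It's been a year and you still haven't fixed this bug.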

May 22 '20 07:05 xlpxxx

Suggestions: 1. Periodically TRUNCATE the task_result table. 2. Or periodically (every minute) aggregate task_result into a new table, and have the task results on the 'Timer Tasks' page read from that new table.
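The second suggestion could be sketched roughly as follows with Python's sqlite3 module. The column names (task_id, fail_count, pass_count), the task_result_summary table, and the `refresh_summary` helper are assumptions for illustration rather than scrapydweb's confirmed schema, and pointing the Timer Tasks page at the summary table would need changes inside scrapydweb itself, which are not shown.

```python
import sqlite3


def refresh_summary(conn: sqlite3.Connection) -> None:
    """Rebuild a small per-task summary table from task_result.

    Assumes task_result has task_id, fail_count and pass_count columns
    (hypothetical schema). Meant to be called on a schedule, e.g. every minute.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_result_summary ("
        "task_id INTEGER PRIMARY KEY, "
        "run_count INTEGER NOT NULL, "
        "fail_total INTEGER NOT NULL, "
        "pass_total INTEGER NOT NULL)"
    )
    # Replace the previous snapshot with a fresh aggregate.
    conn.execute("DELETE FROM task_result_summary")
    conn.execute(
        "INSERT INTO task_result_summary "
        "SELECT task_id, COUNT(*), SUM(fail_count), SUM(pass_count) "
        "FROM task_result GROUP BY task_id"
    )
    conn.commit()
```

Because the summary is rebuilt from task_result each time, it only shortens the page query; combining it with truncation would require accumulating totals instead of rebuilding them.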

Apr 22 '21 02:04 jun-ge

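I've already switched to an external database and clean it up periodically. There's no way around it: without the cleanup, the page eventually just won't open.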

May 11 '21 09:05 ace-express