Limit the size of the selections we can ZIM through WP1
While WP1 usage has been very limited / confidential so far, it is starting to gain traction and we are beginning to see very big selections being requested.
Since WP1 is so far a free tool, we cannot afford to spend unlimited resources on selections of unlimited size.
We need to:
- define a limit on the selection size which can be ZIMed
  - a rough estimate is that we process around 400 to 500 articles per minute on wikipedia.org projects. Since this is our most popular project, I propose a limit of 50k articles (roughly 2 hours of processing), which has the advantage of also being the limit for our top 50k selections
- define a message we want to display when this limit is hit
  - @Popolechien we need your input on this
  - we've agreed that, for now, blocked users will be redirected to open a zim-request. We do not know yet whether this should or could become a paid offer, whether users should simply wait for the zim-request to be processed for free (which might take too long for some of them), or whether we should ask the foundation for more resources for a dedicated worker
- block users requesting a ZIM bigger than this limit, at both the API and UI level, with this message
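As a quick sanity check of the "roughly 2 hours" figure, here is a minimal sketch of the arithmetic. It assumes the 400 to 500 articles per minute estimate holds; the constant and function names are hypothetical and are not taken from the WP1 code.

```python
# Throughput bounds taken from the rough estimate in this issue.
ARTICLES_PER_MINUTE_LOW = 400
ARTICLES_PER_MINUTE_HIGH = 500

# Proposed limit from this issue.
MAX_ARTICLES = 50_000


def estimated_minutes(article_count: int) -> tuple[float, float]:
    """Return (best_case, worst_case) processing time in minutes.

    Hypothetical helper: it only divides the article count by the
    estimated throughput bounds, nothing more.
    """
    best_case = article_count / ARTICLES_PER_MINUTE_HIGH
    worst_case = article_count / ARTICLES_PER_MINUTE_LOW
    return best_case, worst_case


best, worst = estimated_minutes(MAX_ARTICLES)
print(f"{MAX_ARTICLES} articles: {best:.0f} to {worst:.0f} minutes")
# prints "50000 articles: 100 to 125 minutes"
```

So a selection at the limit would take somewhere between 1h40 and 2h05 under that estimate.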
Here's some proposed text:
Oh no! It seems that you have hit the limit for the maximum number of articles (50,000) :-(
Could it be that such a big selection would be useful to others? How about opening a zim-request so that we can look into it and add it straight to the Kiwix library?
Shouldn't we add the limit in the text so that users know what to aim for?
With a limit of 50k articles, I would assume that nobody will be adjusting their selection by just one or two articles, but OK. I edited the text a bit.
@benoit74 Besides #790 and #794, this issue is indeed part of the triptych of solutions to fix our memory management regression in WP1. I hope we can implement them next week.
I don't think I'm the right person to implement this. To me, this is something to be done at WP1 level, and I have absolutely no knowledge of the code. Since this needs both UI and backend changes ...
@audiodude @benoit74 @rgaudin The WP1 ZIM building feature has now been pretty broken for weeks. My input is: we have to find a solution to fix it very soon. This is a priority.
@kelson42 @benoit74 I am actively working on this issue.
I've merged #818 and deployed it, so now all selections have an s_article_count column that lists the number of articles in the selection, which was backfilled by a database migration and is updated any time the selection is built.
Just for fun....
MariaDB [enwp10_prod]> select s_article_count from selections order by s_article_count desc limit 10;
+-----------------+
| s_article_count |
+-----------------+
|         6900621 |
|         5229385 |
|         3714599 |
|         2911663 |
|         2911336 |
|         2883293 |
|         2862600 |
|         2845189 |
|          513356 |
|          400581 |
+-----------------+
10 rows in set (0.002 sec)
MariaDB [enwp10_prod]>
EDIT: But this is somewhat encouraging
MariaDB [enwp10_prod]> select count(*) from selections;
+----------+
| count(*) |
+----------+
|      397 |
+----------+
1 row in set (0.001 sec)
MariaDB [enwp10_prod]> select count(*) from selections where s_article_count <= 50000;
+----------+
| count(*) |
+----------+
|      292 |
+----------+
1 row in set (0.001 sec)
MariaDB [enwp10_prod]>
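To put those two counts in perspective, here is a small sketch of the arithmetic. The numbers are the ones returned by the queries above; the variable names are only illustrative.

```python
# Counts copied from the two queries above.
total_selections = 397
within_limit = 292  # selections with s_article_count <= 50000

# Selections that would be blocked by a 50k limit.
over_limit = total_selections - within_limit
share_within = within_limit / total_selections

print(f"{over_limit} selections over the limit, {share_within:.1%} within it")
# prints "105 selections over the limit, 73.6% within it"
```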
The next step is to add limitations in the backend API to refuse to create ZIMs for selections that exceed the article limit, combined with frontend treatment for messaging this to the user. That can likely be done in the same PR.
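For illustration only, a minimal sketch of what such a backend check could look like. Only `s_article_count` and the 50,000 limit come from this thread; the `Selection` class, the `s_id` field, the function names, the error message and the HTTP 400 status code are assumptions and do not reflect the actual WP1 code.

```python
from dataclasses import dataclass

# Proposed limit from this issue.
MAX_ZIM_ARTICLES = 50_000


@dataclass
class Selection:
    """Hypothetical stand-in for a row of the selections table."""

    s_id: str  # assumed identifier field, not confirmed in this thread
    s_article_count: int


class SelectionTooLargeError(Exception):
    """Raised when a selection exceeds the ZIM article limit."""


def check_zim_allowed(selection: Selection) -> None:
    """Raise if the selection has more articles than the limit allows."""
    if selection.s_article_count > MAX_ZIM_ARTICLES:
        raise SelectionTooLargeError(
            f"This selection has {selection.s_article_count:,} articles, "
            f"which exceeds the maximum of {MAX_ZIM_ARTICLES:,}. "
            "Please open a zim-request instead."
        )


def handle_zim_request(selection: Selection) -> tuple[dict, int]:
    """Hypothetical API handler returning a (JSON body, HTTP status) pair."""
    try:
        check_zim_allowed(selection)
    except SelectionTooLargeError as exc:
        # The status code is an assumption; the frontend would show the message.
        return {"error": str(exc)}, 400
    # The real ZIM scheduling would happen here.
    return {"status": "scheduled"}, 200


print(handle_zim_request(Selection("small", 1_200)))
print(handle_zim_request(Selection("huge", 6_900_621)))
```

Keeping the check in one function would let the API and any UI pre-validation share the same limit and message, which fits the idea of shipping both in a single PR.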
Cool !