Task requested on farm.zimit.kiwix.org with too much memory

Open • benoit74 opened this issue 10 months ago • 8 comments

We have a task on farm.zimit.kiwix.org which requests more memory than mwoffliner workers have available:

[Two screenshots]

I removed the task from the farm.

Is this an issue to fix on the WP1 side, or was it just a test?

benoit74, Jan 27 '25

Here's the logic that WP1 uses for requesting resources:

https://github.com/openzim/wp1/blob/main/wp1/logic/selection.py#L187

I don't remember how that was determined, but you can see that it's variable.

audiodude, Jan 27 '25

To me, it seems that this selection would have more than 5M lines! We should not offer that. This infrastructure is not designed to make very large ZIM files.

kelson42, Jan 27 '25

At least, the resources should be capped to worker resources.
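Capping as suggested could look like the following minimal sketch. It assumes the request is a CPU/memory/disk triple and that the capacity of the largest worker is known; `Resources`, `cap_to_worker` and the example figures are hypothetical and are not WP1's actual code in `selection.py`.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one gibibyte


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource request: CPU count plus memory and disk in bytes."""

    cpu: int
    memory: int
    disk: int


def cap_to_worker(requested: Resources, worker_max: Resources) -> Resources:
    """Clamp each requested resource to what the largest worker offers."""
    return Resources(
        cpu=min(requested.cpu, worker_max.cpu),
        memory=min(requested.memory, worker_max.memory),
        disk=min(requested.disk, worker_max.disk),
    )


# Example: a request for 16 GiB of memory on a worker that only has 8 GiB.
capped = cap_to_worker(
    Resources(cpu=4, memory=16 * GIB, disk=10 * GIB),
    Resources(cpu=3, memory=8 * GIB, disk=20 * GIB),
)
print(capped.memory // GIB)  # 8
```

Capping only guarantees that the task can be scheduled on a worker; it does not guarantee that a very large selection will finish within the capped memory.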

benoit74, Jan 27 '25

Should we get rid of the dynamic resource sizing attempt and just request max resources every time?

What are reasonable limits for the number of articles we allow for a ZIM? I think the UI would need some updating if we were to establish such limits.

audiodude, Feb 24 '25

We currently have two tasks "abusing" (the word is a bit strong) our infrastructure:

  • https://farm.zimit.kiwix.org/pipeline/6123148d-1d10-4753-9b11-25101fcd606e/debug : 443497 articles
  • https://farm.zimit.kiwix.org/pipeline/65334923-5200-40d2-a58d-07c34ab6a0a9/debug : 242146 articles, 534990 files

I think that we should put a cap on the number of articles in the selection, aiming at the 2-hour limit we already set for zimit.kiwix.org (a basic rule of thumb is probably pretty easy to compute). This limit should ideally be set per user, so that we can grant ourselves more freedom, and so that we can sell more time/articles to those who are willing to pay for it.
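A rule-of-thumb cap can be derived from the time limit and a measured scraping throughput. In this minimal sketch, `articles_per_hour` is a hypothetical figure that would have to be measured on real mwoffliner runs, and `per_user_hours` is an assumed mapping for users granted more time; none of these names exist in WP1.

```python
import math

# The 2-hour limit already used on zimit.kiwix.org, taken as the default.
DEFAULT_TIME_LIMIT_HOURS = 2.0


def max_articles(time_limit_hours: float, articles_per_hour: float) -> int:
    """Rule-of-thumb cap: how many articles fit within the time limit."""
    if time_limit_hours <= 0 or articles_per_hour <= 0:
        raise ValueError("time limit and throughput must be positive")
    return math.floor(time_limit_hours * articles_per_hour)


def article_cap_for_user(
    user: str,
    per_user_hours: dict[str, float],
    articles_per_hour: float,
) -> int:
    """Per-user cap: users listed in `per_user_hours` get their own time limit."""
    hours = per_user_hours.get(user, DEFAULT_TIME_LIMIT_HOURS)
    return max_articles(hours, articles_per_hour)


# Hypothetical throughput, for illustration only.
print(article_cap_for_user("someone", {}, articles_per_hour=1000))  # 2000
```

Keeping the limit as time per user, and converting to articles only at check time, means a better throughput measurement changes every cap at once.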

Given the difficulties we currently have updating Wikipedia ZIMs, it is probably wise to do this quite quickly: sooner or later, someone will have the idea of using WP1 to create their own updated ZIM.

benoit74, Mar 02 '25

To me, this issue is the "sister" of #794 and should be fixed as a priority. The message should invite users to open a request at openzim/zim-requests.
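The user-facing check could be as small as this sketch. The function name, the message wording and the `cap` parameter (for example, a per-user value) are assumptions; only the openzim/zim-requests repository comes from the comment.

```python
from typing import Optional

ZIM_REQUESTS_URL = "https://github.com/openzim/zim-requests"


def selection_size_error(article_count: int, cap: int) -> Optional[str]:
    """Return None when the selection fits the cap, else a message for the UI."""
    if article_count <= cap:
        return None
    return (
        f"This selection contains {article_count} articles, more than the "
        f"allowed {cap}. To get a larger ZIM file, please open a request at "
        f"{ZIM_REQUESTS_URL}."
    )
```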

kelson42, Mar 23 '25


Yes, these issues are closely related.

The comment on the other bug (https://github.com/openzim/wp1/issues/794#issuecomment-2701846868) is not quite accurate. It assumes that we need to increase the memory request by ~20% because mwoffliner 1.14 uses 20% more memory. However, WP1 has probably never actually processed selections as large as the ones being requested here. So even if we requested the maximum available resources every time, it is not clear that we could scrape every requested selection (which is why we also need a solution to this issue).
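The point about the ~20% bump can be made concrete: scaling an estimate only helps if the result still fits on a worker, and when it does not, the honest outcome is to refuse the selection. This is a hypothetical sketch; `memory_request` and the assumption that a per-selection memory estimate in bytes is available are illustrative only.

```python
import math
from typing import Optional

# Overhead factor taken from the ~20% estimate discussed in #794.
DEFAULT_OVERHEAD = 1.2


def memory_request(
    estimated_bytes: int,
    worker_max_bytes: int,
    overhead: float = DEFAULT_OVERHEAD,
) -> Optional[int]:
    """Scale a memory estimate by `overhead` and check it against the worker.

    Returns the number of bytes to request, or None when even the largest
    worker cannot satisfy it. None means the selection should be refused
    rather than scheduled with a capped, probably insufficient, request.
    """
    needed = math.ceil(estimated_bytes * overhead)
    if needed > worker_max_bytes:
        return None
    return needed
```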

audiodude, Mar 23 '25

Defining fair usage of WP1 is even more urgent than fixing this issue.

benoit74, Mar 24 '25