Integrate zimfarm dev setup
Rationale
As part of cleanup in the zimfarm API (openzim/zimfarm#1391), requests to create recipes/tasks now require an offliner definition version. This PR sets the offliner definition version from an environment variable and sets up the zimfarm containers in a docker-compose graph. Previously, the API used "initial" as the definition version, but as scrapers evolve and their arguments change, the definitions change too.
Changes
- use the mwoffliner definition version from the env (defaulting to the image tag)
- set up a compose graph that includes the zimfarm containers. These are created with two profiles: `zimfarm` and `zimfarm-worker`. The former starts only the API and UI, while the latter additionally starts the worker and receiver.
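For illustration, the profile assignment might look like this in the compose file. This is a sketch, not the PR's actual file: the service names mirror the containers seen later in the thread, but the exact contents are an assumption.

```yaml
services:
  zimfarm-api:
    profiles: ["zimfarm"]
  zimfarm-ui:
    profiles: ["zimfarm"]
  zimfarm-worker-manager:
    profiles: ["zimfarm-worker"]
  zimfarm-receiver:
    profiles: ["zimfarm-worker"]
```

With a layout like this, `--profile zimfarm` starts only the API and UI, and adding `--profile zimfarm-worker` on top also starts the worker and receiver.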
Codecov Report
:x: Patch coverage is 88.63636% with 5 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 92.87%. Comparing base (63b7a74) to head (544ecb6).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| wp1/logic/builder.py | 88.00% | 3 Missing :warning: |
| wp1/zimfarm.py | 88.88% | 2 Missing :warning: |
:x: Your patch check has failed because the patch coverage (88.63%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1027      +/-   ##
==========================================
- Coverage   92.90%   92.87%   -0.04%
==========================================
  Files          73       73
  Lines        4229     4238       +9
==========================================
+ Hits         3929     3936       +7
- Misses        300      302       +2
```
:umbrella: View full report in Codecov by Sentry.
We should really run this from end-to-end to ensure this setup works correctly.
Yes I can definitely help with that. I'll patch this PR and try setting up/running the zimfarm locally and confirm that I can create and download ZIMs.
Updated the files with the recent changes:
- added separate buckets for artifacts, logs and zims
- updated the README to detail the worker resources and reason for the offliner definition
- updated worker resources to 3 CPU, 20G RAM, 20G disk
Code LGTM, waiting for e2e test from @audiodude (if I get it correctly) to give my formal approval
I made some minor tweaks to the PR, but it's still not working. My Zimfarm is still reporting the following for requests to http://localhost:8004/v2/schedules:
```json
{"success":false,"message":"Offliner definition for offliner mwoffliner with version 1.17.2 does not exist"}
```
EDIT: This is after following the directions in the README and updating my local credentials.py
Hum, this is indeed a problem. To unblock you, please set `'definition_version': 'dev'` in your local credentials.py; that should do the trick.
Merging this PR with that workaround is however not the proper way to solve this situation. We will continuously have new offliner definitions arriving, and all of them should be stored in the local Zimfarm DB so that devs can use almost any mwoffliner version / definition version. I feel like `docker/zimfarm/create_offliners.sh` should fetch all existing definitions from api.farm.openzim.org and populate the ones missing from the local dev DB. The documentation would then state that developers should rerun this script regularly to fetch new offliner definitions if they want to use them in their credentials.py.
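A possible shape for that sync step, sketched under assumptions: the endpoint path and the offliner list below are illustrative, not the actual Zimfarm API.

```shell
# Hypothetical sketch of a definitions sync for create_offliners.sh.
# The endpoint path is an assumption; adjust it to whatever
# api.farm.openzim.org actually exposes.
OFFLINERS="mwoffliner youtube phet"
for offliner in $OFFLINERS; do
  url="https://api.farm.openzim.org/v2/offliners/${offliner}/definitions"
  echo "would fetch ${url}"
  # curl -fsSL "$url"  # then upsert any versions missing from the dev DB
done
```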
Okay, with the workaround I can successfully create schedules and schedule tasks.
However my tasks seem stuck:
It looks like I have a worker, but it was "last seen 12 minutes ago"?
> However my tasks seem stuck:
That's probably because the worker doesn't have enough resources to run the task
@elfkuzco can you try to reproduce @audiodude's issue and confirm it can be solved by giving more resources to the worker? I don't get what is missing; the resources seem sufficient.
I've pulled the latest zimfarm images and my jobs are still stuck.
> I've pulled the latest zimfarm images and my jobs are still stuck.
can i see your worker logs?
> can i see your worker logs?
Where do I find those?
Maybe `docker logs -f <worker-container>` in a different shell.
Which one is the worker container? I have:
```
tmoney@tmoney-linux:~/code/wp1/wp1-frontend$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4e88bcacd96f ghcr.io/openzim/zimfarm-ui:latest "/docker-entrypoint.…" 30 minutes ago Up 29 minutes 127.0.0.1:8003->80/tcp zimfarm-ui
21311c6dd01e wp1-dev-dev-workers "/bin/sh -c 'supervi…" 30 minutes ago Up 30 minutes wp1bot-workers-dev
702158e27b52 ghcr.io/openzim/zimfarm-backend:latest "uvicorn zimfarm_bac…" 30 minutes ago Up 30 minutes (healthy) 127.0.0.1:8004->80/tcp zimfarm-api
08aeb6538488 postgres:17.3-bookworm "docker-entrypoint.s…" 30 minutes ago Up 30 minutes (healthy) 127.0.0.1:2345->5432/tcp zimfarm-postgresdb
af5573efe297 wp1-dev-dev-database "docker-entrypoint.s…" 30 minutes ago Up 30 minutes 0.0.0.0:6300->3306/tcp, [::]:6300->3306/tcp wp1bot-db-dev
a11a4bd190ef redis "docker-entrypoint.s…" 30 minutes ago Up 30 minutes (healthy) 0.0.0.0:9736->6379/tcp, [::]:9736->6379/tcp wp1bot-redis-dev
d7075c2628f6 minio/minio "/usr/bin/docker-ent…" 4 days ago Up 4 days (healthy) 0.0.0.0:9000-9001->9000-9001/tcp, [::]:9000-9001->9000-9001/tcp wp1bot-minio-dev
1d466e07260f mariadb:10.4 "docker-entrypoint.s…" 19 months ago Up 2 weeks 0.0.0.0:6600->3306/tcp, [::]:6600->3306/tcp wp1bot-test-db
7f06c4c77a50 5b0542ad1e77 "docker-entrypoint.s…" 19 months ago Up 2 weeks 0.0.0.0:9777->6379/tcp, [::]:9777->6379/tcp wp1bot-test-redis
```
There doesn't appear to be a worker container running in the list. From the compose file, its name should be `zimfarm-worker-manager`.
Can you do `docker logs -f zimfarm-worker-manager`? My guess is that it died for some reason. Also, did you start the services with the zimfarm-worker docker profile, i.e. `docker compose -f docker-compose-dev.yml --profile zimfarm --profile zimfarm-worker up --pull always --build`?
This is the command I used: `docker compose -f docker-compose-dev.yml --profile zimfarm --profile zimfarm-worker up --pull always --build -d`
Here are the logs:
```
tmoney@tmoney-linux:~/code/wp1/wp1-frontend$ docker logs zimfarm-worker-manager
[2025-11-03 19:46:36,061: INFO] starting zimfarm worker-manager.
[2025-11-03 19:46:36,061: INFO] configuration:
username=test_worker
webapi_uris=['http://zimfarm-api:80/v2']
workdir=/data
worker_name=test_worker
OFFLINERS=['mwoffliner', 'youtube', 'phet', 'gutenberg', 'sotoki', 'nautilus', 'ted', 'openedx', 'zimit', 'kolibri', 'wikihow', 'ifixit', 'freecodecamp', 'devdocs', 'mindtouch']
PLATFORMS_TASKS={}
poll_interval=10
sleep_interval=5
selfish=False
[2025-11-03 19:46:36,061: INFO] testing workdir at /data…
[2025-11-03 19:46:36,061: INFO] workdir is available and writable
[2025-11-03 19:46:36,061: INFO] testing private key at /etc/ssh/keys/zimfarm…
[2025-11-03 19:46:36,061: CRITICAL] private key is not a readable path
```
Okay, I think I know the problem. In the first step in the README, when I initially create the Docker graph, this path doesn't exist: `./docker/zimfarm/id_ed25519`.
I've encountered this before: at that point, Docker creates the path as a directory. Then, when we run the create_worker script, it can't overwrite the directory with the private key.
Yes. Oddly enough, it happened to me too. I'll update the docs to prevent this from happening to anyone else.
Just want to make sure. Is this line in the docker-compose file supposed to map a file to a file, or a directory to a directory?
```yaml
volumes:
  - ./docker/zimfarm/id_ed25519:/etc/ssh/keys/zimfarm
```
If it's meant to map a file, we should simply do a `touch docker/zimfarm/id_ed25519` before starting the first docker graph, so that it is initially mapped as an (empty) file that can then be overwritten. Also, I didn't even notice the line in the script that said "now copy the key blah blah". Can we just `mv` the key ourselves to that location within the script?
It's supposed to map to a file. I will revise the shell script to `mv` the key to that path.
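Roughly, the two fixes discussed above could look like this. This is a sketch: the key path comes from the compose volume mapping, but the create_worker script internals are assumed.

```shell
set -e
KEY_PATH=docker/zimfarm/id_ed25519
mkdir -p "$(dirname "$KEY_PATH")"
# 1. Before the first `docker compose up`, make sure the host path exists
#    as a file; otherwise Docker creates it as a directory for the mount.
touch "$KEY_PATH"
# 2. Inside the create_worker script, generate the key and move it into
#    place instead of asking the user to copy it manually.
TMP_KEY=$(mktemp -u)
ssh-keygen -t ed25519 -N "" -f "$TMP_KEY" -q
mv "$TMP_KEY" "$KEY_PATH"
```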
Okay my tasks are being picked up by the worker now! But they are failing. I see this in "Scraper stderr":
```
[error] [2025-11-03T21:16:58.480Z] Failed to run mwoffliner after [0s]:
Error: Unknown S3 region set
    at S3.setRegion (/tmp/mwoffliner/src/S3.ts:37:13)
    at new S3 (/tmp/mwoffliner/src/S3.ts:26:10)
    at Module.execute (/tmp/mwoffliner/src/mwoffliner.lib.ts:149:13)
    at <anonymous> (/tmp/mwoffliner/src/cli.ts:66:8)
```
I assume it's because the optimization cache URL I'm sending in is `https://localhost:9000/?keyId=minio_key&secretAccessKey=minio_secret&bucketName=org-kiwix-dev-cache` and it's trying to parse a region from the hostname?
EDIT: If so, I understand that this is an issue for the mwoffliner repo, of course.
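For illustration only (this is not mwoffliner's actual `S3.ts` code): S3 clients that infer the region from an `s3.<region>.amazonaws.com` style hostname have nothing to go on for an endpoint like `https://localhost:9000/`, which is consistent with the error above. A toy sketch of that inference:

```python
from urllib.parse import urlparse

# Illustrative only -- NOT mwoffliner's implementation. Infers an S3 region
# from an "s3.<region>.amazonaws.com" hostname and fails for anything else,
# such as a local MinIO endpoint.
def guess_region(url: str) -> str:
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    if len(parts) >= 4 and parts[0].startswith("s3") and parts[2] == "amazonaws":
        return parts[1]
    raise ValueError(f"Unknown S3 region set for host {host!r}")
```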
Can you use one similar to the minio one configured for the uploader?
> EDIT: If so, I understand that this is an issue for the mwoffliner repo, of course.
The thing is, the container can't access localhost. You can use https://minio..... because the containers can resolve the hostname `minio`, since they all share the same network.
Or if you want, you can omit the optimization URL from your task.
Okay I definitely think we can skip the S3 cache for dev scraping. After I got rid of that, I got a new error from mwoffliner, which was:
```
Failed to read articleList from [http://localhost:5000/v1/builders/0b76807e-c1e3-44c0-a815-b0e8405a51e8/selection/latest.tsv]
```
This makes sense, since the worker is running inside the docker compose network, while my WP1 web/api/backend is running on the host machine. In fact, this is exactly why we need a zimfarm in dev: we've changed the ZIM creation logic to use a dynamic URL served by WP1 itself rather than a static file list on S3.
I think at this point, I'm going to start working on putting the dev backend server into the docker compose graph as well, with all the updates to configuration and README that are required for that. I'd like to use this same PR and then just merge the whole thing once we have a working, consistent dev environment.
@benoit74 @elfkuzco WDYT?
I agree with you.
Yes, for dev we should skip the S3 cache; we would not gain much besides pain. It is more an internal detail of mwoffliner's operation, not really needed.
I like the idea of adding the backend to the docker graph in the same PR. This is a great opportunity to nail down these dev setup issues and have a reproducible setup that devs can use end to end. No more excuses for not testing stuff end to end once in a while. It is also a great asset in terms of documentation / learning base.
I would even suggest adding web and api to the docker graph as well. With proper mount points and configuration, it should be possible to have hot reload whenever a dev changes something in the codebase; at least, this is what we managed to achieve in the zimfarm, zimit-frontend and cms repos, and it is (mostly?) transparent in terms of performance. It frees developers from having to install anything on their dev machine besides Docker, and ensures there are no headaches due to bad versions and the like. That is quite important for everyone who is not a core maintainer and/or is a bit lazy about setting things up correctly on their machine (which includes myself ^^).
Okay I've got the following in my docker:
```
tmoney@tmoney-linux:~/code/wp1$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e5a3e2025bc8 wp1-dev-dev-web "flask --app wp1.web…" 5 days ago Up 6 minutes 0.0.0.0:5000->5000/tcp, [::]:5000->5000/tcp wp1bot-web-dev
a133aa726bb8 ghcr.io/openzim/zimfarm-worker-manager:latest "worker-manager --we…" 6 days ago Up 6 days zimfarm-worker-manager
978375c302b7 ghcr.io/openzim/zimfarm-ui:latest "/docker-entrypoint.…" 6 days ago Up 6 days 127.0.0.1:8003->80/tcp zimfarm-ui
055a203f63ff wp1-dev-dev-workers "/bin/sh -c 'supervi…" 6 days ago Up 5 minutes wp1bot-workers-dev
5294b71f64f6 ghcr.io/openzim/zimfarm-backend:latest "uvicorn zimfarm_bac…" 6 days ago Up 6 days (healthy) 127.0.0.1:8004->80/tcp zimfarm-api
590f8488d6f7 minio/minio "/usr/bin/docker-ent…" 6 days ago Up 6 minutes (healthy) 0.0.0.0:9000-9001->9000-9001/tcp, [::]:9000-9001->9000-9001/tcp wp1bot-minio-dev
343148f4b8bc postgres:17.3-bookworm "docker-entrypoint.s…" 6 days ago Up 6 days (healthy) 127.0.0.1:2345->5432/tcp zimfarm-postgresdb
92261129c194 redis "docker-entrypoint.s…" 6 days ago Up 6 minutes (healthy) 0.0.0.0:9736->6379/tcp, [::]:9736->6379/tcp wp1bot-redis-dev
1f0cd8e54a2f wp1-dev-dev-database "docker-entrypoint.s…" 6 days ago Up 6 minutes 0.0.0.0:6300->3306/tcp, [::]:6300->3306/tcp wp1bot-db-dev
1d466e07260f mariadb:10.4 "docker-entrypoint.s…" 20 months ago Up 3 weeks 0.0.0.0:6600->3306/tcp, [::]:6600->3306/tcp wp1bot-test-db
7f06c4c77a50 5b0542ad1e77 "docker-entrypoint.s…" 20 months ago Up 3 weeks 0.0.0.0:9777->6379/tcp, [::]:9777->6379/tcp wp1bot-test-redis
```
I've changed the URL for the article list we send to Zimfarm to try to use the WP1 API that's running in docker, so I'm using `http://web-dev:5000/v1/builders/94330657-fe26-4aea-8f14-f959ede293a0/selection/latest.tsv`. But I get the following error:
```
[error] [2025-11-10T00:17:35.346Z] Failed to read articleList from [http://web-dev:5000/v1/builders/94330657-fe26-4aea-8f14-f959ede293a0/selection/latest.tsv] Error: Failed to read articleList from URL: http://web-dev:5000/v1/builders/94330657-fe26-4aea-8f14-f959ede293a0/selection/latest.tsv
```
I understand that this is a network connectivity issue, and I need to use the right domain for the WP1 API. However, the part I don't understand is the network topology for the worker/worker-manager/mwoffliner/etc. and where mwoffliner is actually running on the network. What should I put in place of `http://web-dev:5000`? Thanks!
Also tried with wp1bot-web-dev:
```
[error] [2025-11-10T02:41:00.706Z] Failed to read articleList from [http://wp1bot-web-dev:5000/v1/builders/6a1f2ee7-5947-4222-8e12-b043cf376af4/selection/latest.tsv] Error: Failed to read articleList from URL: http://wp1bot-web-dev:5000/v1/builders/6a1f2ee7-5947-4222-8e12-b043cf376af4/selection/latest.tsv
```
It's reachable from zimfarm-api:
```
tmoney@tmoney-linux:~/code/wp1$ docker exec -it zimfarm-api bash
root@5294b71f64f6:/# curl http://wp1bot-web-dev:5000
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>WP 1.0 API</title>
....<SNIP>
```