Docker options control in worker
Feature request: Control over the default Docker options used in worker containers using zimfarm.config. They're currently hardcoded.
- Being able to add `--gpus all` would make video scrapers fly on workers with a GPU.
- Remove `--memory-swappiness=0` because it is very, very wrong. (I have to disable cgroup control of swap)
- Set the CPU scheduler options so that scrapers are limited to a fair share of resources.
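To make the request concrete, here is a minimal sketch of how operator overrides could be merged into the hardcoded defaults. It assumes the overrides have already been parsed from zimfarm.config into a dict; the names `DEFAULT_DOCKER_OPTIONS` and `build_docker_options` are hypothetical and not part of the worker code.

```python
# Hypothetical stand-in for the options currently hardcoded in the worker.
DEFAULT_DOCKER_OPTIONS = {"memory-swappiness": "0"}


def build_docker_options(defaults, overrides):
    """Merge operator overrides into the defaults.

    A value of None removes an option; any other value adds or replaces it.
    Returns `docker run` flags sorted by option name.
    """
    merged = dict(defaults)
    for name, value in overrides.items():
        if value is None:
            merged.pop(name, None)
        else:
            merged[name] = str(value)
    return [f"--{name}={value}" for name, value in sorted(merged.items())]


# Assumed result of parsing the worker's config: add GPUs, drop the
# swappiness default, and give scrapers a relative CPU weight.
overrides = {"gpus": "all", "memory-swappiness": None, "cpu-shares": 512}
print(" ".join(build_docker_options(DEFAULT_DOCKER_OPTIONS, overrides)))
```

With no overrides the defaults pass through unchanged, so a worker without any extra configuration would behave as it does today.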
I am in favor of this, and it would be more manageable than proxying every variable, but only to some extent. The main idea is that we want control over the resources. I know it might seem unreasonable from your worker-admin POV, but we designed it for the least amount of maintenance (i.e. simplicity) and for dedicated workers.
We agreed that it's for the best to be more flexible though. Couple things we need to be careful about:
- keeping it dead easy to create a worker
- CPU needs to be thought through to be both explicit (it's not currently) and flexible. #753 shall guide it. I suppose you've looked at `ZIMFARM_TASK_CPUS` and `ZIMFARM_TASK_CPUSET`?
- we use the zimfarm as a feedback loop: OOM helps us determine the needs and monitor memory usage increases in the scrapers. It's a bad pattern but it's the cheapest one we have. Swap hides a lot of this away.
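As a rough illustration of making CPU limits explicit, the sketch below turns the two environment variables into `docker run` flags. The mapping to `--cpus` and `--cpuset-cpus` is an assumption made for this example, not a statement of how the worker interprets them, and `cpu_flags` is a hypothetical helper.

```python
import os


def cpu_flags(env=None):
    """Build CPU-related `docker run` flags from the environment.

    Assumed mapping (illustrative only):
      ZIMFARM_TASK_CPUS   -> --cpus
      ZIMFARM_TASK_CPUSET -> --cpuset-cpus
    Unset or empty variables produce no flag.
    """
    env = os.environ if env is None else env
    flags = []
    cpus = env.get("ZIMFARM_TASK_CPUS", "").strip()
    if cpus:
        float(cpus)  # raises ValueError early if the value is not numeric
        flags.append(f"--cpus={cpus}")
    cpuset = env.get("ZIMFARM_TASK_CPUSET", "").strip()
    if cpuset:
        flags.append(f"--cpuset-cpus={cpuset}")
    return flags


print(cpu_flags({"ZIMFARM_TASK_CPUS": "3", "ZIMFARM_TASK_CPUSET": "0-2"}))
```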
--memory-swappiness=0 does disable data swap (docs) but it causes intense paging and runtime GC before OOM.
I think the memory parameters are perfect for the configuration file. They can be set to enforce hard recipe limits on dedicated internal workers but left blank or set differently on external workers. Scrapers have a lot of idle data (looks like 70%) going on for days or weeks.
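A minimal sketch of what I mean, assuming two hypothetical settings (`enforce_memory_limit` and `memory_swappiness`) that a worker admin could set or leave blank. The Docker flags are real; the setting names and the `memory_flags` helper are made up for illustration.

```python
def memory_flags(recipe_memory_bytes, settings):
    """Build memory-related `docker run` flags for one task.

    `settings` holds the worker's (hypothetical) memory configuration;
    a missing key means "leave this to the host".
    """
    flags = []
    if settings.get("enforce_memory_limit"):
        flags.append(f"--memory={recipe_memory_bytes}")
        # Setting --memory-swap equal to --memory leaves the container no swap.
        flags.append(f"--memory-swap={recipe_memory_bytes}")
    swappiness = settings.get("memory_swappiness")
    if swappiness is not None:
        flags.append(f"--memory-swappiness={swappiness}")
    return flags


dedicated = {"enforce_memory_limit": True, "memory_swappiness": 0}
external = {}  # everything left blank: no memory flags at all

print(memory_flags(2 * 1024**3, dedicated))
print(memory_flags(2 * 1024**3, external))
```

Dedicated workers would keep the hard limit and the OOM feedback loop, while external workers could leave swap handling to the host.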
Implementation detail - Big scrapers and the ZFS filesystem cache compete with each other on pixelmemory. Some scrapers use tens of GB of filesystem cache so it's all much faster without idle data in RAM. That's not just the scrapers, but the server's file serving duties too.
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.