Expose container-level containerProperties configuration (IPC mode, ulimits, sharedMemorySize, linuxParameters) in @batch decorator
We are running Metaflow on AWS Batch and recently hit a limitation when deploying GPU-based Graph Neural Network workloads (PyG). PyG requires larger shared memory (/dev/shm) and specific ulimits to avoid Bus Errors during large graph allocations.
These require Docker flags such as:

```
--ipc=host
--ulimit memlock=-1
--ulimit stack=67108864
```
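For reference, a sketch of how these flags could map onto AWS Batch `containerProperties` (field names follow the `RegisterJobDefinition` API; note that `containerProperties` has no direct `--ipc=host` equivalent, so `linuxParameters.sharedMemorySize` is the closest knob for enlarging `/dev/shm`):

```python
# Sketch: approximate containerProperties equivalent of the Docker flags above.
# Field names follow the AWS Batch RegisterJobDefinition API; the value for
# sharedMemorySize (in MiB) is illustrative.
container_properties = {
    "linuxParameters": {
        # Enlarges /dev/shm; the closest Batch-level substitute for --ipc=host
        "sharedMemorySize": 8192,
    },
    # Equivalent of --ulimit memlock=-1 and --ulimit stack=67108864
    "ulimits": [
        {"name": "memlock", "softLimit": -1, "hardLimit": -1},
        {"name": "stack", "softLimit": 67108864, "hardLimit": 67108864},
    ],
}
```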
This is the exact warning I see at runtime:

```
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be insufficient for PyG. NVIDIA recommends the use of the following flags: docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...
```
Currently, the `@batch` decorator does not expose any way to configure these container-level properties. Inspecting `batch_client.py` shows that only `max_swap` and `swappiness` are exposed via `linuxParameters`, with no escape hatch for modifying the full `containerProperties`.
This makes certain ML workloads (e.g., PyG, DGL, large CUDA graphs, RAPIDS) impossible to run without modifying Metaflow internals.
**Requested feature**

Expose either of the following:
**Option A: direct kwargs pass-through.** Allow `batch(container_properties=...)` to be merged into the AWS Batch job definition.
**Option B: escape hatch.** Something like:

```python
batch(
    gpu=1,
    image="xyz",
    ecs_overrides={
        "containerProperties": {...}
    }
)
```
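Either option could be implemented as a recursive merge of user-supplied overrides into the job-definition dict Metaflow already builds. A minimal sketch, assuming a hypothetical helper (the function name and the base dict are illustrative, not Metaflow's actual internals):

```python
# Sketch (hypothetical): merge user overrides into the containerProperties
# dict that Metaflow would pass to RegisterJobDefinition. Nested dicts are
# merged key by key; on conflict, the user-supplied value wins.
def merge_container_properties(base, overrides):
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_container_properties(merged[key], value)
        else:
            merged[key] = value
    return merged

# Illustrative base properties as Metaflow might build them today
base = {"image": "xyz", "linuxParameters": {"maxSwap": 0, "swappiness": 0}}

# User overrides for a PyG workload: larger /dev/shm plus ulimits
user = {
    "linuxParameters": {"sharedMemorySize": 8192},
    "ulimits": [{"name": "memlock", "softLimit": -1, "hardLimit": -1}],
}

result = merge_container_properties(base, user)
```

The merge preserves Metaflow's existing `maxSwap`/`swappiness` settings while layering in `sharedMemorySize` and `ulimits`, so neither option would need to special-case individual fields.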
**Why this matters.** Graph-based workloads, high-memory CUDA ops, and frameworks like PyG require larger SHMEM and specific ulimits. Without these, Metaflow cannot support a class of real-world ML tasks that rely on Batch GPU compute.
I’m happy to submit a PR once there’s agreement on the API design.
thanks for opening the issue. option a looks most promising - would you be open to a PR? depending on the options being introduced, we can quickly iterate on the UX after the PR exists - my expectation is that changing the UX shouldn't be much work.