`metaflow_environment` dependencies can override or conflict with those set by the Batch docker image, breaking user code
Pasting the README from runsascoded/mf-pip-issue, where I have some repro files as well:
Metaflow/pip/Batch issue
Metaflow runs pip install awscli … boto3 while setting up task environments in Batch, which can break aiobotocore<2.1.0.
Repro
Docker image runsascoded/mf-pip-issue-batch (batch.dockerfile) pins recent versions of botocore and aiobotocore:
- aiobotocore==1.4.2 (October 5, 2021)
- botocore==1.20.106 (July 6, 2021, required by aiobotocore==1.4.2)
Local mode: ✅
They work fine together normally; runsascoded/mf-pip-issue-local (local.dockerfile) runs s3_flow_test.py successfully (in "local" mode):
docker run -it --rm runsascoded/mf-pip-issue-local
# Metaflow 2.4.8 executing S3FlowTest for user:user
# …
# 2022-01-16 21:21:59.162 Done!
Batch mode: ❌
However, with a Metaflow Batch queue configured:
python s3_flow_test.py run --with batch:image=runsascoded/mf-pip-issue-batch
fails with:
AttributeError: 'AioClientCreator' object has no attribute '_register_lazy_block_unknown_fips_pseudo_regions'
due to a version mismatch (botocore>=1.23.0, aiobotocore<2.1.0).
Version mismatch
botocore removed ClientCreator._register_lazy_block_unknown_fips_pseudo_regions in 1.23.0, and aiobotocore only updated to botocore>=1.23.0 in 2.1.0, so aiobotocore<2.1.0 requires botocore<1.23.0; otherwise, reading from S3 via Pandas raises this error.
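A quick way to check which side of that removal an environment is on (a sketch; the attribute name comes from the traceback above):
# Sketch: check whether the installed botocore still defines the method that
# aiobotocore<2.1.0 calls (botocore removed it in 1.23.0).
python -c "
import botocore
from botocore.client import ClientCreator
print(botocore.__version__, hasattr(ClientCreator, '_register_lazy_block_unknown_fips_pseudo_regions'))
"
# botocore<1.23.0  → True  (aiobotocore<2.1.0 works)
# botocore>=1.23.0 → False (aiobotocore<2.1.0 hits the AttributeError above)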
Cause
The version mismatch is caused by Metaflow running pip install awscli … boto3 while setting up the task environment (in Batch, and I believe k8s). If awscli and boto3 aren't both already installed, pip will pick a recent version of the missing package, see that it requires a recent botocore, and upgrade botocore to >=1.23.0 while aiobotocore is still <2.1.0, breaking Pandas→S3 reading.
Simpler example
Here we see pip install awscli break aiobotocore<2.1.0 directly (in the same image as above):
docker run --rm --entrypoint bash runsascoded/mf-pip-issue-batch -c '
echo "Before \`pip install awscli\`:" && \
pip list | grep botocore && \
pip install awscli -qqq && \
echo -e "----\nAfter \`pip install awscli\`:" && \
pip list | grep botocore
' 2>/dev/null
# Before `pip install awscli`:
# aiobotocore 1.4.2 # ✅
# botocore 1.20.106 # ✅
# ----
# After `pip install awscli`:
# aiobotocore 1.4.2 # ✅
# botocore 1.23.37 # ❌
Here, pip install awscli upgraded botocore to a version that's incompatible with the already-installed aiobotocore.
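Relatedly, pip check flags the broken pin after the upgrade (a sketch; the exact requirement spec and wording may differ slightly by pip/aiobotocore version):
docker run --rm --entrypoint bash runsascoded/mf-pip-issue-batch -c '
pip install awscli -qqq && \
pip check
' 2>/dev/null
# expected, roughly: aiobotocore 1.4.2 has requirement botocore<1.20.107,>=1.20.106, but you have botocore 1.23.37.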
Workaround
The simplest workaround I've found is to ensure Metaflow's pip install awscli click requests boto3 command no-ops, by having some version of those libraries already installed in the image. They should also have consistent transitive dependency versions; otherwise pip install will "help" with those as well.
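For example, a constraints file can hold botocore at the image's pinned version while pre-installing everything Metaflow's command would otherwise pull in (a sketch, untested; it assumes awscli/boto3/click/requests releases compatible with that botocore pin exist, and the constraints-file path is arbitrary):
# Sketch (untested): pre-install Metaflow's deps with botocore constrained, so that
# Metaflow's later `pip install awscli click requests boto3` finds them and no-ops.
printf 'botocore==1.20.106\naiobotocore==1.4.2\n' > /tmp/constraints.txt
pip install -c /tmp/constraints.txt awscli click requests boto3
pip check  # confirm nothing was left in a broken state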
Scratch
These seem like the minimal Metaflow configs to submit to Batch (and reproduce the issue):
{
"METAFLOW_BATCH_JOB_QUEUE": "arn:aws:batch:…",
"METAFLOW_ECS_S3_ACCESS_IAM_ROLE": "arn:aws:iam::…",
"METAFLOW_DEFAULT_DATASTORE": "s3",
"METAFLOW_DATASTORE_SYSROOT_S3": "s3://<bucket>/metaflow",
"METAFLOW_DATATOOLS_SYSROOT_S3": "s3://<bucket>/data"
}
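These can live in Metaflow's standard config file, or each key can be exported as an environment variable; a sketch, where metaflow-config.json is a hypothetical local filename holding the JSON above:
# Sketch: install the JSON above as Metaflow's default config
# (metaflow-config.json is a hypothetical filename).
mkdir -p ~/.metaflowconfig
cp metaflow-config.json ~/.metaflowconfig/config.json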
Docker build commands:
docker build -f batch.dockerfile -t runsascoded/mf-pip-issue-batch .
docker build -f local.dockerfile -t runsascoded/mf-pip-issue-local .
@ryan-williams The pip install awscli ... should be a no-op for any of the libraries that are already present in the image.
Yes, but if e.g. awscli isn't already installed, installing it can change the versions of things that are already installed, including breaking them. The "Simpler example" section above illustrates this most directly.
To be clear, it's possible for the following to happen:
- user builds image with valid *boto* versions
- user sets that image as $METAFLOW_BATCH_CONTAINER_IMAGE, runs a flow --with batch
- flow fails because boto versions in the step environment are broken:
  - before running the step, Metaflow ran its own pip install in the container
  - that pip install inadvertently changed the versions of things the user had already installed in the image (namely botocore), resulting in other things the user installed (aiobotocore) being broken
I don't know what the solution should be, but it is surprising and undesirable behavior, and enabled by a breaking change in boto in November that I suspect we will see wash around the ecosystem for some time to come, so it's good to be aware of this specific interaction with Metaflow's step-env setup logic.
Ran into this again today. Here's an updated link to the offending line, in 2.8.2.
Here's a simple repro:
1. User installs boto/s3fs/pandas, successfully reads CSV from S3
# mf1.dockerfile
FROM python:3.9
WORKDIR /root
RUN pip install \
boto3==1.24.59 \
botocore==1.27.59 \
aiobotocore==2.4.2 \
s3fs==2023.1.0 \
pandas
# ✅ works fine, reads publicly-accessible CSV from S3. boto/s3fs/pandas versions are mutually compatible.
ENTRYPOINT [ "python", "-c", "import pandas as pd; print(pd.read_csv('s3://ctbk/csvs/JC-202301-citibike-tripdata.csv'))" ]
docker build -t mf1 -f mf1.dockerfile .
docker run --rm -it mf1
✅ works fine, prints DataFrame
ride_id rideable_type ... end_lng member_casual
0 0905B18B365C9D20 classic_bike ... -74.044247 member
1 B4F0562B05CB5404 electric_bike ... -74.041664 member
2 5ABF032895F5D87E classic_bike ... -74.042521 member
3 E7E1F9C53976D2F9 classic_bike ... -74.044247 member
4 323165780CA0734B classic_bike ... -74.042884 member
... ... ... ... ... ...
56070 17CD2F4ABD4F6785 classic_bike ... -74.050389 member
56071 D75D12846E6838D0 electric_bike ... -74.050389 member
56072 36387397177CAA80 electric_bike ... -74.050389 member
56073 B66278F45420CFA0 classic_bike ... -74.030305 member
56074 230153A8D1F2D5F7 classic_bike ... -74.030305 member
[56075 rows x 13 columns]
2. Metaflow runs pip install awscli boto3, breaking aiobotocore/s3fs/pandas
# mf2.dockerfile
FROM mf1
RUN pip install awscli boto3 # 💥 this breaks the user's installs; `pd.read_csv("s3://…")` no longer works
Test image:
docker build -t mf2 -f mf2.dockerfile .
docker run --rm -it mf2
❌ pd.read_csv raises PermissionError: Forbidden
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/s3fs/core.py", line 112, in _error_wrapper
return await func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/aiobotocore/client.py", line 358, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 577, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
self._engine = self._make_engine(f, self.engine)
File "/usr/local/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1661, in _make_engine
self.handles = get_handle(
File "/usr/local/lib/python3.9/site-packages/pandas/io/common.py", line 716, in get_handle
ioargs = _get_filepath_or_buffer(
File "/usr/local/lib/python3.9/site-packages/pandas/io/common.py", line 425, in _get_filepath_or_buffer
file_obj = fsspec.open(
File "/usr/local/lib/python3.9/site-packages/fsspec/core.py", line 134, in open
return self.__enter__()
File "/usr/local/lib/python3.9/site-packages/fsspec/core.py", line 102, in __enter__
f = self.fs.open(self.path, mode=mode)
File "/usr/local/lib/python3.9/site-packages/fsspec/spec.py", line 1135, in open
f = self._open(
File "/usr/local/lib/python3.9/site-packages/s3fs/core.py", line 649, in _open
return S3File(
File "/usr/local/lib/python3.9/site-packages/s3fs/core.py", line 2024, in __init__
super().__init__(
File "/usr/local/lib/python3.9/site-packages/fsspec/spec.py", line 1491, in __init__
self.size = self.details["size"]
File "/usr/local/lib/python3.9/site-packages/fsspec/spec.py", line 1504, in details
self._details = self.fs.info(self.path)
File "/usr/local/lib/python3.9/site-packages/fsspec/asyn.py", line 114, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/fsspec/asyn.py", line 99, in sync
raise return_result
File "/usr/local/lib/python3.9/site-packages/fsspec/asyn.py", line 54, in _runner
result[0] = await coro
File "/usr/local/lib/python3.9/site-packages/s3fs/core.py", line 1238, in _info
out = await self._call_s3(
File "/usr/local/lib/python3.9/site-packages/s3fs/core.py", line 339, in _call_s3
return await _error_wrapper(
File "/usr/local/lib/python3.9/site-packages/s3fs/core.py", line 139, in _error_wrapper
raise err
PermissionError: Forbidden
pip install awscli boto3 explicitly logs an ERROR about breaking aiobotocore:
docker run --rm -it --entrypoint pip mf1 install awscli boto3
# …
# ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
# aiobotocore 2.4.2 requires botocore<1.27.60,>=1.27.59, but you have botocore 1.29.110 which is incompatible.
# Successfully installed PyYAML-5.4.1 awscli-1.27.110 boto3-1.26.110 botocore-1.29.110 colorama-0.4.4 docutils-0.16 pyasn1-0.4.8 rsa-4.7.2
The simplest workaround remains to make sure awscli and boto3 are both installed in any image you pass to Metaflow Batch mode, but Metaflow could/should do something more careful/correct here.
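For the mf1 image above, that looks roughly like the following (a sketch, untested; pip is left to resolve awscli/boto3 versions against the botocore pin that aiobotocore 2.4.2 needs):
# Sketch (untested): pre-install awscli + boto3 in mf1 with botocore held at 1.27.59,
# then confirm the environment is still consistent and the S3 read still works.
docker run --rm --entrypoint bash mf1 -c '
pip install awscli boto3 "botocore==1.27.59" -qqq && \
pip check && \
python -c "import pandas as pd; print(pd.read_csv(\"s3://ctbk/csvs/JC-202301-citibike-tripdata.csv\").shape)"
'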