metaflow
metaflow copied to clipboard
Use boto3 instead of awscli python module to copy packages
awscli is superseded by awscliv2 which doesn't support a Python package.
Solution:
- Write an equivalent
aws s3 cppython module/file in Metaflow that uses uses already declared and installedboto3dependency instead of deprecated (and undeclared dependency)python -m awsclito perform the following action: https://github.com/Netflix/metaflow/blob/master/metaflow/environment.py#L90 - Or look into s3op.py get https://github.com/zillow/metaflow/blob/33178350172a55d94f8ba5d8f32d51d64e11b274/metaflow/datatools/s3op.py#L468-L468
Thanks for the issue! Yes indeed, awscli happens to be an undeclared dependency (we should document it). We should look into what is the credible way forward for using awscliv2. s3op.py get wouldn't work because we have a bootstrapping problem - we need to download it from s3.
True about the bootstrapping. I like the minimal dependencies. With boto3 I tested replacing https://github.com/Netflix/metaflow/blob/master/metaflow/environment.py#L90 with:
def get_python_download_s3_file(self, file_url, local_path):
return (
"%s -c \"import boto3; " % self._python()
+ "exec('try:\\n from urlparse import urlparse\\nexcept:\\n from urllib.parse import urlparse');"
+ "parsed = urlparse('%s'); " % file_url
+ "boto3.client('s3').download_file(parsed.netloc, parsed.path.lstrip('/'), '%s')\"" % local_path
)
@savingoyal I've reopened a PR for this because we are seeing issues where if someone has both awscli v1 and v2 installed, the python dependencies are incompatible / break in weird ways. So instead of relying on this implicit dependency on awscli v1, we should just use boto3 to bootstrap.
this is now released in version 2.12.13