
Cryptic CLI error in SageMaker Studio (and probably other role-based environments?)

Open · athewsey opened this issue 1 year ago · 1 comment

Hi team,

I was surprised to find today that the below does not work in the default Python notebook kernel of a SageMaker Studio JupyterLab space, when the notebook's IAM execution role has all the necessary permissions:

%pip install amazon-textract-textractor

!textractor start-document-analysis \
    --features LAYOUT --features TABLES \
    --s3-upload-path {s3_upload_uri} \
    --s3-output-path {s3_output_uri} \
    data/my-cool-document.pdf

Actual behaviour

When neither --region-name nor --profile-name is set, the CLI defaults the profile to "default", which causes the error below:

Traceback (most recent call last):
  File "/opt/conda/bin/textractor", line 8, in <module>
    sys.exit(textractor_cli())
  File "/opt/conda/lib/python3.10/site-packages/textractor/cli/cli.py", line 347, in textractor_cli
    extractor = Textractor(
  File "/opt/conda/lib/python3.10/site-packages/textractor/textractor.py", line 90, in __init__
    self.session = boto3.session.Session(profile_name=self.profile_name)
  File "/opt/conda/lib/python3.10/site-packages/boto3/session.py", line 90, in __init__
    self._setup_loader()
  File "/opt/conda/lib/python3.10/site-packages/boto3/session.py", line 131, in _setup_loader
    self._loader = self._session.get_component('data_loader')
  File "/opt/conda/lib/python3.10/site-packages/botocore/session.py", line 802, in get_component
    return self._components.get_component(name)
  File "/opt/conda/lib/python3.10/site-packages/botocore/session.py", line 1140, in get_component
    self._components[name] = factory()
  File "/opt/conda/lib/python3.10/site-packages/botocore/session.py", line 199, in <lambda>
    lambda: create_loader(self.get_config_variable('data_path')),
  File "/opt/conda/lib/python3.10/site-packages/botocore/session.py", line 323, in get_config_variable
    return self.get_component('config_store').get_config_variable(
  File "/opt/conda/lib/python3.10/site-packages/botocore/configprovider.py", line 465, in get_config_variable
    return provider.provide()
  File "/opt/conda/lib/python3.10/site-packages/botocore/configprovider.py", line 671, in provide
    value = provider.provide()
  File "/opt/conda/lib/python3.10/site-packages/botocore/configprovider.py", line 761, in provide
    scoped_config = self._session.get_scoped_config()
  File "/opt/conda/lib/python3.10/site-packages/botocore/session.py", line 422, in get_scoped_config
    raise ProfileNotFound(profile=profile_name)
botocore.exceptions.ProfileNotFound: The config profile (default) could not be found
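
A possible workaround until the default changes, sketched in Python so it can run from a notebook cell: pass the region explicitly so the CLI does not fall back to the "default" profile. This assumes that supplying --region-name on its own is enough to skip that fallback, as the description above implies. build_textractor_command is a hypothetical helper written for this illustration, not part of textractor.

```python
import os


def build_textractor_command(document, s3_upload_uri, s3_output_uri, environ=None):
    """Build the CLI call from this issue with an explicit --region-name.

    Hypothetical helper: only the flags already shown in this issue are used.
    """
    env = os.environ if environ is None else environ
    # AWS_REGION is set in SageMaker Studio; AWS_DEFAULT_REGION is the
    # variable boto3 itself reads, so it is checked as a fallback.
    region = env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")
    if not region:
        raise RuntimeError("No region found in AWS_REGION or AWS_DEFAULT_REGION")
    return [
        "textractor", "start-document-analysis",
        "--region-name", region,
        "--features", "LAYOUT", "--features", "TABLES",
        "--s3-upload-path", s3_upload_uri,
        "--s3-output-path", s3_output_uri,
        document,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True), or the region can simply be added by hand to the original notebook command.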

Expected behaviour

In this environment the AWS_REGION environment variable is set automatically, but no CLI profiles are configured. I suggest a better default would be to auto-discover the region from environment variables when present (e.g. os.environ.get("AWS_REGION")) and leave the profile alone.
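
A minimal sketch of that suggested default, under stated assumptions: resolve_session_kwargs is a hypothetical helper (not the library's current code), and the fallback to AWS_DEFAULT_REGION is my addition because that is the variable boto3 reads natively. The idea is to forward profile_name only when the caller actually supplied one, so that boto3 can fall through to its normal credential chain (including the execution role).

```python
import os


def resolve_session_kwargs(profile_name=None, region_name=None, environ=None):
    """Return keyword arguments for boto3.session.Session.

    Hypothetical helper illustrating the proposed default behaviour.
    """
    env = os.environ if environ is None else environ
    kwargs = {}
    # Only forward a profile the caller explicitly asked for; never
    # invent "default", which may not exist in role-based environments.
    if profile_name is not None:
        kwargs["profile_name"] = profile_name
    # An explicit region wins; otherwise discover it from the environment.
    region = region_name or env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")
    if region:
        kwargs["region_name"] = region
    return kwargs
```

With this, the session could be created as boto3.session.Session(**resolve_session_kwargs(profile_name, region_name)), which with no arguments and AWS_REGION set amounts to passing only region_name.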

athewsey · Apr 12 '24 07:04

I agree that this is counter-intuitive; we could do the same thing for extractor = Textractor().

Belval · Apr 12 '24 14:04