sagemaker-python-sdk

Getting a FileNotFoundError: [Errno 2] No such file or directory: 'inference.py'

Open Nuwan1654 opened this issue 1 year ago • 8 comments

Describe the bug Probably this is not a bug, but when I try to deploy a SageMaker PyTorch model, I get a FileNotFoundError: [Errno 2] No such file or directory: 'inference.py' error.

To reproduce

  1. Create the 'my_model.tar.gz' file according to the instructions on this page
  2. Upload it to S3. My folder structure is as follows:
.
├── best.pt
├── coco.yaml
└── code
    ├── common.py
    ├── experimental.py
    ├── general.py
    ├── inference.py
    ├── loss.py
    ├── requirements.txt
    ├── torch_utils.py
    └── yolo.py

  3. Run predictor = pytorch_model.deploy(instance_type='ml.c4.xlarge', initial_instance_count=1) (see the construction sketch below)
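For reference, a minimal sketch of how the pytorch_model object might have been constructed before the deploy call; the S3 path, role, and versions below are placeholders, not the reporter's actual values:

import sagemaker
from sagemaker.pytorch import PyTorchModel

role = sagemaker.get_execution_role()

# Placeholder values -- adjust model_data, framework_version, and py_version for your setup.
pytorch_model = PyTorchModel(
    entry_point='inference.py',   # script the SDK copies into the repacked archive
    source_dir='./code',          # local directory that must actually contain inference.py
    role=role,
    model_data='s3://my-bucket/my_model.tar.gz',
    framework_version='2.0.1',
    py_version='py310',
)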

Expected behavior: a successful deployment.

Screenshots or logs

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[14], line 1
----> 1 predictor = pytorch_model.deploy(instance_type='ml.c4.xlarge', initial_instance_count=1)

File ~/.local/lib/python3.8/site-packages/sagemaker/model.py:1260, in Model.deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, volume_size, model_data_download_timeout, container_startup_health_check_timeout, inference_recommendation_id, explainer_config, **kwargs)
   1257     if self._base_name is not None:
   1258         self._base_name = "-".join((self._base_name, compiled_model_suffix))
-> 1260 self._create_sagemaker_model(
   1261     instance_type, accelerator_type, tags, serverless_inference_config
   1262 )
   1264 serverless_inference_config_dict = (
   1265     serverless_inference_config._to_request_dict() if is_serverless else None
   1266 )
   1267 production_variant = sagemaker.production_variant(
   1268     self.name,
   1269     instance_type,
   (...)
   1275     container_startup_health_check_timeout=container_startup_health_check_timeout,
   1276 )

File ~/.local/lib/python3.8/site-packages/sagemaker/model.py:693, in Model._create_sagemaker_model(self, instance_type, accelerator_type, tags, serverless_inference_config)
    671 def _create_sagemaker_model(
    672     self, instance_type=None, accelerator_type=None, tags=None, serverless_inference_config=None
    673 ):
    674     """Create a SageMaker Model Entity
    675 
    676     Args:
   (...)
    691             not provided in serverless inference. So this is used to find image URIs.
    692     """
--> 693     container_def = self.prepare_container_def(
    694         instance_type,
    695         accelerator_type=accelerator_type,
    696         serverless_inference_config=serverless_inference_config,
    697     )
    699     if not isinstance(self.sagemaker_session, PipelineSession):
    700         # _base_name, model_name are not needed under PipelineSession.
    701         # the model_data may be Pipeline variable
    702         # which may break the _base_name generation
    703         self._ensure_base_name_if_needed(
    704             image_uri=container_def["Image"],
    705             script_uri=self.source_dir,
    706             model_uri=self.model_data,
    707         )

File ~/.local/lib/python3.8/site-packages/sagemaker/pytorch/model.py:298, in PyTorchModel.prepare_container_def(self, instance_type, accelerator_type, serverless_inference_config)
    290     deploy_image = self.serving_image_uri(
    291         region_name,
    292         instance_type,
    293         accelerator_type=accelerator_type,
    294         serverless_inference_config=serverless_inference_config,
    295     )
    297 deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
--> 298 self._upload_code(deploy_key_prefix, repack=self._is_mms_version())
    299 deploy_env = dict(self.env)
    300 deploy_env.update(self._script_mode_env_vars())

File ~/.local/lib/python3.8/site-packages/sagemaker/model.py:626, in Model._upload_code(self, key_prefix, repack)
    611     self.uploaded_code = fw_utils.UploadedCode(
    612         s3_prefix=repacked_model_data, script_name=os.path.basename(self.entry_point)
    613     )
    615 LOGGER.info(
    616     "Repacking model artifact (%s), script artifact "
    617     "(%s), and dependencies (%s) "
   (...)
    623     repacked_model_data,
    624 )
--> 626 utils.repack_model(
    627     inference_script=self.entry_point,
    628     source_directory=self.source_dir,
    629     dependencies=self.dependencies,
    630     model_uri=self.model_data,
    631     repacked_model_uri=repacked_model_data,
    632     sagemaker_session=self.sagemaker_session,
    633     kms_key=self.model_kms_key,
    634 )
    636 self.repacked_model_data = repacked_model_data

File ~/.local/lib/python3.8/site-packages/sagemaker/utils.py:516, in repack_model(inference_script, source_directory, dependencies, model_uri, repacked_model_uri, sagemaker_session, kms_key)
    513 with _tmpdir(directory=local_download_dir) as tmp:
    514     model_dir = _extract_model(model_uri, sagemaker_session, tmp)
--> 516     _create_or_update_code_dir(
    517         model_dir,
    518         inference_script,
    519         source_directory,
    520         dependencies,
    521         sagemaker_session,
    522         tmp,
    523     )
    525     tmp_model_path = os.path.join(tmp, "temp-model.tar.gz")
    526     with tarfile.open(tmp_model_path, mode="w:gz") as t:

File ~/.local/lib/python3.8/site-packages/sagemaker/utils.py:577, in _create_or_update_code_dir(model_dir, inference_script, source_directory, dependencies, sagemaker_session, tmp)
    575     os.mkdir(code_dir)
    576 try:
--> 577     shutil.copy2(inference_script, code_dir)
    578 except FileNotFoundError:
    579     if os.path.exists(os.path.join(code_dir, inference_script)):

File /usr/lib/python3.8/shutil.py:435, in copy2(src, dst, follow_symlinks)
    433 if os.path.isdir(dst):
    434     dst = os.path.join(dst, os.path.basename(src))
--> 435 copyfile(src, dst, follow_symlinks=follow_symlinks)
    436 copystat(src, dst, follow_symlinks=follow_symlinks)
    437 return dst

File /usr/lib/python3.8/shutil.py:264, in copyfile(src, dst, follow_symlinks)
    262     os.symlink(os.readlink(src), dst)
    263 else:
--> 264     with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
    265         # macOS
    266         if _HAS_FCOPYFILE:
    267             try:

FileNotFoundError: [Errno 2] No such file or directory: 'inference.py'

**System information**
A description of your system. Please provide:
- **SageMaker Python SDK version**: 2.155.0
- **Framework name**: PyTorch
- **Framework version**: 2.0.1+cu117
- **Python version**: 3.8.10
- **CPU or GPU**: GPU
- **Custom Docker image (Y/N)**: Y

Any help on this would be highly appreciated.

Nuwan1654 avatar Jul 19 '23 06:07 Nuwan1654

Facing the same issue!

hiyamgh avatar Aug 21 '23 11:08 hiyamgh

Has anyone solved this problem? I have the same issue.

hooNpk avatar Oct 06 '23 04:10 hooNpk

I suspect you are running SageMaker from a local environment. The issue is caused by the absence of inference.py.

Executing pytorch_model.deploy() launches the Docker serving image. Since I am using TensorFlow, it launches the image 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.11-gpu.

However, this image lacks "inference.py", resulting in a FileNotFoundError: [Errno 2] No such file or directory: 'inference.py' error.
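The traceback above shows the failure happening locally in shutil.copy2(inference_script, code_dir) while the SDK repacks the model, so a quick sanity check before calling deploy() is to confirm that the entry point actually resolves on the machine running the SDK. A minimal sketch, assuming the script is expected under ./code:

import os

# Hypothetical pre-flight check: repack_model() copies the entry point from the
# machine running the SDK, so it must exist locally relative to the working directory.
entry_point = os.path.join('code', 'inference.py')
assert os.path.exists(entry_point), f'{entry_point} not found in {os.getcwd()}'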

Solution:

Firstly, create ./code/inference.py. Refer to the following link for the code: https://github.com/aws/sagemaker-tensorflow-serving-container/blob/master/test/resources/examples/test1/inference.py

import json
from collections import namedtuple

# Context mirrors the request metadata that the TensorFlow Serving container
# passes to the handlers.
Context = namedtuple('Context',
                     'model_name, model_version, method, rest_uri, grpc_uri, '
                     'custom_attributes, request_content_type, accept_header')


def input_handler(data, context):
    # Pre-process the request body before it is forwarded to TensorFlow Serving.
    if context.request_content_type == 'application/json':
        d = data.read().decode('utf-8')
        return d if len(d) else ''

    if context.request_content_type == 'text/csv':
        # Convert CSV input into the JSON "instances" format TensorFlow Serving expects.
        return json.dumps({
            'instances': [float(x) for x in data.read().decode('utf-8').split(',')]
        })

    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))


def output_handler(data, context):
    # Post-process the TensorFlow Serving response before it is returned to the client.
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))

    response_content_type = context.accept_header
    prediction = data.content
    return prediction, response_content_type

Next, deploy the model with this inference script. The steps below demonstrate how to deploy the model locally with TensorFlow:

import os
from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(
    entry_point='inference.py',
    source_dir='./code',
    role=os.environ['AWS_ROLE'],
    model_data=f'{output}/model.tar.gz',
    framework_version='2.11'
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='local_gpu',
)
  1. The ./code/inference.py will be automatically loaded into the server image.
  2. {output}/model.tar.gz will be automatically loaded as the local model.
  3. The model.deploy() command will successfully initiate the server image.
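Since the original report uses PyTorch rather than TensorFlow, the equivalent construction with PyTorchModel would look roughly like the sketch below; the model_data path, role, and versions are placeholders, not values from this thread:

import os
from sagemaker.pytorch import PyTorchModel

# Placeholder values -- adjust model_data, role, and versions for your setup.
model = PyTorchModel(
    entry_point='inference.py',   # must exist inside source_dir on the local machine
    source_dir='./code',          # local folder that gets repacked into the model archive
    role=os.environ['AWS_ROLE'],
    model_data='s3://my-bucket/model.tar.gz',
    framework_version='2.0.1',
    py_version='py310',
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='local_gpu',
)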

kenny-chen avatar Jan 05 '24 06:01 kenny-chen

What's the solution?

nehyaeeg3 avatar Jan 27 '24 02:01 nehyaeeg3

I think I've figured it out; you MUST create the app.py file within this folder:

In the file browser pane, browse to "./lab1/packages/{account_id}-lab1_code-1.0/src/".

Some tutorials say: "In the file browser pane, browse to ./lab1/packages/{account_id}-lab1_code-1.0/. Find descriptor.json." BUT the right tutorial (https://catalog.workshops.aws/panorama-immersion-day/en-US/20-lab1-object-detection/21-lab1) correctly states to create and save the app.py file in "./lab1/packages/{account_id}-lab1_code-1.0/src/", i.e. the src folder found in Lab 1.

Wrong version - https://explore.skillbuilder.aws/learn/course/17780/play/93251/aws-panorama-building-edge-computer-vision-cv-applications Right version - https://catalog.workshops.aws/panorama-immersion-day/en-US/20-lab1-object-detection/21-lab1

kitty2121 avatar Feb 19 '24 03:02 kitty2121

Also facing this issue. I think the docs on https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#bring-your-own-model may be faulty as well?

evankozliner avatar Mar 25 '24 16:03 evankozliner

I also encountered the same problem. I tried it from a notebook instance. Here is my implementation:

import sagemaker
from sagemaker.pytorch import PyTorchModel
from sagemaker.serverless import ServerlessInferenceConfig

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

model = PyTorchModel(
    entry_point='inference.py',
    role=role,
    model_data='s3://***/model.tar.gz',
    framework_version='2.1',
    py_version='py310',
)

serverless_config = ServerlessInferenceConfig(
    max_concurrency=1,
    memory_size_in_mb=3072,
)

deploy_params = {
    'instance_type': 'ml.t3.medium',
    'initial_instance_count': 1,
    'serverless_inference_config': serverless_config,
}

predictor = model.deploy(**deploy_params)

Itto1992 avatar May 14 '24 08:05 Itto1992

I solved this problem by changing how I create the tar.gz file. The key point is to run the tar command in the same directory as the model file, like this:

$ tar czvf ../model.tar.gz *
code/
code/requirements.txt
code/inference.py
model.pth

It failed when I ran the tar command in the parent directory of the model file:

$ tar czvf model.tar.gz model
model/
model/model.pth
model/code/
model/code/requirements.txt
model/code/inference.py

As you can see, the two archives have different layouts: the first puts model.pth and code/ at the archive root, while the second nests everything under a model/ directory.
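If you prefer to build the archive from Python instead of the shell, a tarfile sketch that produces the same flat layout (members at the archive root, not nested under a top-level directory) could look like this; it assumes the working directory contains model.pth and the code/ folder shown above:

import tarfile

# Assumes the working directory holds model.pth and code/ (with inference.py inside),
# matching the first listing above.
with tarfile.open('../model.tar.gz', 'w:gz') as tar:
    for name in ('model.pth', 'code'):
        # arcname=name keeps entries at the archive root (e.g. code/inference.py),
        # avoiding the extra model/ prefix that broke the deployment.
        tar.add(name, arcname=name)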

Itto1992 avatar May 14 '24 09:05 Itto1992