sagemaker-training-toolkit

Entry point package doesn't seem to work with nested directories

Open loodvn opened this issue 5 years ago • 18 comments

Hey there!

I'm having some trouble getting my SageMaker TensorFlow code to work after moving my training script into another directory.

Previously, I had the following directory structure:

submit_notebook.ipynb
train.py
setup.py
my_package/
  other modules

And it worked with source_dir="." and entry_point="train.py".

Now, I recently moved my training script into one of my package directories as follows:

submit_notebook.ipynb
setup.py
src/
  my_package/
    train.py
    other modules

When running estimator.fit with source_dir="." and entry_point="src/my_package/train.py", I get an ImportError: "No module named src/my_package/train".

Higher up in the logs, I spotted: "Invoking script with the following command: /usr/bin/python -m src/my_package/train <some_args>"

Starting from sagemaker-tensorflow-container, I traced this to sagemaker_containers._entry_point_type, which checks for a "setup.py" file and, if one is present, sets the entry_point type to PYTHON_PACKAGE.

Later in sagemaker_containers._process, we take any PYTHON_PACKAGE user-given entrypoint string and remove the .py extension.

That makes sense if your entry_point is "train.py", but as shown above it produces an invalid module name when the entry_point includes directories.
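To illustrate the behaviour described above, here is a minimal sketch. It is not the actual sagemaker_containers code: both function names are hypothetical, and the dotted conversion is only one possible way to build a name that `python -m` accepts, assuming the packages along the path are importable.

```python
from pathlib import PurePosixPath


def naive_module_name(entry_point: str) -> str:
    """Mimic the behaviour described in this issue: only strip '.py'."""
    return entry_point[:-3] if entry_point.endswith(".py") else entry_point


def dotted_module_name(entry_point: str) -> str:
    """Hypothetical alternative: turn path separators into dots."""
    path = PurePosixPath(entry_point).with_suffix("")
    return ".".join(path.parts)


# A flat entry point works either way.
print(naive_module_name("train.py"))
print(dotted_module_name("train.py"))

# A nested entry point keeps its slashes in the naive version,
# which is what ends up after "python -m" in the logs quoted above.
print(naive_module_name("src/my_package/train.py"))
print(dotted_module_name("src/my_package/train.py"))
```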

Describing my proposed fix in the PR

loodvn avatar Feb 05 '20 11:02 loodvn

https://github.com/aws/sagemaker-containers/pull/244

loodvn avatar Feb 05 '20 11:02 loodvn

Sorry for the slow response here. I had actually started working on this a while ago: https://github.com/aws/sagemaker-python-sdk/pull/941. There is already an entrypoint in this repository that could preserve the path of the script, and as of 1-2 months ago all of the pre-built SageMaker images should be using that entrypoint, but I haven't had a chance to update my PR.

laurenyu avatar Feb 12 '20 00:02 laurenyu

Thanks @laurenyu! That's great news. Would you like me to close this issue and associated draft PR?

loodvn avatar Feb 14 '20 19:02 loodvn

we can leave this issue open since I don't think there's an existing one yet for this

laurenyu avatar Feb 17 '20 20:02 laurenyu

It appears this is still an issue in sagemaker 2.36.0 with the PyTorch Estimator. Are others having the same issue?

zbloss avatar Apr 19 '21 17:04 zbloss

I'm having the same issue. So far the only solution I've found is to put the entry_point script in the root directory of source_dir. Pretty lame since that isn't where I want it...
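One way to keep the real script where it belongs while still satisfying the root-level requirement is a thin wrapper at the root of source_dir that forwards to the nested script. This is only a sketch under assumptions: the helper name `run_nested_script` is made up, the `src/my_package/train.py` layout is taken from the original post, and it has not been verified inside a SageMaker container.

```python
import runpy
import sys
from pathlib import Path


def run_nested_script(source_dir: Path, relative_script: str) -> dict:
    """Run a script located below source_dir as if it were __main__.

    Hypothetical helper, not part of any SageMaker library.
    """
    script = source_dir / relative_script
    # Assumption: the package lives one level below a "src"-style folder,
    # so the grandparent of the script is what needs to be importable.
    package_root = str(script.parent.parent)
    if package_root not in sys.path:
        sys.path.insert(0, package_root)
    # run_path only replaces sys.argv[0]; the remaining command-line
    # arguments stay visible to the nested script.
    return runpy.run_path(str(script), run_name="__main__")
```

A root-level train.py could then contain just the import of this helper plus a call such as run_nested_script(Path(__file__).resolve().parent, "src/my_package/train.py"), with entry_point="train.py" left unchanged.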

rkechols avatar Jun 23 '21 23:06 rkechols

+1 The issue is still present, would be wonderful if you find time to work on it.

andremonaco avatar Aug 26 '21 10:08 andremonaco

Any updates on this? I am facing this issue as well.

ra2630 avatar Sep 01 '21 12:09 ra2630

+1 I'm hitting this as well!

JustASquid avatar Oct 12 '21 05:10 JustASquid

I'm also still experiencing this problem. Has there been a fix since October?

stevennovations avatar Jun 07 '22 12:06 stevennovations

+1 I'm hitting this as well!

fatemehtd avatar Jul 08 '22 21:07 fatemehtd

I'm hitting this too!

dkawashima avatar Aug 31 '22 23:08 dkawashima

Sorry for the late response and inconvenience. Our team has started looking into this and will update as soon as we can.

satishpasumarthi avatar Sep 01 '22 00:09 satishpasumarthi

+1 Encountering this issue currently.

slefcourt27 avatar Sep 05 '22 02:09 slefcourt27

Also having the same issue

carlryn avatar Nov 15 '22 15:11 carlryn

Also having the same issue

jmlouw avatar Jan 27 '23 13:01 jmlouw

Same issue here.

In my Sagemaker Studio, the directory looks something like:

notebook_with_code_executing_python_file.ipynb
subDirectory1_with_many_files
subDirectory2_with_many_files
...
duplicated_text_file_from_subdirectory.txt

Within the notebook,

when I use

with open('subDirectory1_with_many_files/text_file_from_subdirectory.txt') as file:
        lines = file.readlines()
lines

I can read the contents of the file without any issues.

But when I use the following code to run a Python file that reads from a file in a subdirectory

%run  ./python_file.py

I encounter a "FileNotFoundError: [Errno 2] No such file or directory: 'subDirectory1_with_many_files/text_file_from_subdirectory.txt'" error.

I worked around this by reading from "duplicated_text_file_from_subdirectory.txt" instead of "subDirectory1_with_many_files/text_file_from_subdirectory.txt" within my python_file.py. That works as an interim solution, but it is a problem when running more complex models whose datasets and model artifacts live in their proper subdirectories.
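If the error comes from the script being run with a different working directory than the notebook (an assumption, nothing in this thread confirms it), resolving the path relative to the script file rather than the current directory may avoid the need for a duplicated file. A minimal sketch with hypothetical helper names:

```python
from pathlib import Path


def resolve_relative_to(anchor_file: str, relative_path: str) -> Path:
    """Build a path relative to the folder containing anchor_file."""
    return Path(anchor_file).resolve().parent / relative_path


def read_lines(anchor_file: str, relative_path: str) -> list:
    """Read all lines of a file located relative to anchor_file."""
    with resolve_relative_to(anchor_file, relative_path).open() as file:
        return file.readlines()
```

Inside python_file.py this would be called as read_lines(__file__, "subDirectory1_with_many_files/text_file_from_subdirectory.txt"), so the lookup no longer depends on where the script was launched from.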

KaiquanMah avatar Feb 02 '23 06:02 KaiquanMah

Hi laurenyu, loodvn. What is the final approach for defining the estimator when we have a setup.py and need to pass a path as entry_point? https://github.com/aws/sagemaker-containers/pull/244#issuecomment-582389214 Thank you in advance.

MedTaherBouzid avatar Feb 10 '23 09:02 MedTaherBouzid