sagemaker-training-toolkit
Entry point package doesn't seem to work with nested directories
Hey there!
I'm having some trouble getting my SageMaker TensorFlow code to work after moving my training script to another directory.
Previously, I had the following directory structure:
submit_notebook.ipynb
train.py
setup.py
my_package/
    other modules
And it worked with source_dir="." and entry_point="train.py".
Now, I recently moved my training script into one of my package directories as follows:
submit_notebook.ipynb
setup.py
src/
    my_package/
        train.py
        other modules
When running estimator.fit with source_dir="." and entry_point="src/my_package/train.py", I get an ImportError: "No module named src/my_package/train".
Higher up in the logs, I spotted: "Invoking script with the following command: /usr/bin/python -m src/my_package/train <some_args>"
Starting from sagemaker-tensorflow-container, I traced this to sagemaker_containers._entry_point_type, which checks for a "setup.py" file and, if one exists, sets the entry_point type to PYTHON_PACKAGE.
Later, sagemaker_containers._process takes any user-given entry_point string of type PYTHON_PACKAGE and removes the .py extension.
That makes sense if your entry_point is "train.py", but as shown above it produces an invalid module name ("src/my_package/train") when the path contains directories.
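To make the failure mode concrete, here is a minimal sketch. It is not the sagemaker_containers code, and both helper names are made up for illustration. The first helper mimics the behavior described above (dropping only the .py extension); the second shows one possible way to build a dotted name that "python -m" could accept. Whether that dotted name is actually importable still depends on how setup.py lays out the package, which this sketch does not check.

```python
import os


def strip_py_extension(entry_point):
    """Mimic the described behavior: only drop a trailing '.py'."""
    return entry_point[:-3] if entry_point.endswith(".py") else entry_point


def to_module_name(entry_point):
    """Hypothetical alternative: turn a relative script path into a dotted name."""
    path_without_ext, _ = os.path.splitext(entry_point)
    # Normalize separators, then drop empty and "." components (e.g. "./train.py").
    parts = path_without_ext.replace("\\", "/").split("/")
    return ".".join(p for p in parts if p not in ("", "."))


entry_point = "src/my_package/train.py"
naive = strip_py_extension(entry_point)   # "src/my_package/train"
dotted = to_module_name(entry_point)      # "src.my_package.train"
print("python -m", naive)
print("python -m", dotted)
```

For a top-level script such as "train.py" both helpers return "train", which matches the observation that the original layout worked.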
I'm describing my proposed fix in the PR:
https://github.com/aws/sagemaker-containers/pull/244
Sorry for the slow response here. I had actually started working on this a while ago: https://github.com/aws/sagemaker-python-sdk/pull/941. There is already an entrypoint in this repository that could preserve the path of the script, and as of 1-2 months ago all of the pre-built SageMaker images should be using that entrypoint, but I haven't had a chance to update my PR.
Thanks @laurenyu! That's great news. Would you like me to close this issue and associated draft PR?
We can leave this issue open, since I don't think there's an existing one for this yet.
It appears this is still an issue in sagemaker 2.36.0 with the PyTorch Estimator. Are others having the same issue?
I'm having the same issue. So far the only solution I've found is to put the entry_point script in the root directory of source_dir. Pretty lame since that isn't where I want it...
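One variation on that workaround, sketched here under assumptions, is to keep only a thin wrapper in the root of source_dir and have it execute the real script from its nested location. The function names and the src/my_package/train.py path below are illustrative (the path is borrowed from the layout earlier in this thread), and this has not been verified inside a SageMaker container. Note that runpy.run_path does not add the script's directory to sys.path, so the nested script's own imports only work if the package is installed or otherwise importable.

```python
import runpy
from pathlib import Path


def run_nested_script(script_path):
    """Execute the script at script_path as if it were the main program.

    Command-line arguments in sys.argv[1:] are left untouched, so the nested
    script sees the same arguments the wrapper received.
    """
    script_path = Path(script_path).resolve()
    if not script_path.is_file():
        raise FileNotFoundError(f"nested entry point not found: {script_path}")
    # run_path returns the globals of the executed script.
    return runpy.run_path(str(script_path), run_name="__main__")


def main():
    # Hypothetical layout: <source_dir>/src/my_package/train.py
    wrapper_dir = Path(__file__).resolve().parent
    run_nested_script(wrapper_dir / "src" / "my_package" / "train.py")
```

In the wrapper file itself, main() would be called under the usual if __name__ == "__main__": guard, and entry_point would point at the wrapper rather than at the nested script.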
+1 The issue is still present, would be wonderful if you find time to work on it.
Any updates on this? I am facing this issue as well.
+1 I'm hitting this as well!
I'm also still experiencing this problem. Has there been a fix since October?
+1 I'm hitting this as well!
I'm hitting this too!
Sorry for the late response and inconvenience. Our team has started looking into this and will update as soon as we can.
+1 Encountering this issue currently.
Also having same issue
Also having the same issue
Same issue here.
In my SageMaker Studio, the directory looks something like:
notebook_with_code_executing_python_file.ipynb
subDirectory1_with_many_files
subDirectory2_with_many_files
...
duplicated_text_file_from_subdirectory.txt
Within the notebook, when I use
with open('subDirectory1_with_many_files/text_file_from_subdirectory.txt') as file:
    lines = file.readlines()
lines
I can read the contents of the file without any issues.
However, when I use the following to run a Python file that reads from a file in a subdirectory,
%run ./python_file.py
I encounter a "FileNotFoundError: [Errno 2] No such file or directory: 'subDirectory1_with_many_files/text_file_from_subdirectory.txt'" error.
I worked around this by reading from "duplicated_text_file_from_subdirectory.txt" instead of "subDirectory1_with_many_files/text_file_from_subdirectory.txt" within my python_file.py. That works as an interim solution, but it is a problem for running more complex models whose datasets and model artifacts live in their proper subdirectories.
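The cause in this particular case is not established in the thread. If the working directory at the time the script runs differs from the directory that contains the script, one common approach is to resolve data paths against the script's own location instead of the current working directory. A minimal sketch, with a hypothetical helper name:

```python
from pathlib import Path


def resolve_relative_to(anchor_file, relative_path):
    """Return relative_path resolved against the directory containing anchor_file."""
    return Path(anchor_file).resolve().parent / relative_path


def read_lines(anchor_file, relative_path):
    """Read all lines of a file located relative to anchor_file's directory."""
    with open(resolve_relative_to(anchor_file, relative_path)) as file:
        return file.readlines()


# For diagnosis, printing the working directory shows where plain
# relative paths are being resolved from.
print("current working directory:", Path.cwd())
```

Inside python_file.py this would be called as read_lines(__file__, "subDirectory1_with_many_files/text_file_from_subdirectory.txt"), so the lookup no longer depends on where the script was launched from.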