sagemaker-python-sdk
sagemaker-python-sdk copied to clipboard
Cannot load local file from Windows path
Describe the bug
When I am trying to create a workflow step for deployment with a script from a local path in a Windows OS the code will fail saying ValueError: is not a valid file
Step config used:
deployment_step_config = {
'name': '...',
'...': '...',
'code': f'file://{os.path.join(path_to_script, 'deployment.py')}'
}
Example, 'code'
value looks like:
'file://C:\\Users\\user1\\....\\deployment.py'
Because of being a Windows path I get the following errors:
Screens or logs
If I remove the 'file://'
then I get the following error:
But this file actually exists, if I put this path in a browser or the file explorer
To reproduce
Create a workflow step and try to upload a local script from a Windows system.
Or more simple call the method with a Windows path as parameter and the Hash class used by default:
https://github.com/aws/sagemaker-python-sdk/blob/325214f7b2431e4b375409d6a4a99c7cd8bfa176/src/sagemaker/workflow/utilities.py#L117
Expected behavior
That the following lines of code can parse a file path for Windows.
System information
- SageMaker Python SDK version: 2.72.0
- Requirements:
-
numpy>=1.16.4
-
pandas>=1.3.0
-
boto3>=1,.17.101
-
sagemaker>=2.72.0
-
protbuf==3.17.3
Python version: 3.8 CPU or GPU: CPU Custom Docker image (Y/N): N
-
Addiotional context
The method _hash_file expects a string to start by 'file://'
but if a Windows path starts by that is wrongly recognize as a file because of the '//'
and then having '\\'
in the rest of the folders.
Even I tried ussing the preffix 'file:///...'
but then it doesn't recognize it as the path is like '\\C:\\...'
.
Solution
I would suggest to fix it by just checking in which OS this is running, for example:
https://github.com/aws/sagemaker-python-sdk/blob/325214f7b2431e4b375409d6a4a99c7cd8bfa176/src/sagemaker/workflow/utilities.py#L117
# new code
from platform import system
def _hash_file(file: Union[str, Path], md5: Hash) -> Hash:
"""Updates the inputted Hash with the contents of the current path.
Args:
file: path of the file
Returns:
str: The MD5 hash of the file
"""
if isinstance(file, str) and file.lower().startswith("file://"):
# changes
file = urlparse(file)
if 'windows' in system().lower():
file = file.netloc
else: # Linux, Mac
file = file.path
file = unquote(file)
if not Path(file).is_file():
raise ValueError(str(file) + " is not a valid file")
with open(file, "rb") as f:
while True:
data = f.read(BUF_SIZE)
if not data:
break
md5.update(data)
return md5
I think it is an already known issue with Sagemaker package. Here is the fix https://github.com/aws/sagemaker-python-sdk/pull/3051 (2.84.1)
I think it is an already known issue with Sagemaker package. Here is the fix #3051 (2.84.1)
Actually that https://github.com/aws/sagemaker-python-sdk/pull/3051 (2.84.1) doesn't solve the issue, that changes were the ones that introduce it.
I can make the changes following the contribution doc and making a PR from a forked repo, but before that I would like that someone from AWS answer to me first to check if they will approve it, and if there are any guidelines for coding style to follow or constant variables to know the OS in which is running.
Thanks on advance
Sorry for the late response.
Officially, the SageMaker Python SDK only supports POSIX compliant operating systems. For that reason we're unlikely to accept a PR that provides work-arounds for Windows -- even if the change works, all of our integration testing happens on Linux distros so we can't reliably maintain this code. You'll find that there are dozens of places in the SDK where you'll run into an issue similar to this one.
I'm curious though, have you tried changing the path you're passing? This Wiki article explains that Windows does support file URIs, but in a particular format: file:///c:/path/to/the%20file.txt
. Curious if converting your backslashes to forward-slashes (and possibly adding an additional forward-slash to the protocol prefix) would work.
Closing this issue for now. Feel free to reopen if there is any more feedback. Thanks