sagemaker-python-sdk icon indicating copy to clipboard operation
sagemaker-python-sdk copied to clipboard

Cannot load local file from Windows path

Open JoseJuan98 opened this issue 2 years ago • 3 comments

Describe the bug

When I am trying to create a workflow step for deployment with a script from a local path in a Windows OS the code will fail saying ValueError: is not a valid file

Step config used:

  deployment_step_config = {
                     'name': '...',
                     '...': '...',
                     'code': f'file://{os.path.join(path_to_script, 'deployment.py')}'
                    }

Example, 'code' value looks like:

'file://C:\\Users\\user1\\....\\deployment.py'

Because of being a Windows path I get the following errors:

Screens or logs

bug_workflow_step

If I remove the 'file://' then I get the following error:

bug_workflow_step_without_literal_file

But this file actually exists, if I put this path in a browser or the file explorer

To reproduce

Create a workflow step and try to upload a local script from a Windows system.

Or more simple call the method with a Windows path as parameter and the Hash class used by default:

https://github.com/aws/sagemaker-python-sdk/blob/325214f7b2431e4b375409d6a4a99c7cd8bfa176/src/sagemaker/workflow/utilities.py#L117

Expected behavior

That the following lines of code can parse a file path for Windows.

System information

  • SageMaker Python SDK version: 2.72.0
  • Requirements:
    • numpy>=1.16.4

    • pandas>=1.3.0

    • boto3>=1,.17.101

    • sagemaker>=2.72.0

    • protbuf==3.17.3

      Python version: 3.8 CPU or GPU: CPU Custom Docker image (Y/N): N

Addiotional context

The method _hash_file expects a string to start by 'file://' but if a Windows path starts by that is wrongly recognize as a file because of the '//' and then having '\\' in the rest of the folders.

Even I tried ussing the preffix 'file:///...' but then it doesn't recognize it as the path is like '\\C:\\...'.

Solution

I would suggest to fix it by just checking in which OS this is running, for example:

https://github.com/aws/sagemaker-python-sdk/blob/325214f7b2431e4b375409d6a4a99c7cd8bfa176/src/sagemaker/workflow/utilities.py#L117


# new code
from platform import system

def _hash_file(file: Union[str, Path], md5: Hash) -> Hash:
    """Updates the inputted Hash with the contents of the current path.
    Args:
        file: path of the file
    Returns:
        str: The MD5 hash of the file
    """
    if isinstance(file, str) and file.lower().startswith("file://"):
         
         # changes
         file = urlparse(file)
         if 'windows' in system().lower():
             file = file.netloc
         else: # Linux, Mac
            file = file.path
       
        file = unquote(file)
        
    if not Path(file).is_file():
        raise ValueError(str(file) + " is not a valid file")
    with open(file, "rb") as f:
        while True:
            data = f.read(BUF_SIZE)
            if not data:
                break
            md5.update(data)
    return md5

JoseJuan98 avatar Jun 15 '22 15:06 JoseJuan98

I think it is an already known issue with Sagemaker package. Here is the fix https://github.com/aws/sagemaker-python-sdk/pull/3051 (2.84.1)

100318091 avatar Jun 16 '22 09:06 100318091

I think it is an already known issue with Sagemaker package. Here is the fix #3051 (2.84.1)

Actually that https://github.com/aws/sagemaker-python-sdk/pull/3051 (2.84.1) doesn't solve the issue, that changes were the ones that introduce it.

JoseJuan98 avatar Jun 16 '22 11:06 JoseJuan98

I can make the changes following the contribution doc and making a PR from a forked repo, but before that I would like that someone from AWS answer to me first to check if they will approve it, and if there are any guidelines for coding style to follow or constant variables to know the OS in which is running.

Thanks on advance

JoseJuan98 avatar Jun 16 '22 11:06 JoseJuan98

Sorry for the late response.

Officially, the SageMaker Python SDK only supports POSIX compliant operating systems. For that reason we're unlikely to accept a PR that provides work-arounds for Windows -- even if the change works, all of our integration testing happens on Linux distros so we can't reliably maintain this code. You'll find that there are dozens of places in the SDK where you'll run into an issue similar to this one.

I'm curious though, have you tried changing the path you're passing? This Wiki article explains that Windows does support file URIs, but in a particular format: file:///c:/path/to/the%20file.txt. Curious if converting your backslashes to forward-slashes (and possibly adding an additional forward-slash to the protocol prefix) would work.

staubhp avatar Feb 17 '23 16:02 staubhp

Closing this issue for now. Feel free to reopen if there is any more feedback. Thanks

akrishna1995 avatar Dec 28 '23 22:12 akrishna1995