amazon-sagemaker-examples

Fix issue 2525: update Dockerfile for dask

Open ksmin23 opened this issue 3 years ago • 16 comments

Issue #, if available:

2525

Description of changes:

I updated the Dockerfile to install the latest version of dask. With the old Dockerfile, the Docker image for the processing job cannot be built. This change fixes that problem.

Testing done:

Done

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • [x] I have read the CONTRIBUTING doc and adhered to the example notebook best practices
  • [x] I have updated any necessary documentation, including READMEs
  • [x] I have tested my notebook(s) and ensured it runs end-to-end
  • [x] I have linted my notebook(s) and code using tox -e black-format,black-nb-format

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

ksmin23 avatar Feb 02 '22 06:02 ksmin23

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: 18de3c50f8b5cddfecabf62f14ae29eb2fe7fff4
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot avatar Feb 02 '22 06:02 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: 18de3c50f8b5cddfecabf62f14ae29eb2fe7fff4
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Feb 02 '22 06:02 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: 18de3c50f8b5cddfecabf62f14ae29eb2fe7fff4
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Feb 02 '22 06:02 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: 18de3c50f8b5cddfecabf62f14ae29eb2fe7fff4
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Feb 02 '22 06:02 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: 91470a50db11fe460ff0698df56018e3bb58859f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Feb 02 '22 07:02 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: 91470a50db11fe460ff0698df56018e3bb58859f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Feb 02 '22 07:02 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: 91470a50db11fe460ff0698df56018e3bb58859f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Feb 02 '22 07:02 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: 91470a50db11fe460ff0698df56018e3bb58859f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Feb 02 '22 07:02 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: a2e1da4ebd887b2a7d335125ba0104e43e627381
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Feb 03 '22 05:02 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: a2e1da4ebd887b2a7d335125ba0104e43e627381
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Feb 03 '22 05:02 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: a2e1da4ebd887b2a7d335125ba0104e43e627381
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Feb 03 '22 05:02 sagemaker-bot

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: a2e1da4ebd887b2a7d335125ba0104e43e627381
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

sagemaker-bot avatar Feb 03 '22 05:02 sagemaker-bot

I tried to build the Docker image in SageMaker, and it failed because of many conflicting packages. How do I get your file changes?

Hasna1994 avatar Apr 11 '22 16:04 Hasna1994

You can get the files from https://github.com/ksmin23/amazon-sagemaker-examples/tree/fix-sagemaker_processing-issues-2525

ksmin23 avatar Apr 13 '22 08:04 ksmin23

Thanks! I have a question. I have multiple files in S3 that I'd like to preprocess and label encode. The example is great if you have a single dataset, but what if we have multiple files?

Hasna1994 avatar Apr 15 '22 04:04 Hasna1994

dask's DataFrame API supports processing multiple files; see https://examples.dask.org/dataframes/01-data-access.html

So, to process multiple files in S3, you need to update preprocess.py to handle them. I suggest looking at the following part of the sample code:

%%writefile preprocess.py
from __future__ import print_function, unicode_literals
import argparse
import json
import logging
......

if __name__ == "__main__":
    ......
 
    input_data_path = "s3://{}".format(
        os.path.join(
            script_args["s3_input_bucket"],
            script_args["s3_input_key_prefix"],
            "census-income.csv",  # TODO: update this to handle multiple files
        )
    )
......
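
One possible way to handle the TODO above is to replace the fixed file name with a glob pattern, since dask's read_csv accepts glob strings. The following is only a rough sketch, not the change made in this PR: the helper names build_input_glob and load_all_csvs, the "*.csv" pattern, and the example bucket and prefix are all hypothetical. It also assumes that every file under the prefix has the same columns and that s3fs is installed in the image.

```python
import posixpath


def build_input_glob(bucket, key_prefix, pattern="*.csv"):
    """Return an S3 URL with a glob pattern matching every CSV under the prefix."""
    # posixpath.join always uses "/" separators, which is what S3 keys expect.
    return "s3://{}".format(posixpath.join(bucket, key_prefix, pattern))


def load_all_csvs(bucket, key_prefix):
    """Lazily load all matching CSV files into one dask DataFrame."""
    # Imported here so the path helper can be used without dask installed.
    import dask.dataframe as dd

    # dd.read_csv accepts glob strings; reading from s3:// requires s3fs.
    return dd.read_csv(build_input_glob(bucket, key_prefix))


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    print(build_input_glob("my-bucket", "input/raw"))
```

In preprocess.py, input_data_path could then be built from script_args["s3_input_bucket"] and script_args["s3_input_key_prefix"] in the same way, with the rest of the script left unchanged as long as it only relies on the shared columns.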

ksmin23 avatar Apr 17 '22 11:04 ksmin23