amazon-sagemaker-examples

SageMaker Processing job with BYOC: unable to run sklearn_preprocessor transform from within the container

Open papierGaylard opened this issue 3 years ago • 0 comments

I'm currently deploying a Jupyter notebook inside a Docker container using this GitHub repository: https://github.com/aws-samples/sagemaker-run-notebook

SageMaker creates a Processing job, which launches my Docker container and runs the Jupyter notebook inside it via papermill.

However, SageMaker Processing jobs don't seem to have access to any internal AWS services, despite operating within a VPC. This is normal behaviour: you need to create VPC endpoints within your VPC that give resources inside the VPC access to specific AWS APIs. For example, "com.amazonaws.eu-west-2.s3" gives a Processing job access to S3.
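A minimal sketch of how such an endpoint could be created with boto3, in case it helps make the setup concrete. The function names, the VPC ID and the route table IDs are hypothetical placeholders; only the service name format and the EC2 `create_vpc_endpoint` call are taken as given.

```python
from __future__ import annotations


def endpoint_service_name(region: str, service: str) -> str:
    """Build a VPC endpoint service name, e.g. 'com.amazonaws.eu-west-2.s3'."""
    return f"com.amazonaws.{region}.{service}"


def create_s3_gateway_endpoint(
    vpc_id: str,
    route_table_ids: list[str],
    region: str = "eu-west-2",
) -> str:
    """Create a gateway endpoint for S3 and return its endpoint ID.

    vpc_id and route_table_ids are placeholders for your own resources.
    """
    # Imported lazily so the name helper above works without boto3 installed.
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,
        ServiceName=endpoint_service_name(region, "s3"),
        RouteTableIds=route_table_ids,
    )
    return response["VpcEndpoint"]["VpcEndpointId"]
```

Other services are typically reached through interface endpoints instead, which take subnet and security group IDs rather than route tables.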

So, my problem is that when I run my script, it hangs or times out on this cell:

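# sklearn_preprocessor is the fitted estimator created earlier in the notebook
transformer = sklearn_preprocessor.transformer(
    instance_count=1,
    instance_type='ml.m4.xlarge',
    assemble_with='Line',
    accept='text/csv',
    output_path=f"s3://{BUCKET}/{PROJECT_NAME}/preprocessor/")
# Start the batch transform job on the first training input
transformer.transform(train_inputs[0], content_type='text/csv')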

I'm assuming that this .transform function is calling some internal service that I haven't built an endpoint for.
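One way to narrow this down is to test, from inside the container, which regional AWS hostnames are reachable at all. The sketch below is only a diagnostic under assumptions: the list of services (S3, the SageMaker API, STS, CloudWatch Logs) is a guess at what the transform call may contact, not a confirmed list, and the helper names are made up for illustration.

```python
from __future__ import annotations

import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False


def regional_hosts(region: str) -> dict[str, str]:
    """Hostnames of services the transform call is assumed (not confirmed) to use."""
    return {
        "s3": f"s3.{region}.amazonaws.com",
        "sagemaker": f"api.sagemaker.{region}.amazonaws.com",
        "sts": f"sts.{region}.amazonaws.com",
        "logs": f"logs.{region}.amazonaws.com",
    }


if __name__ == "__main__":
    for name, host in regional_hosts("eu-west-2").items():
        status = "reachable" if can_connect(host) else "NOT reachable"
        print(f"{name:10s} {host:45s} {status}")
```

Running this in a notebook cell before the transformer cell would show whether any of these hosts is unreachable, which would point at a missing endpoint such as "com.amazonaws.eu-west-2.sagemaker.api".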

Any ideas?

No log output from Papermill as the cell execution just hangs.

papierGaylard, May 26 '22 21:05