
Sagemaker Processors base_job_name argument not working

Open SergioZavota opened this issue 2 years ago • 8 comments

Describe the bug Even though the base_job_name argument is set in the Processor definition, for instance sagemaker.sklearn.processing.SKLearnProcessor, the resulting processing job is created with a completely different name.

To reproduce To simplify, it's possible to use the abalone pipeline example and give a custom base_job_name to the SKLearnProcessor. The result is a ProcessingJob created with a name that does not reflect the given base_job_name, such as pipelines-kytlemm1lvpq-PreprocessingStep-cIpzShs3Qp

SergioZavota avatar Jan 24 '22 12:01 SergioZavota
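
Based on the job names reported in this thread, the two naming schemes can be contrasted with a small sketch. These functions are illustrative reconstructions of the observed behavior, not SDK code:

```python
import time

def standalone_job_name(base_job_name: str) -> str:
    # When a processor runs outside a pipeline, the SDK appends a
    # timestamp to base_job_name (illustrative reconstruction).
    return f"{base_job_name}-{time.strftime('%Y-%m-%d-%H-%M-%S')}"

def pipeline_job_name(execution_id: str, step_name: str, suffix: str) -> str:
    # Inside a pipeline, the service generates the name itself and the
    # processor's base_job_name is ignored, as reported above.
    return f"pipelines-{execution_id}-{step_name}-{suffix}"
```

This matches the reported observation: standalone runs produce names like sklearn-abalone-process-2022-12-02-08-59-37-161, while pipeline executions produce pipelines-kytlemm1lvpq-PreprocessingStep-cIpzShs3Qp.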

Can you please link to the notebook and point out any specific lines for the jobs/variables you're referring to?

jkroll-aws avatar Jan 25 '22 16:01 jkroll-aws

Hi,

I have a similar issue with training and transform steps.

train_step = TrainingStep(
    name='my-train',
    estimator=Estimator(
        image_uri='...',
        base_job_name='base-train',
        instance_type='ml.m5.large',
        instance_count=1,
        volume_size=1,
        max_run=200,
        output_path='...',
        subnets=[...],
        security_group_ids=[...],
        disable_profiler=True,
        sagemaker_session=sagemaker_session,
        role=role,
    ),
    inputs={
        'training': TrainingInput(s3_data='...', content_type='application/json'),
    },
)

test = Pipeline(
    name='my-pipeline',
    steps=[train_step],
    sagemaker_session=sagemaker_session,
)
test.upsert(role_arn=role)
exec = test.start(execution_display_name='my-exec')
exec.describe()

The generated name for the training job is: pipelines-iwvdptc7f9c2-my-train-hwEq0s3KdT

and I would like something like the following: base-train-my-train-hwEq0s3KdT

jcvacaro avatar Jun 07 '22 13:06 jcvacaro

voting for this to be resolved! All our training jobs have the same name and it's impossible to tell the jobs apart.

sowston avatar Jul 07 '22 15:07 sowston

I'm having the same issue. I've just run an example notebook (https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-pipelines/tabular/local-mode) and even the base_job_name parameter used there does not affect the SageMaker training job name or the SageMaker processing job name.

If I print out the pipeline definition, it shows the job name without a problem. It's something like this: Job Name: sklearn-abalone-process-2022-12-02-08-59-37-161

MrFinchh avatar Dec 02 '22 09:12 MrFinchh

The documentation describes the base_job_name argument as: "Prefix for processing job name. If not specified, the processor generates a default job name, based on the processing image name and current timestamp."

For our use-case, I don't want a different S3 prefix created in the bucket every time the processing job runs. Currently the code gets written to <bucket_name>/<job_name>/input/code/preprocessing.py. I'm interested in using a format like <bucket_name>/processing_jobs/<job_name>/input/code/ instead of <bucket_name>/<job_name>/input/code/. I thought I could achieve this by passing the base_job_name argument, but it doesn't seem to be effective.

import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor

role = sagemaker.get_execution_role()
region = sagemaker.Session().boto_region_name
sm_client = boto3.client("sagemaker")
boto_session = boto3.Session(region_name=region)
bucket = "xxxxxxxxxxxxxxxxxx"
sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session
    , sagemaker_client=sm_client
    , default_bucket=bucket
)

base_job_prefix= "sklearn-processor"

sklearn_processor = SKLearnProcessor(
    framework_version="1.0-1"
    , role=role
    , instance_type="ml.m5.xlarge"
    , instance_count=1
    , base_job_name="processing_jobs/sklearn-census-preprocess"
    , sagemaker_session = sagemaker_session
)


from sagemaker.processing import ProcessingInput, ProcessingOutput

sklearn_processor.run(
    code="preprocessing.py",
    inputs=[
        ProcessingInput(
            source=input_data
            , destination="/opt/ml/processing/input"
            , s3_input_mode="File"
            , s3_data_distribution_type="FullyReplicated"
        )
    ],
    outputs=[
        ProcessingOutput(
            output_name="train_data"
            , source="/opt/ml/processing/train"
            , destination="s3://xxxxxxxxxxxxxxx/datasets/census/train_data/"
        ),
        ProcessingOutput(
            output_name="test_data"
            , source="/opt/ml/processing/test"
            , destination="s3://xxxxxxxxxxxxxxx/datasets/census/test_data/"
        ),
    ],
    arguments=["--train-test-split-ratio", "0.2"],
)

The processing job fails with:

ClientError: An error occurred (ValidationException) when calling the CreateProcessingJob operation: 1 validation error detected: Value 'processing_jobs/sklearn-census-preproce-2023-02-16-22-05-14-498' at 'processingJobName' failed to satisfy constraint: Member must satisfy regular expression pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}
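
The constraint in the error can be checked locally before submitting a job. A minimal sketch, reconstructing the pattern from the ValidationException above (job names must start with an alphanumeric character and may contain only alphanumerics and hyphens):

```python
import re

# Pattern reconstructed from the ValidationException above: a leading
# alphanumeric, then up to 62 more alphanumerics optionally preceded
# by hyphens. Underscores and slashes are rejected.
JOB_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")

def is_valid_job_name(name: str) -> bool:
    # Returns True only if the name would pass CreateProcessingJob's
    # processingJobName validation.
    return JOB_NAME_RE.fullmatch(name) is not None
```

This is why a slash-containing base_job_name like "processing_jobs/sklearn-census-preprocess" can never work: the generated job name inherits the slash and fails validation.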

anand086 avatar Feb 16 '23 22:02 anand086
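
For the S3-layout use-case above, one workaround is to keep the extra prefix in the S3 key rather than in the job name: the `code` argument of a processor's run() also accepts an s3:// URI, so the script can be uploaded to the desired location first. A minimal sketch with a hypothetical helper (`code_s3_uri` is not an SDK function):

```python
def code_s3_uri(bucket: str, prefix: str, job_name: str,
                script: str = "preprocessing.py") -> str:
    # Hypothetical helper: build the desired code location
    # s3://<bucket>/<prefix>/<job_name>/input/code/<script>.
    # Upload the script there yourself (e.g. `aws s3 cp`), then pass
    # the returned URI as the `code` argument of run().
    return f"s3://{bucket}/{prefix}/{job_name}/input/code/{script}"
```

This keeps the job name itself free of slashes while still grouping all job artifacts under a common processing_jobs/ prefix in the bucket.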

Confirmed, this is happening for me as well. base_job_name is not honored when running SKLearnProcessor as part of a pipeline.

oleg131 avatar May 02 '23 22:05 oleg131

It is happening for me as well, but with the Processor.

jmsanguineti avatar Aug 08 '23 17:08 jmsanguineti

Hi guys! Has anyone found an explanation for the SKLearnProcessor behavior as part of a pipeline? I'm facing two problems:

  1. Getting a warning about the ProcessingJobName being popped out of the pipeline definition by default, since it will be overridden at pipeline execution time. The warning recommends using PipelineDefinitionConfig to persist the field in the pipeline definition if desired. I haven't figured out how to set this in order to test it. Does anyone know?

  2. I think the previous warning makes the pipeline crash at some point, and I'm trying to figure out where. The main log says the object does not have the '_current_job_name' attribute (the full traceback was attached as screenshots).

Has anyone had this problem? Any tips? I'm still learning this and feel a bit lost. Thanks in advance for the help.

Yuri-Nassar avatar Aug 22 '23 14:08 Yuri-Nassar
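
Regarding problem 1 above: recent versions of the SageMaker Python SDK expose a PipelineDefinitionConfig with a use_custom_job_prefix flag that persists the step's job-name prefix in the pipeline definition. A minimal configuration sketch, assuming a sufficiently recent SDK version and existing sagemaker_session and step objects:

```python
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_definition_config import PipelineDefinitionConfig

# use_custom_job_prefix=True keeps the job-name prefix derived from
# base_job_name in the pipeline definition instead of letting the
# service generate pipelines-<execution-id>-<step>-<suffix> names.
definition_config = PipelineDefinitionConfig(use_custom_job_prefix=True)

pipeline = Pipeline(
    name="my-pipeline",
    steps=[...],  # your TrainingStep / ProcessingStep objects
    pipeline_definition_config=definition_config,
    sagemaker_session=sagemaker_session,
)
```

This is a sketch under the assumption that your SDK version already ships PipelineDefinitionConfig; on older versions the import will fail and an SDK upgrade is needed first.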