ModelTrainer API cannot configure the right Session

Open dgallitelli opened this issue 5 months ago • 5 comments

Describe the bug

I'm trying to train a model using the sagemaker.modules.train.ModelTrainer API. However, it keeps trying to validate the SageMaker session using Pydantic and never accepts any input I give it. It raises the validation error shown in the screenshot attached below.

To reproduce

  1. Write a ModelTrainer-compatible script
  2. Write the following code:
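# Imports are assumed here; the original report did not include them.
from sagemaker.modules.configs import CheckpointConfig, StoppingCondition
from sagemaker.modules.train import ModelTrainer

model_trainer = ModelTrainer(
    training_image=image_uri, compute=compute, source_code=source_code,
    hyperparameters=hyperparameters, environment=env,
    base_job_name=job_prefix,
    stopping_condition=StoppingCondition(max_runtime_in_seconds=90000),
    checkpoint_config=CheckpointConfig(s3_uri=f"{checkpoint_s3_path}/{job_prefix}"),
)
model_trainer.fit(...)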

Expected behavior

Training starts.

Screenshots or logs

[Screenshot of the Pydantic validation error; image not included]

System information

  • SageMaker Python SDK version: 2.248.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
  • Framework version: N/A
  • Python version: 3.12
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Additional context

I tried downgrading to other SageMaker SDK versions, but couldn't find a working one.

dgallitelli avatar Jul 15 '25 13:07 dgallitelli

When using ModelTrainer, if sagemaker_session is not provided, one is created for you by default. Otherwise, you can pass one in as shown below:

from sagemaker.modules import Session
from sagemaker.modules.train import ModelTrainer

sagemaker_session = Session()

model_trainer = ModelTrainer(
   sagemaker_session=sagemaker_session,
   ...
)

If this still causes issues, can you try the latest versions of sagemaker and sagemaker-core?

pip install -U sagemaker sagemaker-core

I can't tell exactly what the issue is from the description, but my guess is an outdated sagemaker-core, assuming the sagemaker_session is being initialized correctly from the shared import path.
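To confirm which versions are actually installed in the environment where ModelTrainer runs, a quick check along these lines may help. It is a minimal sketch using only the standard library; the package list is an assumption based on the packages named in this thread, and installed_versions is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names mentioned in this thread (an assumption; adjust as needed).
PACKAGES = ("sagemaker", "sagemaker-core", "pydantic")


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the same kernel or virtual environment as the training code matters, since a notebook kernel can see a different set of packages than the shell where pip was run.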

benieric avatar Jul 15 '25 16:07 benieric

Tried that earlier today, no luck, including when installing sagemaker-core alongside the other dependencies.

dgallitelli avatar Jul 15 '25 17:07 dgallitelli

Hmm, yeah, that is strange. I would try upgrading the pydantic version as well.

Are you using a custom session object or the default one created by ModelTrainer?

If it is custom, it needs to be imported like this:

from sagemaker.modules import Session
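One way to see which class a session object really is would be to print its fully qualified class name. The sketch below assumes the validation error comes from a type mismatch between the object passed in and the Session class ModelTrainer expects; class_origin is a hypothetical helper, not part of the SDK.

```python
def class_origin(obj):
    """Return 'module.ClassName' for the class of obj."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


# Usage (assumes a sagemaker_session variable already exists):
#   print(class_origin(sagemaker_session))
# Comparing this against the class imported via
# `from sagemaker.modules import Session` shows whether the two match.
```

If the printed module differs from the one behind sagemaker.modules.Session, the object was likely created from a different Session class.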

benieric avatar Jul 15 '25 19:07 benieric

Hi @dgallitelli, from some testing I think this is somehow an issue with the latest sagemaker-core. I found that pinning to an older version gets it to succeed.

pip install "sagemaker-core==1.0.41"
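If the pin is used as a temporary workaround, a small guard at the top of the training notebook or script could flag an environment that has drifted from it. This is a sketch only: matches_pin is a hypothetical helper, and the pinned value is the version reported to work in this thread, not an official recommendation.

```python
from importlib import metadata

PINNED = "1.0.41"  # version reported to work in this thread


def matches_pin(package="sagemaker-core", pinned=PINNED, lookup=metadata.version):
    """Return True only if the installed version of package equals the pin."""
    try:
        return lookup(package) == pinned
    except metadata.PackageNotFoundError:
        return False  # package is not installed at all


# Usage:
#   if not matches_pin():
#       print("sagemaker-core is not at the pinned version")
```

The lookup parameter exists only so the check can be exercised without touching the real environment.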

benieric avatar Jul 15 '25 23:07 benieric

This works! Thank you 😄 Should this be fixed in this library, or in sagemaker-core?

dgallitelli avatar Jul 16 '25 08:07 dgallitelli