ModelTrainer API cannot configure the right Session
Describe the bug
I'm trying to train a model using the sagemaker.modules.train.ModelTrainer API. However, it keeps validating the SageMaker session with Pydantic and rejects every input I give it, raising the validation error shown in the screenshot attached below.
To reproduce
- Write a ModelTrainer-compatible training script
- Write the following code:
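```python
# image_uri, compute, source_code, hyperparameters, env, job_prefix and checkpoint_s3_path are defined elsewhere
from sagemaker.modules.configs import CheckpointConfig, StoppingCondition
from sagemaker.modules.train import ModelTrainer

model_trainer = ModelTrainer(
    training_image=image_uri, compute=compute, source_code=source_code,
    hyperparameters=hyperparameters, environment=env,
    base_job_name=job_prefix,
    stopping_condition=StoppingCondition(max_runtime_in_seconds=90000),
    checkpoint_config=CheckpointConfig(s3_uri=f"{checkpoint_s3_path}/{job_prefix}"),
)
model_trainer.fit(...)
```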
Expected behavior
Training starts.
Screenshots or logs
System information
- SageMaker Python SDK version: 2.248.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
- Framework version: N/A
- Python version: 3.12
- CPU or GPU: GPU
- Custom Docker image (Y/N): N
Additional context
I tried downgrading to other SageMaker SDK versions but couldn't find one that works.
When using ModelTrainer, if sagemaker_session is not provided, one is created for you by default. Otherwise, you can pass one in like this:
```python
from sagemaker.modules import Session
from sagemaker.modules.train import ModelTrainer

sagemaker_session = Session()
model_trainer = ModelTrainer(
    sagemaker_session=sagemaker_session,
    ...
)
```
If this still causes issues, can you try the latest versions of sagemaker and sagemaker-core?
pip install -U sagemaker sagemaker-core
I can't tell exactly what the issue is from the description, but if sagemaker_session is being initialized correctly from the shared import path, my guess is an outdated sagemaker-core.
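To test the outdated-sagemaker-core guess, it helps to confirm which versions are actually installed in the interpreter that runs the training code. This is a minimal sketch using only the standard library's importlib.metadata; the helper name installed_versions is hypothetical, and pydantic is included only because the error in this thread comes from Pydantic validation.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this interpreter's environment.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["sagemaker", "sagemaker-core", "pydantic"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the same environment (notebook kernel, virtualenv, or container) that calls ModelTrainer avoids confusing it with a different Python installation where the upgrade was applied.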
I tried that earlier today with no luck, including when installing sagemaker-core alongside the other dependencies.
(Sent by email in reply to the comment Erick Benitez-Ramos (benieric) posted on Tue, 15 Jul 2025 at 19:00: https://github.com/aws/sagemaker-python-sdk/issues/5237#issuecomment-3074440878)
Hmm, yeah, that is strange. I would try upgrading pydantic as well.
Are you using a custom session object or the default one created by ModelTrainer?
If it is custom, it needs to be imported with: from sagemaker.modules import Session
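One quick way to see which Session class was actually constructed is to print its fully qualified class name. A session built from sagemaker.session.Session would presumably report a different module than the one re-exported by sagemaker.modules; the exact module path to expect is not stated in this thread, so treat the output as something to compare, not a definitive check. The helper name qualified_class_name is hypothetical.

```python
def qualified_class_name(obj):
    """Return '<module>.<class>' for the type of obj, e.g. 'collections.OrderedDict'."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


if __name__ == "__main__":
    # Hypothetical usage, assuming sagemaker is installed and AWS credentials are configured:
    #   from sagemaker.modules import Session
    #   print(qualified_class_name(Session()))
    from collections import OrderedDict

    print(qualified_class_name(OrderedDict()))  # collections.OrderedDict
```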
Hi @dgallitelli, from some testing I think this is somehow an issue with the latest sagemaker-core. I found that pinning to an older version gets training to succeed.
pip install "sagemaker-core==1.0.41"
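Until a fix lands, a training launcher that relies on this workaround could fail fast when the pin is not in effect, instead of failing later inside ModelTrainer with a validation error. A minimal sketch using importlib.metadata; the helper name require_exact_version is hypothetical, and 1.0.41 is simply the version reported to work in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def require_exact_version(dist_name, expected):
    """Return the installed version of dist_name, or raise RuntimeError if it is not exactly `expected`."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        raise RuntimeError(f"{dist_name} is not installed; expected {expected}") from None
    if installed != expected:
        raise RuntimeError(
            f"{dist_name}=={installed} is installed, but {expected} is pinned as a workaround"
        )
    return installed


# Usage, before constructing ModelTrainer (pin reported to work in this thread):
#   require_exact_version("sagemaker-core", "1.0.41")
```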
This works! Thank you 😄 Should this be fixed in this library or in sagemaker-core?