ananth102

9 comments by ananth102

Hi sziem, are you seeing this issue repeatedly? If so, can you share some sample code we can use to replicate it?

This seems like an issue with the SDK. The statement at https://github.com/aws/sagemaker-python-sdk/blob/2d873d53f708ea570fc2e2a6974f8c3097fe9df5/src/sagemaker/experiments/_metrics.py#L200 should reference "Code" instead of "Message", since "Code" is what the API returns (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-metrics/client/batch_put_metrics.html). It would still error out...
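As a rough illustration of the fix, here is a minimal sketch of error handling that reads the keys the BatchPutMetrics response documents for each entry in `Errors` ("Code" and "MetricIndex") instead of a "Message" key. The helper name `format_batch_put_errors` is hypothetical and is not the SDK's actual code; in practice the response dict would come from `boto3.client("sagemaker-metrics").batch_put_metrics(...)`.

```python
from typing import Any, Dict, List


def format_batch_put_errors(response: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: turn BatchPutMetrics errors into readable strings.

    Each entry in response["Errors"] carries "Code" and "MetricIndex".
    There is no "Message" key, so indexing error["Message"] would raise KeyError.
    """
    messages = []
    for error in response.get("Errors", []):
        code = error.get("Code", "UNKNOWN")  # e.g. "VALIDATION_ERROR"
        index = error.get("MetricIndex")     # position of the failed metric in the request
        messages.append(f"Metric at index {index} failed with code {code}")
    return messages


if __name__ == "__main__":
    # Sample response shaped like the documented BatchPutMetrics output.
    sample = {"Errors": [{"Code": "VALIDATION_ERROR", "MetricIndex": 0}]}
    print(format_batch_put_errors(sample))
```

Using `.get()` keeps the handler from crashing if a field is ever absent, which is the failure mode described above.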

@AlexandreBrown This feature is complete and available in [v1.6.1-aws-b1.0.0](https://github.com/awslabs/kubeflow-manifests/releases/tag/v1.6.1-aws-b1.0.0). See the [Prometheus add-on guide](https://awslabs.github.io/kubeflow-manifests/docs/add-ons/prometheus/guide/) and the [CloudWatch add-on guide](https://awslabs.github.io/kubeflow-manifests/docs/add-ons/cloudwatch/guide/).

This was done with Terraform. Keeping this open in case we decide to make a CloudFormation template.

Hi mwm5945, I will attempt to replicate this but have a couple of questions: 1. Which controller version are you using? 2. Is `arn:aws:sts:::assumed-role/sagemaker-provisioner/kiam-kiam` the ACK controller role or the execution role? 3. ...

Which version of the Pipelines SDK are you using? Do the pods get stuck in the Pending state, or are they not scheduled at all? Also, what are the CPU/memory requests of the...

Update: I was able to replicate this with a simple pipeline on the v1 KFP SDK. The pods do not appear immediately when I try to list all of them...