Lauren Yu
@nectario what is your system setup? are you on a GPU instance?
```
ERROR: for algo-1-sk9mf Cannot create container for service algo-1-sk9mf: Unknown runtime specified nvidia
```
I've usually seen this error when running a GPU image on a CPU instance, but...
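A rough way to avoid asking for a GPU image on a machine without one is to pick the Local Mode instance type at runtime. The sketch below assumes that finding `nvidia-smi` on the `PATH` is a good enough signal for a usable GPU setup; it does not check that Docker actually has the `nvidia` runtime registered. `pick_local_instance_type` is a hypothetical helper, not part of the SageMaker Python SDK; only the `local` and `local_gpu` values come from the SDK.

```python
import shutil


def pick_local_instance_type(which=shutil.which):
    """Return 'local_gpu' if nvidia-smi is found on PATH, else 'local'.

    Hypothetical helper: the nvidia-smi check is only a heuristic and says
    nothing about whether Docker knows the 'nvidia' runtime.
    """
    return "local_gpu" if which("nvidia-smi") else "local"


# The result can be passed as the instance type when creating an estimator.
instance_type = pick_local_instance_type()
print(instance_type)
```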
thanks for bringing this to our attention! I think for Local Mode, the fix would be modifying the code around https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/local/image.py#L657 - I'll see if I can get to making...
you're right - my bad. the fix needs to happen at https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_process.py#L29, where `None` is replaced with `subprocess.STDOUT`. I'm working on the fix, and will post updates here as I...
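To illustrate what swapping `None` for `subprocess.STDOUT` changes, here is a minimal standalone sketch. It assumes the value in question is the `stderr` argument of a `subprocess.Popen` call; this is not the actual sagemaker-containers code, and `run_and_capture` is a hypothetical helper.

```python
import subprocess
import sys


def run_and_capture(cmd, merge_stderr):
    """Run cmd and return (returncode, text captured from the stdout pipe)."""
    # stderr=None lets the child inherit the parent's stderr;
    # stderr=subprocess.STDOUT redirects it into the same pipe as stdout.
    stderr = subprocess.STDOUT if merge_stderr else None
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=stderr)
    out, _ = proc.communicate()
    return proc.returncode, out.decode()


# A child process that writes only to stderr.
child = [sys.executable, "-c", "import sys; print('oops', file=sys.stderr)"]

merged_code, merged_output = run_and_capture(child, merge_stderr=True)
plain_code, plain_output = run_and_capture(child, merge_stderr=False)
```

With `merge_stderr=True` the message ends up in the captured output; with `None` it bypasses the pipe and goes straight to the parent's stderr.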
thanks for the kind words! Unfortunately, this isn't currently supported, but I'll leave this issue open as a feature request.
thanks for the suggestion!
sorry for the delayed response @iluoyi! We don't natively support multiple inputs in our default hosting functions, but you can override the defaults by providing your own `transform_fn` in your...
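As a rough sketch of what a custom `transform_fn` for multiple inputs could look like: the signature `(model, data, content_type, accept)` and the `(response_body, accept)` return value are assumptions here, since the exact contract depends on the framework container. The payload keys `input_a` and `input_b` and the callable `model` are hypothetical, for illustration only.

```python
import json


def transform_fn(model, data, content_type, accept):
    """Parse a JSON request carrying two named inputs and run the model.

    Assumed signature; check the framework container's docs for the real one.
    """
    if content_type != "application/json":
        raise ValueError("Unsupported content type: {}".format(content_type))

    payload = json.loads(data)
    # Hypothetical input names; each one becomes a separate model argument.
    inputs = [payload[name] for name in ("input_a", "input_b")]

    prediction = model(*inputs)
    return json.dumps({"prediction": prediction}), accept


# Usage with a stand-in "model" that just adds its two inputs.
body, response_type = transform_fn(
    lambda a, b: a + b,
    '{"input_a": 1, "input_b": 2}',
    "application/json",
    "application/json",
)
```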
@ZHAO0189 sorry for the slow response. that sounds like it's likely an issue with your serving entry point code. can you share your inference script?
sorry for the delayed response here. The main use case for an S3 `model_dir` is when using TensorFlow's support for S3 checkpointing during distributed training. TF (the framework itself) does...
sorry for the delayed response here. The metrics should be viewable in CloudWatch - scroll down to the "Monitor" section in the AWS console when looking at a training job....