Lauren Yu

Results 31 comments of Lauren Yu

> @laurenyu If I have to set up things myself, then how can I use the `estimator.fit()` API to launch the training job? Can't the `TF_CONFIG` be configured with the distributions...

@anirudhacharya yep, that's exactly what I was thinking. You can also use the environment variable `SM_CURRENT_HOST` to get the host name ([docs](https://github.com/aws/sagemaker-training-toolkit/blob/master/ENVIRONMENT_VARIABLES.md)).
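A minimal sketch of that approach, assuming a multi-worker TensorFlow setup in which every host is a plain worker. It reads `SM_HOSTS` and `SM_CURRENT_HOST` (both listed in the docs linked above) and writes `TF_CONFIG` before TensorFlow starts. The `build_tf_config` helper and the port number are hypothetical choices for illustration, meant to be called from your own training script.

```python
import json
import os

# Assumption: port 2222 is a placeholder; the original comment does not specify one.
WORKER_PORT = 2222


def build_tf_config(hosts, current_host, port=WORKER_PORT):
    """Return a TF_CONFIG dict that treats every SageMaker host as a TensorFlow worker."""
    # Sort so every host derives the same worker ordering, and therefore the same indices.
    ordered = sorted(hosts)
    return {
        "cluster": {"worker": [f"{host}:{port}" for host in ordered]},
        "task": {"type": "worker", "index": ordered.index(current_host)},
    }


if __name__ == "__main__":
    # SM_HOSTS is a JSON-encoded list of all host names; SM_CURRENT_HOST names this container.
    # The defaults below only exist so the sketch runs outside a training job.
    hosts = json.loads(os.environ.get("SM_HOSTS", '["algo-1", "algo-2"]'))
    current_host = os.environ.get("SM_CURRENT_HOST", "algo-1")
    os.environ["TF_CONFIG"] = json.dumps(build_tf_config(hosts, current_host))
    print(os.environ["TF_CONFIG"])
```

Sorting the host list is a defensive design choice: each container computes its own `TF_CONFIG`, so they all need to agree on which index belongs to which host.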

> `--docker-base-name sm-tf-horovod-integration --tag latest` what image did you use for your test run?

running:

```
pytest test/integration/sagemaker/test_horovod.py --account-id 763104351884 --docker-base-name tensorflow-training --tag 1.15.0-gpu-py3 --processor gpu --dockerfile-type dlc.gpu
```

produced:

```
[ip-10-0-79-182.us-west-2.compute.internal:00039] 1 more process has sent help message help-orte-odls-default.txt / memory not bound...
```

@diegodebrito no, unfortunately it wasn't included in v2.0.0. cc @ajaykarpur, who might know the current status of this issue

understood. definitely agree that the SDK can do better here. I'll leave this issue open as a feature request, and hopefully we can prioritize this work in the near future....

thanks for the clarification! So it seems that we would need to find a way to expose `shm-size` as an option that would then get written into the `docker-compose.yml` file...
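For reference, Docker Compose accepts a per-service `shm_size` key, so the change would mostly mean passing the option through to the generated file. The sketch below assumes the compose file is built from a Python dict before being written out. `add_shm_size`, the service name, and the image name are all hypothetical and not part of the SageMaker SDK. It prints JSON, which is also valid YAML.

```python
import json


def add_shm_size(compose_config, shm_size):
    """Return a copy of a compose config with shm_size set on every service."""
    # Build new service dicts instead of mutating the caller's config.
    services = {
        name: {**service, "shm_size": shm_size}
        for name, service in compose_config.get("services", {}).items()
    }
    return {**compose_config, "services": services}


# Hypothetical local-mode config with a single training container.
config = {
    "version": "2.3",
    "services": {
        "algo-1-abcde": {"image": "my-training-image:latest", "command": "train"},
    },
}
print(json.dumps(add_shm_size(config, "2g"), indent=2))
```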

the reason you're seeing this behavior is that `attach()` is a class method that returns a new instance of the estimator. This is so you don't have to instantiate a...
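A simplified illustration of the class-method behavior, using a hypothetical `ToyEstimator` rather than the real SDK class. Calling `attach()` through an existing instance does not modify that instance, so the returned object has to be captured. With the SDK itself, the equivalent pattern is `estimator = Estimator.attach(training_job_name)`.

```python
class ToyEstimator:
    """Simplified stand-in for an SDK estimator; not the real sagemaker class."""

    def __init__(self, job_name=None):
        self.job_name = job_name

    @classmethod
    def attach(cls, training_job_name):
        # A classmethod builds and returns a NEW instance; it never mutates
        # an existing estimator it happens to be called through.
        return cls(job_name=training_job_name)


existing = ToyEstimator()
existing.attach("my-training-job")  # return value discarded
print(existing.job_name)  # None: existing was not modified

attached = ToyEstimator.attach("my-training-job")
print(attached.job_name)  # my-training-job
```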

Thanks for the feature request! We'll look into whether this can be put on the roadmap in the near future and report back here.

sorry for the slow response here as well. for the three attributes you mentioned:

* `entry_point` is covered in #1427, since `deploy()` just calls `create_model()`
* `name` should be working...
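A toy sketch of why a fix in `create_model()` also covers `deploy()`, assuming `deploy()` forwards its keyword arguments to `create_model()`. `ToyEstimator` and `ToyModel` are hypothetical stand-ins, not the SDK's classes, and no endpoint is actually created.

```python
class ToyModel:
    """Hypothetical stand-in for a deployable model object."""

    def __init__(self, entry_point):
        self.entry_point = entry_point


class ToyEstimator:
    """Hypothetical estimator illustrating deploy() delegating to create_model()."""

    def __init__(self, entry_point):
        self.entry_point = entry_point

    def create_model(self, entry_point=None):
        # Use an override if one is given, otherwise fall back to the training entry point.
        return ToyModel(entry_point or self.entry_point)

    def deploy(self, **kwargs):
        # deploy() does no model construction of its own: it forwards kwargs to
        # create_model(), so a fix there is automatically picked up here.
        model = self.create_model(**kwargs)
        return model  # a real SDK would go on to create an endpoint from this model


est = ToyEstimator(entry_point="train.py")
print(est.deploy().entry_point)  # train.py
print(est.deploy(entry_point="inference.py").entry_point)  # inference.py
```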