rivershah
@csuter Could `tfp` specify compatible `tensorflow` version ranges to prevent such issues? Unit tests inside my container caught this issue upfront; however, it would be good if pip install would...
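For illustration: declaring a range such as `install_requires=["tensorflow>=2.14,<2.15"]` in a package's metadata is what lets pip enforce compatibility at install time. These numbers are hypothetical, not tfp's actual requirement. A package can also fail fast at import time. Below is a minimal stdlib-only sketch of such a guard; `release_tuple`, `in_range`, `check_tensorflow`, `MIN_TF` and `MAX_TF` are hypothetical names, and the comparison only looks at numeric release segments, so pre-releases like `rc0` are not ordered the way pip orders them.

```python
from typing import Tuple

# Hypothetical supported range; NOT tfp's real requirement.
MIN_TF = "2.14"
MAX_TF = "2.15"


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse leading numeric segments, e.g. "2.13.1" -> (2, 13, 1).

    Trailing zeros are dropped so "2.14.0" and "2.14" compare equal.
    Anything after the first non-digit (e.g. "rc0") is ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # segment had a suffix such as "0rc1"; stop here
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def in_range(installed: str, min_inclusive: str, max_exclusive: str) -> bool:
    """True if min_inclusive <= installed < max_exclusive (release segments only)."""
    return (release_tuple(min_inclusive)
            <= release_tuple(installed)
            < release_tuple(max_exclusive))


def check_tensorflow(installed: str) -> None:
    """Raise ImportError early instead of failing later with a confusing error."""
    if not in_range(installed, MIN_TF, MAX_TF):
        raise ImportError(
            f"tensorflow {installed} is unsupported; need >={MIN_TF},<{MAX_TF}"
        )
```

In a real package the installed version string could come from `importlib.metadata.version("tensorflow")`; the install-time specifier is still preferable because it stops the bad combination before anything is imported.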
Adding to the report of distributed training hanging with `tensorflow==2.13.1`: here is a small Fashion-MNIST example that reproduces it. The jit-compiled model fails to train and hangs:
```
import tensorflow as tf
from keras import Model
from...
```
Fantastic. May I please request that we include this feature? I have looked at the authentication documentation for GitLab, and this seems straightforward if dsub can expose the relevant...
@mbookman Thanks for looking. My understanding of the docs is that the `batch` API will raise an error if multiple regions are specified:
> Only one region or multiple zones in one region...
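The quoted rule (one region, or several zones within one region) can be mirrored by a local check before submitting. A minimal sketch, assuming location strings of the form `regions/us-central1` or `zones/us-central1-a` (as used in Batch's `allowedLocations`); `region_of` and `check_single_region` are hypothetical helpers, and the check does not verify that a name such as `us` is a real single region.

```python
from typing import Iterable, Set


def region_of(location: str) -> str:
    """Map "regions/us-central1" or "zones/us-central1-a" to "us-central1"."""
    kind, _, name = location.partition("/")
    if kind == "regions":
        return name
    if kind == "zones":
        # Zone names are "<region>-<suffix>", so drop the last "-" segment.
        return name.rsplit("-", 1)[0]
    raise ValueError(f"unrecognized location: {location!r}")


def check_single_region(locations: Iterable[str]) -> str:
    """Return the single region covered by `locations`, or raise ValueError."""
    regions: Set[str] = {region_of(loc) for loc in locations}
    if len(regions) != 1:
        raise ValueError(f"expected exactly one region, got {sorted(regions)}")
    return regions.pop()
```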
@mbookman Happy New Year! I am still fairly sure that `batch`, as implemented on Google's side, does not support submitting a job to `us`-wide regions, which `google-cls-v2` does allow...
@mbookman @wnojopra As `google-cls-v2` is headed for removal soon, I am requesting that we look at this feature regression. Thank you!
Hi @mbookman, apologies for the delay. The multi-region feature is crucial for several reasons:
- Hardware flexibility: Users can't predict accelerator hardware and preemptible machine availability in advance. Multi-region support...
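Until multi-region submission is available, one client-side workaround is to try a list of single regions in order and fall back when a region cannot supply the requested hardware. A minimal sketch; `submit`, `CapacityError` and `submit_with_fallback` are hypothetical stand-ins, not dsub or Batch APIs.

```python
from typing import Callable, List, Sequence, Tuple


class CapacityError(Exception):
    """Hypothetical error a submit function raises when a region lacks capacity."""


def submit_with_fallback(regions: Sequence[str],
                         submit: Callable[[str], str]) -> Tuple[str, str]:
    """Try `submit(region)` for each region in order.

    Returns (region, job_id) for the first region that accepts the job and
    raises RuntimeError if every region reports a CapacityError.
    """
    failures: List[str] = []
    for region in regions:
        try:
            return region, submit(region)
        except CapacityError as exc:
            # Remember why this region failed, then try the next one.
            failures.append(f"{region}: {exc}")
    raise RuntimeError("no region accepted the job: " + "; ".join(failures))
```

This only helps when a capacity shortage surfaces at submission time. If a job is accepted but then waits for resources, the fallback would also need a timeout, which server-side multi-region placement avoids.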
@wnojopra Please take a look here: https://www.youtube.com/watch?v=W7A-9MYvPwI&t=301s
TPUs now follow the same provisioning model as GPUs: root access to the host VM, with the accelerators on the host. I am not...
I found the reason for the error: the firewall rules for the project were corrupted, and SSH traffic was being blocked. In case another user runs into the same issue, please ensure...
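When SSH hangs like this, a quick TCP probe of port 22 can help separate a blocking firewall (the connection usually times out because packets are silently dropped) from a host that is reachable but has no SSH server listening (the connection is refused). A minimal stdlib sketch; `probe` is a hypothetical helper and any host name you pass is a placeholder.

```python
import socket


def probe(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt to host:port.

    "open"    -> the handshake completed
    "refused" -> the host answered, but nothing is listening on the port
    "timeout" -> no answer; often a firewall dropping packets
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, etc.
        return f"error: {exc}"
```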
I would like to better understand why this parameter needs autotuning. I would appreciate it if someone could help by looking over these questions: Is it sufficient to always set `steps_per_execution` to...
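One way to reason about the question is a toy cost model. `steps_per_execution=N` in Keras `Model.compile` runs N batches per `tf.function` call, so a fixed per-call host overhead is paid once per N steps instead of once per step. The sketch below is a simplified model under that assumption (real overheads vary with hardware, model size and input pipeline), not a measurement of Keras itself; `epoch_time` and `min_steps_per_execution` are hypothetical helpers.

```python
import math


def epoch_time(num_steps: int, steps_per_execution: int,
               step_time: float, dispatch_overhead: float) -> float:
    """Toy model: each execution pays `dispatch_overhead`; each step pays `step_time`."""
    # The last execution may cover fewer than `steps_per_execution` steps.
    executions = math.ceil(num_steps / steps_per_execution)
    return executions * dispatch_overhead + num_steps * step_time


def min_steps_per_execution(step_time: float, dispatch_overhead: float,
                            max_overhead_fraction: float = 0.05) -> int:
    """Smallest N whose amortized overhead, dispatch_overhead / (N * step_time),
    is at most `max_overhead_fraction` of the compute time."""
    return max(1, math.ceil(dispatch_overhead / (max_overhead_fraction * step_time)))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, epoch_time(1000, n, step_time=1.0, dispatch_overhead=5.0))
```

With a 5 ms dispatch overhead and 1 ms steps, 1000 steps take 6000 ms at N=1 and 1050 ms at N=100 in this model. The model ignores the costs of a large N; for example, per the Keras documentation, `on_batch_begin`/`on_batch_end` callbacks only fire every N batches. Because the break-even N depends on the ratio of overhead to step time, which differs across hardware, a fixed value that suits one setup may be too small or needlessly large on another.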