rivershah

Results: 43 comments by rivershah

@csuter Could `tfp` specify compatible `tensorflow` version ranges to prevent such issues? Unit tests inside my container caught this issue upfront; however, it would be good if pip install would...

Adding to the reports of distributed training hanging with `tensorflow==2.13.1`. Here is a small Fashion MNIST example to reproduce it; the jit_compiled model fails to train and hangs: ``` import tensorflow as tf from keras import Model from...

Fantastic. May I please request that we include this feature? I have looked at the authentication documentation for GitLab, and this seems straightforward if dsub can expose the relevant...

@mbookman Thanks for looking. My understanding of the docs is that the `batch` API will raise an error if multiple regions are specified. > Only one region or multiple zones in one region...

@mbookman Happy new year! I am still pretty sure that `batch`, as implemented on Google's side, does not support submitting a job to `us`-wide regions, which `google-cls-v2` does allow....

@mbookman @wnojopra As `google-cls-v2` is headed for removal soon, I am requesting that we look at this feature regression. Thank you.

Hi @mbookman, Apologies for the delay. The multi-region feature is crucial for several reasons: - Hardware Flexibility: Users can't predict accelerator hardware and preemptible machine availability in advance. Multi-region support...

@wnojopra Please take a look here: https://www.youtube.com/watch?v=W7A-9MYvPwI&t=301s TPUs now follow the same provisioning model as GPUs: root access to the host VM, with the accelerators on the host. I am not...

I found the reason for the error. Firewall rules for the project were corrupted and SSH traffic was getting blocked. In case another user runs into the same issue, please ensure...

I would like to better understand why this parameter needs autotuning. I would appreciate it if someone could help by looking over these questions: Is it sufficient to always set `steps_per_execution` to...