rivershah comments

Results: 43 comments of rivershah

TensorFlow v2.14.0 breaks TensorFlow Probability at import

@csuter Could `tfp` specify compatible `tensorflow` version ranges to prevent such issues? Unit tests inside my container caught this issue up front; however, it would be good if pip install would...
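
One way a package can fail fast on an incompatible TensorFlow is an import-time version guard. The sketch below is hypothetical: the `SUPPORTED_TF_MIN`/`SUPPORTED_TF_MAX` bounds and the helper names are illustrative assumptions, not `tfp`'s actual supported range or code. It compares only major.minor, which is enough to reject a release such as 2.14.0.

```python
# Hypothetical import-time compatibility guard.
# The bounds below are illustrative only, NOT tfp's real supported range.
SUPPORTED_TF_MIN = (2, 13)  # inclusive (hypothetical)
SUPPORTED_TF_MAX = (2, 14)  # exclusive (hypothetical)


def parse_version(version: str) -> tuple:
    """Turn '2.13.1' or '2.13.0rc2' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    # Keep only digits so suffixes like 'rc2' on the minor part do not break int().
    return int(major), int("".join(ch for ch in minor if ch.isdigit()))


def is_supported(tf_version: str) -> bool:
    """True if tf_version falls inside the (hypothetical) supported range."""
    return SUPPORTED_TF_MIN <= parse_version(tf_version) < SUPPORTED_TF_MAX
```

In a package's `__init__.py` this would typically be called as `is_supported(tf.__version__)`, raising an `ImportError` with an explanatory message when it returns `False`. At install time, the equivalent constraint would be a PEP 440 specifier such as `tensorflow>=2.13,<2.14`, which lets pip's resolver reject the mismatch before anything is imported.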

TensorFlow 2.13 distributed training fail

Adding to the distributed-training hang with `tensorflow==2.13.1`: here is a small Fashion-MNIST example to reproduce it. A model compiled with `jit_compile` fails to train and hangs:

```
import tensorflow as tf
from keras import Model
from...
```

Private container registry authentication

Fantastic. May I request that this feature be included? I have looked at the authentication documentation for GitLab, and this seems straightforward if dsub can expose the relevant...
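
For context, private registries such as GitLab's are commonly authenticated through a Docker-style `config.json`, whose `auths` map stores `base64("username:token")` per registry host. The sketch below only builds that file body; the helper name and the example credentials are hypothetical, and it says nothing about how dsub would pass such credentials (that is exactly the open feature request).

```python
import base64
import json


def docker_auth_config(registry: str, username: str, token: str) -> str:
    """Build a Docker-style config.json body for one private registry.

    The "auth" value is base64("username:token"), the same shape the
    Docker CLI stores after `docker login`.
    """
    raw = f"{username}:{token}".encode("utf-8")
    auth = base64.b64encode(raw).decode("ascii")
    return json.dumps({"auths": {registry: {"auth": auth}}}, indent=2)
```

For example, `docker_auth_config("registry.gitlab.com", "deploy-user", "<token>")` would produce the body for a GitLab deploy token with registry read access; `deploy-user` and `<token>` are placeholders.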

Multi-region support google-batch

@mbookman Thanks for looking. My understanding of the docs is that the `batch` API will raise an error if multiple regions are specified:

> Only one region or multiple zones in one region...
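
The quoted restriction can be expressed as a simple client-side check: every allowed location must resolve to the same region. This is a sketch of my reading of that constraint, assuming locations are written as `regions/<region>` or `zones/<region>-<suffix>` (e.g. `zones/us-central1-a`); the function names are hypothetical and not part of dsub or the Batch client libraries.

```python
from typing import Iterable


def region_of(location: str) -> str:
    """Map 'regions/us-central1' or 'zones/us-central1-a' to 'us-central1'."""
    kind, _, name = location.partition("/")
    if kind == "regions":
        return name
    if kind == "zones":
        # Drop the trailing zone suffix, e.g. '-a' in 'us-central1-a'.
        return name.rsplit("-", 1)[0]
    raise ValueError(f"unrecognized location: {location!r}")


def check_single_region(allowed_locations: Iterable[str]) -> str:
    """Return the single region covered, or raise if several regions appear."""
    regions = {region_of(loc) for loc in allowed_locations}
    if len(regions) != 1:
        raise ValueError(f"locations span {len(regions)} regions: {sorted(regions)}")
    return regions.pop()
```

Under this reading, a `us`-wide submission such as `["regions/us-central1", "regions/us-east1"]` is rejected, which is the regression being discussed.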

Multi-region support google-batch

@mbookman Happy New Year! I am still fairly sure that `batch`, as implemented on Google's side, does not support submitting a job to `us`-wide regions, which `google-cls-v2` does allow....

Multi-region support google-batch

@mbookman @wnojopra As `google-cls-v2` is headed for removal soon, I am requesting that we look at this feature regression. Thank you.

Multi-region support google-batch

Hi @mbookman, apologies for the delay. The multi-region feature is crucial for several reasons:

- Hardware Flexibility: Users can't predict accelerator hardware and preemptible machine availability in advance. Multi-region support...

Feature request TPU v4 support

@wnojopra Please take a look here: https://www.youtube.com/watch?v=W7A-9MYvPwI&t=301s. TPUs now follow the same provisioning model as GPUs: root access to the host VM, with the accelerators attached to the host. I am not...

Unable to connect via ssh despite --ssh flag

I found the reason for the error. The firewall rules for the project were corrupted, and SSH traffic was being blocked. In case another user runs into the same issue, please ensure...
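
A quick way to tell a blocked port apart from other SSH failures is to test whether a plain TCP connection to port 22 succeeds at all. The helper below is a hypothetical diagnostic sketch using only the standard library; a timeout (rather than an immediate refusal) often suggests packets are being dropped, e.g. by a missing or broken ingress rule for tcp:22, but it does not replace inspecting the project's firewall rules.

```python
import socket


def port_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable-network errors.
        return False
```

For example, `port_reachable("<vm-external-ip>")` returning `False` while the VM is running points toward network filtering rather than an SSH key problem; `<vm-external-ip>` is a placeholder.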

steps_per_execution autotune

I would like to better understand why this parameter needs autotuning. I would appreciate it if someone could help by looking over these questions: Is it sufficient to always set `steps_per_execution` to...
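
Part of the trade-off can be seen with simple arithmetic. With `steps_per_execution=N`, Keras runs up to N batches per compiled-function call, so per-call host overhead is paid roughly `ceil(steps / N)` times per epoch; in exchange, batch-level callbacks such as `on_train_batch_end` only fire once every N batches. The helper below is a hypothetical sketch, assuming the final call in an epoch runs the remaining batches.

```python
def host_dispatches(steps_per_epoch: int, steps_per_execution: int) -> int:
    """Compiled-function calls per epoch, i.e. ceil(steps / N).

    Assumes each call runs up to `steps_per_execution` batches and the
    final call in the epoch runs whatever remains.
    """
    if steps_per_execution < 1:
        raise ValueError("steps_per_execution must be >= 1")
    return (steps_per_epoch + steps_per_execution - 1) // steps_per_execution
```

For 1000 steps, going from N=1 to N=32 cuts dispatches from 1000 to 32, while callbacks and per-batch logs become 32 times coarser. That tension between overhead and observability is one plausible reason a fixed value is not always ideal.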