Alif Munim

8 comments by Alif Munim

**Error 1 (Full):**
```
Exception in device=TPU:0: tensorflow/compiler/xla/xla_client/xrt_local_service.cc:56 : Check failed: tensorflow::NewServer(server_def, &server_) == ::tensorflow::Status::OK() (PERMISSION_DENIED: open(/dev/accel0): Operation not permitted: Operation not permitted; Couldn't open device: /dev/accel0; Unable to create...
```

**Error 2 (Full):**
```
Exception in device=TPU:0: Cannot replicate if number of devices (1) is different from 8
Exception in device=TPU:1: Cannot replicate if number of devices (1) is different from...
```

I found some additional information in the PyTorch Lightning docs [section on TPUs](https://pytorch-lightning.readthedocs.io/en/latest/accelerators/tpu_faq.html#how-to-resolve-the-replication-issue), which mentions that you should not call `xm.xla_device()` outside of the spawn process. I've removed that line, and...

**Error 3 (Full):**
```
Exception in device=TPU:1: tensorflow/compiler/xla/xla_client/mesh_service.cc:329 : Check failed: impl_->channel->WaitForConnected( std::chrono::system_clock::now() + std::chrono::seconds(connect_wait_seconds))
*** Begin stack trace ***
tensorflow::CurrentStackTrace()
xla::service::MeshClient::MeshClient(std::string const&)
xla::service::MeshClient::Get()
xla::ComputationClient::Create()
xla::ComputationClient::Get()
PyCFunction_Call
_PyObject_MakeTpCall
_PyEval_EvalFrameDefault
_PyFunction_Vectorcall...
```

**Error 4 (Full):**
```
2022-08-29 20:10:46.436423: W tensorflow/core/distributed_runtime/rpc/grpc_remote_master.cc:157] RPC failed with status = "UNAVAILABLE: Socket closed" and grpc_error_string = "{"created":"@1661803846.436277160","description":"Error received from peer ipv4:127.0.0.1:51011","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Socket closed","grpc_status":14}", maybe retrying the RPC
```
...

Sure! I think @MarkCoatsworth could provide some more information about our setup.

Hey! I was having the same issue for a while. Try wrapping your imagen with the `ImagenTrainer` module (mentioned in the `README.md`) and using `trainer.train_step()` for your gradient updates. ```python...

@axel578 if you use the built-in T5 text encoding functions, you should get the correct dimensionality for your text embeddings. See https://github.com/lucidrains/imagen-pytorch#L26