Nicolas Forstner
For anyone still looking for an answer, this Dockerfile works and supports Python 3. It basically boils down to disabling cuDNN to avoid the `CUDNN_STATUS_NOT_INITIALIZED` error.

```
FROM nvidia/cuda:11.6.1-devel-ubuntu20.04
#...
```
@rosikand It might be worth having a look at the [Mesh Transformer JAX library](https://github.com/kingoflolz/mesh-transformer-jax), specifically [this part of `transformer_shard.py`](https://github.com/kingoflolz/mesh-transformer-jax/blob/master/mesh_transformer/transformer_shard.py#L546).
I personally use [Weights and Biases](https://wandb.ai). It offers remote logging through its website and focuses explicitly on things like parameter sweeps. AFAIK they also have an option to host the server...