Thomas Johnson
                                            Thomas Johnson
                                        
                                    Also getting this frequently on 0.1.8, with Ubuntu 22.04 on Intel
Happens basically every time I fire up a ton of jobs at once, but I don't have a small test case that reproduces
I haven't tried with the standard python driver. I filed the bug a while ago, so unfortunately I'm not using the exact setup I had back then. (I'm not complaining...
Thanks, so just to be clear for the 790M model you do linear scheduling up to 5 * 2.5e-4 (the 2.5e-4 is from GPT-3 Large 760M) over the first 10%...
Thanks, I'll check out Kaggle
It seems the Kaggle notebook environment is also somewhat broken. I have no idea if this is an xla issue, a Jax issue, or something else, but here's the error:...
Yes, it's the TPU VM environment that Kaggle calls "TPU VM v3-8" Here's an example notebook: https://www.kaggle.com/code/tjohnson/notebookbf52281afd
Thanks so much for looking into this @will-cromar ! I'll switch the tensorflow versions as recommended. I noticed that the docker-python PR is failing CI, although I don't have permissions...