Thomas Johnson
Thomas Johnson
Also getting this frequently on 0.1.8, with Ubuntu 22.04 on Intel
Happens basically every time I fire up a ton of jobs at once, but I don't have a small test case that reproduces
I haven't tried with the standard python driver. I filed the bug a while ago, so unfortunately I'm not using the exact setup I had back then. (I'm not complaining...
Thanks, so just to be clear for the 790M model you do linear scheduling up to 5 * 2.5e-4 (the 2.5e-4 is from GPT-3 Large 760M) over the first 10%...
Thanks, I'll check out Kaggle
It seems the Kaggle notebook environment is also somewhat broken. I have no idea if this is an xla issue, a Jax issue, or something else, but here's the error:...
Yes, it's the TPU VM environment that Kaggle calls "TPU VM v3-8" Here's an example notebook: https://www.kaggle.com/code/tjohnson/notebookbf52281afd
Thanks so much for looking into this @will-cromar ! I'll switch the tensorflow versions as recommended. I noticed that the docker-python PR is failing CI, although I don't have permissions...
Even with `--debug --verbose` no additional info is printed. ``` > Let's make it static ⎿ API Error (Request timed out.) · Retrying in 1 seconds… (attempt 1/10) ⎿ API...
Using the env vars instead of the command line flags added a ton of information and I was able to catch it here: ``` [log_ef83ec, request-id: "req_011CPmw17sJZnh25C4hXjXfu"] post https://api.anthropic.com/v1/messages?beta=true succeeded...