Ansh Radhakrishnan
@Rocamonde did you happen to get around to doing any flakiness benchmarking? I'm going to try and address some of the flaky tests over the next few days, just wanted...
```
Launching training on 8 GPUs.
---------------------------------------------------------------------------
ProcessRaisedException                    Traceback (most recent call last)
[/tmp/ipykernel_1090735/2038238995.py](https://localhost:8080/#) in
      1 args = ("bf16", 42, 64)
----> 2 notebook_launcher(training_loop, args, num_processes=8)

2 frames
[/opt/conda/lib/python3.7/site-packages/accelerate/launchers.py](https://localhost:8080/#) in...
```
Nope, still breaks, strangely enough (same error and stack trace).
It's actually connected to a local runtime which consists of 8 A100s.
Sorry, 8 A6000s, not A100s!
Hmm, it still seems to be failing on the minimal example. I get the following stack trace:
```
ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most...
```
Nope, same stack trace for both of those cases.
Sounds good, thanks for your help!
https://www.runpod.io/