Sean Owen
To be totally complete, you can enable bf16 in the DeepSpeed config and disable fp16. Don't leave them at "auto" or whatever you have set. But setting to "auto" and...
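A minimal sketch of what "explicit" looks like, assuming a JSON DeepSpeed config that currently has `"auto"` in both precision sections. The `bf16` and `fp16` sections with an `enabled` key are standard DeepSpeed config; the helper name `apply_precision_overrides` is hypothetical, and every other setting (ZeRO stage, optimizer, batch sizes) is copied through unchanged.

```python
import copy
import json

# Precision sections to force. DeepSpeed reads "enabled" in both sections;
# explicit booleans avoid depending on how "auto" gets resolved.
PRECISION_OVERRIDES = {
    "bf16": {"enabled": True},
    "fp16": {"enabled": False},
}


def apply_precision_overrides(config: dict) -> dict:
    """Return a copy of a DeepSpeed config dict with bf16 on and fp16 off.

    Hypothetical helper: all other top-level keys are left as they were.
    """
    updated = copy.deepcopy(config)
    updated.update(copy.deepcopy(PRECISION_OVERRIDES))
    return updated


if __name__ == "__main__":
    # Example input with both precision modes left at "auto".
    base = {
        "bf16": {"enabled": "auto"},
        "fp16": {"enabled": "auto"},
        "train_micro_batch_size_per_gpu": "auto",
    }
    print(json.dumps(apply_precision_overrides(base), indent=2))
```

Write the result back with `json.dump` to whichever config file your launcher points at.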
@bingjie3216 yeah, that looks like a problem with `Trainer` or some issue with the files on your local filesystem: it can't delete something it wants to delete. Permissions?
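One quick way to test the permissions theory, as a sketch: on POSIX systems, deleting an entry needs write and execute permission on the directory that contains it, so this walks an output directory and reports where that is missing. The helper name `undeletable_dirs` is hypothetical, and it assumes the failing delete happens somewhere under the directory you pass in.

```python
import os


def undeletable_dirs(path: str) -> list[str]:
    """Return `path` and any subdirectories where this user can't delete entries.

    On POSIX systems, deleting a file or subdirectory requires write and
    execute (search) permission on the directory that contains it.
    This is a rough first check: it ignores the sticky bit and ACLs,
    os.walk silently skips directories it cannot list, and a missing
    path yields an empty list.
    """
    problems = []
    for root, _dirs, _files in os.walk(path):
        if not os.access(root, os.W_OK | os.X_OK):
            problems.append(root)
    return problems
```

For example, `print(undeletable_dirs("/path/to/output_dir"))` with your actual output directory; a non-empty list points at the directories to fix with `chmod` or `chown`.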
Here are some notes on getting training working on A10 and V100 GPUs: https://github.com/databrickslabs/dolly/pull/30/files
Yes, you need to run those commands. They're mostly just shell commands, and you can skip some of them (for example, the TensorBoard integration). @matthayes I wonder if we could replace the .py...
This code does not use a cluster; it runs on a single machine with multiple GPUs.
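On one machine with several GPUs, each GPU typically trains on its own micro-batch (data parallelism), so the GPU count feeds directly into the batch size per optimizer step. A small sketch of that arithmetic, assuming pure data parallelism with gradient accumulation; it mirrors the relationship DeepSpeed expects between `train_batch_size`, `train_micro_batch_size_per_gpu`, and `gradient_accumulation_steps`, but the function and parameter names here are hypothetical.

```python
def effective_batch_size(per_device_batch_size: int, num_gpus: int,
                         grad_accum_steps: int = 1) -> int:
    """Samples per optimizer step with plain data parallelism on one machine.

    Each GPU processes its own micro-batch of `per_device_batch_size` samples,
    and gradients are accumulated over `grad_accum_steps` micro-batches before
    each optimizer step.
    """
    for name, value in (("per_device_batch_size", per_device_batch_size),
                        ("num_gpus", num_gpus),
                        ("grad_accum_steps", grad_accum_steps)):
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    return per_device_batch_size * num_gpus * grad_accum_steps
```

So 8 GPUs with a micro-batch of 4 and 2 accumulation steps give 64 samples per step; moving to fewer GPUs means raising one of the other factors to keep that constant.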
I'm sure everyone finds that just fine: it's open source and just an example you can tweak. There is no "internal Databricks format"; it's just .py or .ipynb files....
Yes. You may just need to adapt the code in the .py file a little, since it's notebook code and can execute things like shell commands too. Pull out...
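As a sketch of that adaptation, assuming a line that a notebook would run as a shell command (for example in a `%sh` cell or with a `!` prefix) needs to run from a plain Python script instead: `subprocess.run` from the standard library can stand in for it. The wrapper name `run_shell` is hypothetical.

```python
import shlex
import subprocess
import sys


def run_shell(command: str) -> str:
    """Run one shell-style command line from a script and return its stdout.

    Raises subprocess.CalledProcessError on a non-zero exit status, so a
    failed step stops the script. shlex.split does not interpret pipes,
    redirects, or `&&`; such lines need shell=True or separate calls.
    """
    result = subprocess.run(
        shlex.split(command), check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # Quote the interpreter path in case it contains spaces.
    print(run_shell(f"{shlex.quote(sys.executable)} --version"))
```

A notebook line such as `!nvidia-smi` would then become `run_shell("nvidia-smi")`.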
I hit this too, but I'm not sure it was for the same reason. My input is different, and I didn't format it exactly like the Alpaca dataset. In particular, it...
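If your data differs from the Alpaca layout, a quick pre-check can rule formatting in or out. A sketch assuming records loaded as a list of dicts and the Stanford Alpaca field names `instruction`, `input`, and `output` (in Alpaca, `input` is often an empty string, so empty values are allowed). Whether the training code needed exactly these fields isn't stated here, and `find_malformed` is a hypothetical helper.

```python
# Field names used by the Stanford Alpaca data; adjust if your pipeline expects others.
REQUIRED_FIELDS = ("instruction", "input", "output")


def find_malformed(records):
    """Return (index, problem) pairs for records that don't look Alpaca-style."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "not a dict"))
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            problems.append((i, f"missing fields: {', '.join(missing)}"))
            continue
        # Empty strings are fine (Alpaca uses "" for no input); None or numbers are not.
        non_str = [f for f in REQUIRED_FIELDS if not isinstance(rec[f], str)]
        if non_str:
            problems.append((i, f"non-string fields: {', '.join(non_str)}"))
    return problems
```

An empty result doesn't prove the data is right, but a non-empty one shows exactly which records to fix first.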
I _believe_ this is resolved in Matt's changes from a few days ago anyway.
Unofficial comment: generally "yes", but the real premise here is that you can get close to the performance of state-of-the-art models of that size with a much smaller model. Using...