Sean Owen comments

Results: 245 comments of Sean Owen

OOM issue when finetune with V100

To be totally complete, you can enable bf16 in the deepspeed config, and disable fp16. Don't leave them at "auto" or whatever you have set. But setting to 'auto' and...
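
As a rough sketch of what setting these explicitly could look like: the snippet below builds a DeepSpeed-style config dictionary with `bf16` enabled and `fp16` disabled rather than left at "auto". The `bf16.enabled` and `fp16.enabled` keys follow the DeepSpeed config JSON layout; the helper name `set_explicit_bf16` and the starting `base` config are hypothetical, and whether bf16 is usable depends on the GPU in question.

```python
import copy
import json


def set_explicit_bf16(ds_config: dict) -> dict:
    """Return a copy of a DeepSpeed-style config with bf16 on and fp16 off.

    Hypothetical helper for illustration; it only touches the `enabled`
    flags and leaves any other settings in those sections alone.
    """
    cfg = copy.deepcopy(ds_config)
    cfg.setdefault("bf16", {})["enabled"] = True
    cfg.setdefault("fp16", {})["enabled"] = False
    return cfg


# Assumed starting point: both precision flags left at "auto".
base = {
    "bf16": {"enabled": "auto"},
    "fp16": {"enabled": "auto"},
}

config = set_explicit_bf16(base)
print(json.dumps(config, indent=2))
```

The resulting dictionary can be written out as JSON and passed wherever the existing DeepSpeed config file is used.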

OOM issue when finetune with V100

@bingjie3216 yeah that looks like a problem with `Trainer` or some issue with the files on your local FS. It can't delete something it wants to delete. Permissions?

OOM issue when finetune with V100

Here are some notes on getting training working on A10 and V100 GPUs: https://github.com/databrickslabs/dolly/pull/30/files

Running the code without databricks

Yes, you need to run those commands. They're largely just shell commands, and you can skip (for example) the tensorboard integration. @matthayes I wonder if we could replace the .py...

Running the code without databricks

This code does not use a cluster; it runs on a single machine with multiple GPUs.

Running the code without databricks

I'm sure everyone finds that just fine - it's open source, it's just an example you can tweak, there is no "internal Databricks format", it's just .py or .ipynb files....

Running the code without databricks

Yes. You may just need to adapt the code in the .py file a little bit as it's notebook code, which can execute things like shell commands too. Pull out...
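
One possible way to adapt a notebook cell that runs a shell command, sketched under assumptions: in a plain Python script the same step can go through the standard library's `subprocess` module. The helper name `run_shell_step` and the command it runs here are placeholders, not commands from the repository.

```python
import subprocess
import sys


def run_shell_step(args: list) -> str:
    """Run one command and return its stdout; raises if the command fails."""
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout


# Placeholder command standing in for whatever the notebook cell ran.
output = run_shell_step([sys.executable, "-c", "print('step ok')"])
print(output.strip())
```

Passing the command as a list avoids shell quoting issues, and `check=True` makes a failed step stop the script instead of continuing silently.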

RuntimeError: Could not find response key token IDs when using bloom model and tokenizer to train

I hit this too, but not sure if it was for the same reason. I have different input, and didn't format it exactly like the alpaca dataset. In particular, it...

RuntimeError: Could not find response key token IDs when using bloom model and tokenizer to train

I _believe_ this is resolved in Matt's changes from a few days ago anyway.

Using Bigscience Bloom 176B or Bloomz 176B instead of GPT-J 6B

Unofficial comment - generally 'yes' but the real premise here is that you can achieve something near state-of-the-art performance of models of that size with a much smaller model. Using...