Sean Owen
You can't use bf16 on the V100. Did you make the change in the README? https://github.com/databrickslabs/dolly#v100-gpus
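A quick way to verify this on a given GPU, since bf16 support depends on compute capability (this check is standard PyTorch, nothing dolly-specific):

```python
import torch

# V100s are compute capability 7.0; bf16 needs Ampere (8.0) or newer
print(torch.cuda.is_bf16_supported())  # False on a V100
```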
Yeah I figured, just triple checking
You're saying downgrading didn't help? If not, does 0.8.0 work? If it does, then I should update the requirements.txt for now
OK, good info. Let me back off the requirements.txt to 0.8.3 for now
OK, if deepspeed 0.8.3 seems to resolve this, then that's done: https://github.com/databrickslabs/dolly/pull/130
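Until that lands, the local workaround is just pinning the version yourself, e.g. in requirements.txt:

```
# pin deepspeed until newer releases are verified to work
deepspeed==0.8.3
```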
Should take a few seconds. How are you generating? Did you see https://github.com/databrickslabs/dolly#generating-on-other-instances for example?
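For reference, generation per that README section looks roughly like this (a sketch; model size and dtype should match your instance, e.g. float16 on V100s per the note above):

```python
import torch
from transformers import pipeline

# trust_remote_code pulls in the instruction-following pipeline from the model repo
generate_text = pipeline(model="databricks/dolly-v2-3b",
                         torch_dtype=torch.float16,
                         trust_remote_code=True,
                         device_map="auto")
res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res)
```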
Hm, shouldn't be any real difference there. Are you sure the settings are fairly equivalent and the output length is the same (not just the max)?
I just mean: how much output are you getting from each? The run time is proportional to the output size. You can't directly control it, but it affects the comparison. I'm...
Oh yeah, you don't want to measure time to download or load the model here. Make sure it's already loaded, then time the generation
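Putting the last two points together, a minimal benchmarking sketch (assuming the pipeline above; the exact output format may vary by version): load once, warm up, then time only the generation and normalize by tokens produced:

```python
import time
import torch
from transformers import pipeline

# Load once, outside the timer: download + weight loading dominate otherwise
generate_text = pipeline(model="databricks/dolly-v2-3b",
                         torch_dtype=torch.float16,
                         trust_remote_code=True,
                         device_map="auto")

prompt = "Explain to me the difference between nuclear fission and fusion."
generate_text(prompt)  # warm-up call so one-time CUDA init isn't counted

start = time.time()
res = generate_text(prompt)
elapsed = time.time() - start

# Normalize by output size, since run time scales with tokens generated
text = res[0]["generated_text"] if isinstance(res, list) else str(res)
n_tokens = len(generate_text.tokenizer(text)["input_ids"])
print(f"{elapsed:.1f}s total, {elapsed / n_tokens:.3f}s per generated token")
```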
@matthayes I think this is a good point - the Pythia models have use_cache=True. https://huggingface.co/databricks/dolly-v2-3b/blob/main/config.json#L29 I don't know a lot about this, but it seems like we would want to do...
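A minimal sketch of checking/forcing that setting, assuming plain transformers (the linked config.json is what from_pretrained reads):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "databricks/dolly-v2-3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Reflects use_cache from the linked config.json
print(model.config.use_cache)

# It can also be passed explicitly at generation time to enable the KV cache
inputs = tokenizer("Hello", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20, use_cache=True)
print(tokenizer.decode(out[0]))
```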