Sean Owen
You can't use bf16 on the V100. Did you make the change in the README? https://github.com/databrickslabs/dolly#v100-gpus
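A quick way to verify this on a given GPU, since bf16 support depends on compute capability (this check is standard PyTorch, nothing dolly-specific):

```python
import torch

# V100s are compute capability 7.0; bf16 needs Ampere (8.0) or newer
print(torch.cuda.is_bf16_supported())  # False on a V100
```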
Yeah I figured, just triple checking
You're saying downgrading didn't help? If not, does 0.8.0 work? If it does, then I should update the requirements.txt for now
OK, good info. Let me back off the requirements.txt to 0.8.3 for now
OK, if deepspeed 0.8.3 seems to resolve this, then that's done: https://github.com/databrickslabs/dolly/pull/130
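Until that lands, the local workaround is just pinning the version yourself, e.g. in requirements.txt:

```
# pin deepspeed until newer releases are verified to work
deepspeed==0.8.3
```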
Should take a few seconds. How are you generating? Did you see https://github.com/databrickslabs/dolly#generating-on-other-instances for example?
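For reference, generation per that README section looks roughly like this (a sketch; model size and dtype should match your instance, e.g. float16 on V100s per the note above):

```python
import torch
from transformers import pipeline

# trust_remote_code pulls in the instruction-following pipeline from the model repo
generate_text = pipeline(model="databricks/dolly-v2-3b",
                         torch_dtype=torch.float16,
                         trust_remote_code=True,
                         device_map="auto")
res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res)
```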
Hm, shouldn't be any real difference there. Are you sure the settings are fairly equivalent and the output length is the same (not just the max)?
I just mean: how much output are you getting from each? The run time is proportional to the output size. You can't directly control it, but it affects the comparison. I'm...
Oh yeah, you don't want to measure time to download or load the model here. Make sure it's already loaded, then time the generation
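Putting the last two points together, a minimal benchmarking sketch (assuming the pipeline above; the exact output format may vary by version): load once, warm up, then time only the generation and normalize by tokens produced:

```python
import time
import torch
from transformers import pipeline

# Load once, outside the timer: download + weight loading dominate otherwise
generate_text = pipeline(model="databricks/dolly-v2-3b",
                         torch_dtype=torch.float16,
                         trust_remote_code=True,
                         device_map="auto")

prompt = "Explain to me the difference between nuclear fission and fusion."
generate_text(prompt)  # warm-up call so one-time CUDA init isn't counted

start = time.time()
res = generate_text(prompt)
elapsed = time.time() - start

# Normalize by output size, since run time scales with tokens generated
text = res[0]["generated_text"] if isinstance(res, list) else str(res)
n_tokens = len(generate_text.tokenizer(text)["input_ids"])
print(f"{elapsed:.1f}s total, {elapsed / n_tokens:.3f}s per generated token")
```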
@matthayes I think this is a good point - the Pythia models have use_cache=True. https://huggingface.co/databricks/dolly-v2-3b/blob/main/config.json#L29 I don't know a lot about this, but it seems like we would want to do...
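A minimal sketch of checking/forcing that setting, assuming plain transformers (the linked config.json is what from_pretrained reads):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "databricks/dolly-v2-3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Reflects use_cache from the linked config.json
print(model.config.use_cache)

# It can also be passed explicitly at generation time to enable the KV cache
inputs = tokenizer("Hello", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20, use_cache=True)
print(tokenizer.decode(out[0]))
```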