Sean Owen

245 comments by Sean Owen

That will probably be pretty slow, but I'm guessing it would work for the 2.8B model. Follow the instructions for V100 training in the README here.

Hm, that's working for me. Where does that error occur, this code or elsewhere?

Works for me, but you need sufficient hardware. What hardware? What model size? It looks like you're running this yourself, so I'm not sure what other changes or differences are relevant. The...

Yes. It looks like you're running this by hand across a cluster, which probably totally works, but I assume the issue is somewhere in that setup. The provided training script...

You are running with `python`, not `deepspeed`. See `train_dolly.py` and just follow it exactly.
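To illustrate the difference: a sketch of the two invocations, with assumed flags. The script name comes from the thread; the exact arguments (GPU count, DeepSpeed config path) are placeholders, so follow `train_dolly.py` and the repo README for the real command.

```shell
# Wrong: plain python runs a single process with no distributed setup,
# so the DeepSpeed engine inside the script never initializes properly.
# python train_dolly.py

# Right: the deepspeed launcher spawns one worker per GPU and wires up
# the distributed environment before the script starts.
# (--num_gpus and the config filename are illustrative placeholders.)
deepspeed --num_gpus=8 train_dolly.py --deepspeed ds_config.json
```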

Please show how you are loading and applying the model. Are you passing really long input?

I suspect it's related to this: https://huggingface.co/databricks/dolly-v2-12b/blob/main/tokenizer_config.json#L5 (CC @matthayes) - someone else noted that this should be 2048; it's not clear why the tuning process changed it to this 'max'...

Probably because something is adding an EOS token. Set the limit to 2047? If you have a config fix, go for it. But yeah, in the end something has to...
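For anyone patching this locally, here is a minimal sketch of the kind of fix being discussed: clamping an over-large `model_max_length` in a downloaded `tokenizer_config.json` back to the real context window. The field name comes from the linked config file; the helper name, the clamping policy, and the default of 2048 are my assumptions, not an official fix.

```python
import json

def fix_model_max_length(path, limit=2048):
    """Clamp model_max_length in a local tokenizer_config.json.

    Sketch only: assumes the file stores a huge sentinel (or nothing)
    where the real context window (2048 here) should be.
    """
    with open(path) as f:
        cfg = json.load(f)
    # Only rewrite the file if the stored value is missing or too large.
    if cfg.get("model_max_length", float("inf")) > limit:
        cfg["model_max_length"] = limit
        with open(path, "w") as f:
            json.dump(cfg, f, indent=2)
    return cfg["model_max_length"]
```

After running this against the downloaded config, the tokenizer should truncate at 2048 tokens instead of effectively never.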

You are sending too much text at once. The context window limit is 2048 tokens.
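A hedged sketch of the kind of guard this implies: clip the prompt's token ids so the total, plus any EOS token the pipeline appends, stays inside the 2048-token window. The function name and the one-token EOS reserve are assumptions based on this thread, not part of the Dolly code.

```python
def truncate_for_context(token_ids, context_window=2048, reserve_for_eos=1):
    """Sketch: clip a token-id list so that after an EOS token is
    appended, the sequence still fits the model's context window.
    The 2048 window and 1-token reserve are assumptions from the thread."""
    budget = context_window - reserve_for_eos
    return token_ids[:budget]
```

In practice you would count tokens with the model's own tokenizer rather than guessing from character length.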

See the blog post on MLflow 2.3, which includes an example: https://www.databricks.com/blog/2023/04/18/introducing-mlflow-23-enhanced-native-llm-support-and-new-features.html