Angainor Development
> the end of the output shows a sequence of "????" eg. Hello. ?? ?? ??

I had the same, using batch decoding and beam search with multiple beams. I...
Had the same issue when training on 2 GPUs with just `python finetune.py`. Got it running on both using torchrun: `WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py`. Make sure your settings are...
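For context, a minimal sketch of the DDP-aware settings this kind of launch relies on inside finetune.py; the batch-size numbers below are placeholders and the exact variable names may differ from your version of the script:

```python
import os

# Placeholder batch sizes; use whatever your finetune config actually sets.
batch_size = 128
micro_batch_size = 4
gradient_accumulation_steps = batch_size // micro_batch_size

# torchrun sets WORLD_SIZE and LOCAL_RANK for every process it spawns.
world_size = int(os.environ.get("WORLD_SIZE", 1))
ddp = world_size != 1

if ddp:
    # Pin each process to its own GPU instead of sharding with device_map="auto",
    # and split gradient accumulation so the effective batch size stays the same.
    device_map = {"": int(os.environ.get("LOCAL_RANK", 0))}
    gradient_accumulation_steps = gradient_accumulation_steps // world_size
else:
    device_map = "auto"
```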
I'd like to see a working version of this so I can test it further!
How to load a model pre-trained on a 52k dataset and continue fine-tuning with another dataset.json?
`./lora-alpaca` contains the LoRA model alone. You can re-train from an existing LoRA instead of from scratch; you'll just start with a blank-state optimizer. If you run the initial...
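As a rough sketch of what that looks like with the usual transformers + peft setup (the model id and paths are placeholders, and the LoRA config has to match the one used for the first run):

```python
import torch
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model, set_peft_model_state_dict

# Base model plus a fresh LoRA wrapper, exactly as for a run from scratch.
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # placeholder base model id
    load_in_8bit=True,
    device_map="auto",
)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Overwrite the freshly initialized adapter with the previous LoRA weights.
# Only the weights carry over: the optimizer starts from a blank state.
adapter_weights = torch.load("./lora-alpaca/adapter_model.bin")
set_peft_model_state_dict(model, adapter_weights)
```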
How to load a model pre-trained on a 52k dataset and continue fine-tuning with another dataset.json?
Merging is suboptimal in several ways. I was able to continue a previous training run successfully without merging the weights, just carrying on with the previous LoRA. This works either with...
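One way to continue on the same LoRA without merging, assuming a reasonably recent peft (the `is_trainable` flag) and with placeholder paths:

```python
from transformers import LlamaForCausalLM
from peft import PeftModel

base_model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # placeholder base model id
    load_in_8bit=True,
    device_map="auto",
)

# Attach the existing adapter in trainable mode instead of merging it into the
# base weights; training then simply continues on the same LoRA matrices.
model = PeftModel.from_pretrained(
    base_model,
    "./lora-alpaca",      # directory with adapter_config.json / adapter_model.bin
    is_trainable=True,
)
# `model` can now be handed to the usual Trainer setup with the new dataset.json.
```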
How to load a model pre-trained on a 52k dataset and continue fine-tuning with another dataset.json?
My running version is more heavily customized, but here are the minimal changes needed: https://github.com/tloen/alpaca-lora/pull/154
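Roughly the kind of change involved is a resume path that accepts either a full Trainer checkpoint or just the adapter weights. A hedged sketch of that logic (the helper name and return convention here are mine, not necessarily what the PR does):

```python
import os
import torch
from peft import set_peft_model_state_dict


def resolve_resume(model, resume_dir):
    """Return a value usable as trainer.train(resume_from_checkpoint=...).

    A full checkpoint (pytorch_model.bin plus optimizer/scheduler state) is left
    for the Trainer to restore, so training resumes with the same optimizer state.
    If only adapter_model.bin is present, load the LoRA weights directly and let
    the optimizer start fresh.
    """
    full_checkpoint = os.path.join(resume_dir, "pytorch_model.bin")
    adapter_only = os.path.join(resume_dir, "adapter_model.bin")

    if os.path.exists(full_checkpoint):
        return resume_dir
    if os.path.exists(adapter_only):
        print(f"Restarting from {adapter_only}")
        set_peft_model_state_dict(model, torch.load(adapter_only))
    return None
```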
How to load a model pre-trained on a 52k dataset and continue fine-tuning with another dataset.json?
> Thank you! Is your loss after loading the checkpoint/lora somewhere near the loss it stopped training at the la...

When using a full checkpoint (incl. optimizer state) the loss is...
How to load a model pre-trained on a 52k dataset and continue fine-tuning with another dataset.json?
I tried the same to make sure. Got the weights from HF, continued training from them, no issue:

```
Restarting from ./lora-alpaca/alpaca-lora-7b/adapter_model.bin
trainable params: 4194304 || all params: 6742609920 || ...
```
> I'm getting this error after running docker.

Would you have more than one GPU by chance? Can you try exposing only one of them to Docker?
Hi, thanks for this work! I'm experimenting with multiple configs to find the best matches for my use cases. Linux, 2x3090. I'm able to train 7b and 13b on both of...