Baptiste Jamin

Results: 89 comments by Baptiste Jamin

A solution was found here: https://github.com/mosaicml/llm-foundry/issues/94#issuecomment-1543394147

I tried doing the same, and I do agree the instructions are not clear at all.

There might be an issue then. Here is the config I am using:

```
train_loader:
  name: finetuning
  dataset:
    hf_name: json
    hf_kwargs:
      data_files:
        train: /mnt/training/mylocaldataset/train.jsonl
    preprocessing_fn: mylocaldataset.utils:prep_fn
    split: train
    max_seq_len: ${max_seq_len}
    ...
```

Issue found! It seems there are a couple of bugs in the Hugging Face library. First, due to a regex problem, the loader mixes up the file `train.jsonl` and the split `train`...

Yes, keep the `split`. I strongly recommend using `data_dir` rather than `data_files`: keep the same config, but replace `data_files` with `data_dir`, as in the sketch below.
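
For illustration, here is a minimal sketch of the equivalent Hugging Face `datasets` calls (as I understand it, the `hf_kwargs` in the config are ultimately forwarded to `load_dataset`; the directory path mirrors the config above):

```
from datasets import load_dataset

# Problematic: the file name "train.jsonl" and the split name "train"
# can be confused by the loader (the regex bug described above).
ds = load_dataset(
    "json",
    data_files={"train": "/mnt/training/mylocaldataset/train.jsonl"},
    split="train",
)

# Workaround: point data_dir at the directory and let the loader
# discover the files; split resolution then behaves as expected.
ds = load_dataset(
    "json",
    data_dir="/mnt/training/mylocaldataset",
    split="train",
)
```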

The global batch size is too high. Try with a batch size of 1, then increase it until you hit an OOM; a probing sketch follows.
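
As a rough sketch of that procedure, assuming a PyTorch training loop (`run_one_step` is a hypothetical stand-in for a single forward/backward pass at a given batch size):

```
import torch

def find_max_batch_size(run_one_step, start=1, limit=1024):
    # Double the batch size until a CUDA OOM is raised, then return the
    # largest size that completed a step successfully.
    batch_size, best = start, None
    while batch_size <= limit:
        try:
            run_one_step(batch_size)  # hypothetical: one forward/backward pass
            best = batch_size
            batch_size *= 2
        except torch.cuda.OutOfMemoryError:
            break
        finally:
            torch.cuda.empty_cache()  # release cached blocks between probes
    return best
```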

You should make a recipe that is as easy as possible to follow, something that can be repeated by newbies: for instance, a config for a specific type of GPU with the...

Hey there! I discovered some similar quirks evaluating ctranslate2 with flan-t5. I compared the outputs of flan-t5-xl and flan-t5-xxl on GPU using float32, int8_float16, float16, and int8. The results for...
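
For reference, a comparison harness along those lines (a sketch, not the exact script; it assumes the model was converted with `ct2-transformers-converter` into a local directory such as `flan-t5-xl-ct2`, and the prompt is just an example):

```
import ctranslate2
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-xl")
prompt = "Answer the following question: what is the capital of France?"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

for compute_type in ("float32", "int8_float16", "float16", "int8"):
    # Load the converted model at the requested precision.
    translator = ctranslate2.Translator(
        "flan-t5-xl-ct2", device="cuda", compute_type=compute_type
    )
    result = translator.translate_batch([tokens])[0].hypotheses[0]
    text = tokenizer.decode(
        tokenizer.convert_tokens_to_ids(result), skip_special_tokens=True
    )
    print(f"{compute_type:>14}: {text}")
    del translator  # free GPU memory before loading the next variant
```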

Update: After investigating, it seems part of the answer is here: https://github.com/huggingface/transformers/pull/22095/commits

We are running the tests as we speak on a Flan-T5 XXL. It seems there is the same problem in fp8. I will try with fp16. Maybe it is related to this...