Ask-Anything
Instruction tuning with my own datasets
I am planning to fine-tune the VideoChat2 model with custom instruction data to enhance its performance on downstream tasks. I have a couple of questions regarding the pre-training data and the process of fine-tuning with Chinese instructions. Your insights will be highly valuable to me.
1. Pre-Training Data Language:
Was Chinese video-text data utilized in the pre-training phase of the VideoChat2 model? I've experimented with some Chinese instructions, and the model's performance was quite satisfactory. Is it advisable to perform instruction tuning on the stage 3 model using Chinese instructions?
2. Multi-GPU Fine-Tuning:
I am interested in fine-tuning the model using multiple GPUs to expedite the training process. However, I couldn't find any related arguments or settings for enabling multi-GPU training in the provided configuration file ("/scripts/config_7b_stage3.py"). Could you provide guidance or examples on how to modify the configuration for multi-GPU support?
Your assistance will greatly aid in optimizing the model for my specific requirements. Thank you in advance for your help.
Thanks for your questions!
- For Chinese QA: we do not use LLMs that work well for Chinese, so directly applying the model to Chinese instructions may not work well.
- For multi-GPU training, please check run.sh. We use torchrun to execute it.
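A typical torchrun launch looks like the following. This is only a sketch: the GPU count, the training-script name, and the exact argument order are illustrative placeholders, not the actual contents of run.sh.

```shell
# Hypothetical single-node launch; set --nproc_per_node to the number
# of GPUs you want to use. Script and config paths are placeholders.
torchrun \
    --nnodes=1 \
    --nproc_per_node=8 \
    train.py \
    scripts/config_7b_stage3.py
```

torchrun spawns one worker process per GPU and sets up the distributed environment variables (RANK, WORLD_SIZE, etc.) that PyTorch's distributed training expects.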
Thank you for your guidance! I've managed to fine-tune the model using multiple GPUs successfully. I suspect that the model's proficiency in Chinese might be attributed to the Vicuna model components. Therefore, further fine-tuning this model with additional Chinese instructions could potentially enhance its performance. I'm considering exploring this to see the impact on its language handling capabilities.
May I inquire about the number of GPUs utilized during the fine-tuning process? Thank you!
For a small fine-tuning dataset, I think 4-8 GPUs with more than 40 GB of memory each is OK. However, the current codebase may not be efficient. You can follow other repos like LAVIN and use lightweight fine-tuning strategies like QLoRA.
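For context, the core idea behind LoRA-style lightweight fine-tuning (of which QLoRA is a quantized variant) is to freeze the pretrained weight matrix and learn only a low-rank update, which drastically cuts the number of trainable parameters. A minimal numeric sketch of that math, illustrative only and not the VideoChat2 or LAVIN implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8              # hidden size and LoRA rank (r << d)
alpha = 16                 # LoRA scaling hyperparameter

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # zero-initialized so the
                                         # update starts as a no-op

def lora_forward(x):
    """Forward pass: frozen weight plus scaled low-rank update."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((4, d))
# With B = 0 the LoRA branch contributes nothing yet:
assert np.allclose(lora_forward(x), x @ W.T)

# Only A and B (2*r*d values) are trained; W (d*d values) stays frozen.
print(A.size + B.size, "trainable vs", W.size, "frozen")
```

Here only 8,192 parameters would be updated against 262,144 frozen ones per weight matrix, which is why such strategies fit on fewer, smaller GPUs.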