[Feature][Config] Add QwQ model configs
Feature request
QwQ is a 32B-parameter reasoning model developed by Alibaba's Qwen team. It would be good to add training/evaluation/inference configs for it.
Motivation / references
https://huggingface.co/Qwen/QwQ-32B-Preview
Your contribution
If somebody can volunteer to start this work, I can answer questions and help with testing.
Hi, I would like to work on this issue. I understand that I need to add new config files for QwQ under /configs/recipes. I have gone through the existing config files for the DeepSeek distill models and the GPT-2 model. Their evaluation configs share some common attributes like model, generation, and tasks, but I also noticed some extra attributes, e.g. the GPT-2 eval config has "polling_interval" and "output_dir", which were not present in the others. To understand this better, I am going through the doc on the evaluation config. Is there any other documentation I should go through to better understand the config structure? Thanks
Thanks for your interest @HelixY2J ! The GPT2 config actually uses a different config class, AsyncEvaluationConfig, which contains some parameters the other eval configs don't have. It was linked to the wrong documentation, which I will fix. I'd recommend looking at the eval configs for the Llama models (e.g. configs/recipes/llama3_1/evaluation/8b_eval.yaml) as a good reference point. As for documentation pages, you can take a look at https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html. There are many evaluation docs in the sidebar. A useful page for understanding the config structure is https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluation_config.html.
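For reference, a QwQ eval config would likely look something like the sketch below, modeled on the Llama eval configs mentioned above. This is only an illustration: the field names and values (model_max_length, torch_dtype_str, shard_for_eval, the evaluation backend, and the task name) are assumptions by analogy and should be verified against the evaluation_config docs linked above before use.

```yaml
# Hypothetical sketch of configs/recipes/qwq/evaluation/32b_eval.yaml.
# Field names are assumed from the Llama eval configs; verify against
# the oumi evaluation config documentation.
model:
  model_name: "Qwen/QwQ-32B-Preview"
  model_max_length: 2048          # placeholder; QwQ supports longer contexts
  torch_dtype_str: "bfloat16"
  shard_for_eval: True            # 32B params typically need multi-GPU sharding
  trust_remote_code: True

generation:
  batch_size: 1                   # conservative for a 32B model

tasks:
  # Task/backend names are illustrative placeholders.
  - evaluation_backend: lm_harness
    task_name: mmlu_pro
```

If the layout follows the existing recipes, the config would live under configs/recipes/qwq/evaluation/ alongside the training and inference configs.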
Aha noted, I will look into the Llama model configs as well as the docs you have shared. Thanks a lot
Hey @HelixY2J, QwQ just released hours ago, and since it performs very well as a reasoning model, I'm adding the configs ASAP to support training. My apologies for commandeering this bug! If you're still willing to help with this task, LMK, and I can point to remaining work that can still be done.
Hey there, sorry for the delay on my end; this took more time than expected. And yes, I would like to work on any remaining tasks. Could you point me to the areas that still need work?
It would be great if you could try running training or evaluation to confirm that they work well, and to find optimal hyperparameters for training. Reproducing their reported benchmark results with evaluation would also be interesting.
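As a starting point for the hyperparameter search, a training config sketch might look like the following. Everything here is a placeholder to be tuned, not a recommendation: the trainer type, batch size, learning rate, and optimizer are assumptions patterned after other 30B-scale recipes, and the real values should come out of the testing described above.

```yaml
# Hypothetical sketch of a QwQ training config; all hyperparameters
# are placeholders to be validated by actual training runs.
model:
  model_name: "Qwen/QwQ-32B-Preview"
  torch_dtype_str: "bfloat16"
  trust_remote_code: True

training:
  trainer_type: TRL_SFT                  # assumed; match whichever trainer the repo uses
  per_device_train_batch_size: 1         # 32B params leave little headroom per GPU
  gradient_accumulation_steps: 8         # placeholder effective batch size of 8
  learning_rate: 2.0e-5                  # starting point only; sweep during tuning
  enable_gradient_checkpointing: True    # trades compute for memory at this scale
```

Reproducing the reported benchmark numbers would then mean running the eval config against the same benchmarks listed on the QwQ model card and comparing scores.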