Ilia Kulikov
**What does this PR do?** Adds a Gradio interface for the Mistral Instruct 7B chatbot. @cbalioglu, one minor observation: changing the nucleus parameter doesn't influence the output as much as...
**What does this PR do? Please describe:** - Adds support for Qwen models that do not require tensor parallelism. All loading is done from HF safetensors and remapping of state...
The previous online DPO fix did not account for the Athene reward also mimicking the dummy batch. Although this is likely not needed, for now we copy the same logic as from math...