Qwen2.5 modeling support + conversion back to HF checkpoint format

Open uralik opened this issue 10 months ago • 0 comments

What does this PR do? Please describe:

Adds support for Qwen models that do not require tensor parallelism. All weights are loaded from HF safetensors, and the state dicts are remapped to the fs2 format.
Adds Hugging Face tokenizer support; the Qwen model uses an HF-based tokenizer.
Adds a Qwen checkpoint conversion command that saves the model back into the HF format.
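
To illustrate the state-dict remapping idea, here is a minimal sketch of regex-based key renaming. The HF-side key prefixes follow the usual Qwen2 checkpoint layout, but the fs2-side names and the `remap_keys` helper are hypothetical placeholders, not the names used in this PR.

```python
import re

# (pattern, replacement) pairs. The replacement names on the right are
# hypothetical; the real fs2 key names may differ.
HF_TO_FS2_KEY_MAP = {
    r"^model\.embed_tokens\.": "decoder_frontend.embed.",
    r"^model\.layers\.(\d+)\.self_attn\.q_proj\.": r"decoder.layers.\1.self_attn.q_proj.",
    r"^model\.layers\.(\d+)\.mlp\.gate_proj\.": r"decoder.layers.\1.ffn.gate_proj.",
    r"^lm_head\.": "final_proj.",
}


def remap_keys(state_dict, key_map):
    """Return a new dict whose keys are rewritten by the first matching pattern.

    Keys that match no pattern are copied through unchanged.
    """
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in key_map.items():
            candidate, count = re.subn(pattern, replacement, key)
            if count:
                new_key = candidate
                break
        remapped[new_key] = value
    return remapped
```

Converting back to the HF layout would use the inverse map with the same helper; values are passed through untouched, so the sketch works equally for tensors or any placeholder objects.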

All transformers imports are wrapped in try/except, since transformers is not a mandatory dependency (yet).
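
The guarded-import pattern can be sketched roughly as follows. `AutoTokenizer.from_pretrained` is the standard transformers call; the `HAS_TRANSFORMERS` flag and the `load_hf_tokenizer` helper are hypothetical names used only for illustration.

```python
try:
    from transformers import AutoTokenizer
except ImportError:
    # transformers is optional; remember that it is missing and fail lazily.
    AutoTokenizer = None
    HAS_TRANSFORMERS = False
else:
    HAS_TRANSFORMERS = True


def load_hf_tokenizer(name_or_path):
    """Load an HF tokenizer, raising a clear error if transformers is absent."""
    if not HAS_TRANSFORMERS:
        raise RuntimeError(
            "Hugging Face tokenizer support requires the optional "
            "`transformers` package. Install it with `pip install transformers`."
        )
    return AutoTokenizer.from_pretrained(name_or_path)
```

Deferring the error to call time keeps the module importable for users who never touch the HF code paths.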

Confirmed that this works by running SFT training on a 7B model, converting the result back to HF format, and using it with vLLM.

Apr 12 '25 01:04 uralik