Yu Zhang
@tianyu-l I just wrote one for small and medium-sized models, https://github.com/fla-org/flame/blob/main/convert_hf_to_dcp.py, similar to https://github.com/pytorch/torchtitan/blob/main/scripts/convert_llama_to_dcp.py. I'm using the converted DCPs to finetune the [Qwen model](https://huggingface.co/fla-hub/transformer-3B-qwen2.5-instruct) on finweb-edu, and everything appears to be working as...
@yiyousong Hello, could you please explain what this arg means and what the purpose of adding it is?
@yiyousong Thank you, good point! We do need to support this. But I don't think `cum_k` is a good name; a better API design could be considered. How about making...
@yiyousong Hello, are you still working on this PR?