Yu Zhang

94 comments by Yu Zhang

@tianyu-l I just wrote one for small and medium-sized models, https://github.com/fla-org/flame/blob/main/convert_hf_to_dcp.py, similar to https://github.com/pytorch/torchtitan/blob/main/scripts/convert_llama_to_dcp.py. I’m using the converted DCPs to finetune the [Qwen model](https://huggingface.co/fla-hub/transformer-3B-qwen2.5-instruct) on fineweb-edu, and everything appears to be working as...

@yiyousong Hello, could you please explain what this arg means and what the purpose of imposing it is?

@yiyousong Thank you, good point! We do need to support this. But I don't think `cum_k` is a good name; a better API design could be considered. How about making...