mahaocong90
Hi @quinnrong94, I have a question: can FlashMLA be used with MTP 3-1-4? I tested on an H20 141G, and the benchmark reported a memory leak at the end: Scheduler hit...
> > Hi @quinnrong94, I have a question: can FlashMLA be used with MTP 3-1-4? I tested on an H20 141G, and the benchmark reported a memory leak at the end:...
Same question. Also, if I reduce --max_length from 256 to 128 and --batch_size from 24 to 12, will that reduce memory consumption during fine-tuning?