vinkle

Results 4 issues of vinkle

Support local models by using the `--local` argument, like:

``` bash
python3 -m flexgen.flex_opt --model /home/username/model/facebook/opt-1.3b --local
```

It will load the locally downloaded opt-1.3b model instead of the opt-1.3b from...
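As a rough illustration of the behavior described above, the following is a minimal sketch of how a `--local` switch could choose between a path on disk and a remote model name. It is not FlexGen's actual implementation: `build_parser` and `resolve_model_source` are hypothetical names introduced here, and only the `--model` and `--local` flags come from the issue text.

```python
import argparse
from pathlib import Path


def build_parser():
    # Hypothetical parser; only --model and --local are taken from the issue text.
    parser = argparse.ArgumentParser(description="Sketch of a --local model flag")
    parser.add_argument("--model", required=True,
                        help="model name, or a filesystem path when --local is set")
    parser.add_argument("--local", action="store_true",
                        help="treat --model as a locally downloaded model directory")
    return parser


def resolve_model_source(model, local):
    # Assumption: with --local the value is used as a path on disk,
    # otherwise it is passed through unchanged as a remote model name.
    if local:
        return ("local", str(Path(model).expanduser()))
    return ("remote", model)


if __name__ == "__main__":
    args = build_parser().parse_args()
    kind, source = resolve_model_source(args.model, args.local)
    print(f"loading {kind} model from {source}")
```

Keeping the decision in one small function means the rest of a loader would only need to branch on the returned kind.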

Bring a new speculative decoding framework that supports single-step and multi-step MTP and EAGLE.

### Worklist

- [x] ut
- [x] smoke test
- [x] metrics
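For context, the acceptance step at the core of greedy speculative decoding can be sketched as below. This is a generic illustration under stated assumptions, not the framework from this pull request: `accept_draft_tokens` is a hypothetical helper, tokens are plain integers, and `target_tokens` is assumed to hold the target model's greedy choice at each drafted position plus one extra position.

```python
def accept_draft_tokens(draft_tokens, target_tokens):
    """Return the tokens to emit after one draft-and-verify step.

    draft_tokens:  k tokens proposed by the draft model (one or several steps).
    target_tokens: k + 1 greedy tokens from the target model, where
                   target_tokens[i] is its choice given the prefix plus
                   draft_tokens[:i].
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have exactly one more entry than draft_tokens")

    accepted = []
    for i, token in enumerate(draft_tokens):
        if token != target_tokens[i]:
            break  # first disagreement: discard this and all later draft tokens
        accepted.append(token)

    # Always emit one token from the target model: the correction at the
    # first mismatch, or a bonus token when every draft token was accepted.
    accepted.append(target_tokens[len(accepted)])
    return accepted
```

With a single drafted token this reduces to the one-step case; longer drafts correspond to the multi-step case, and each verification emits at least one token.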

Due to the high CPU overhead in the existing speculative decoding framework, we are developing a brand new framework that significantly reduces CPU consumption and minimizes device-to-host synchronization.

### Worklist...

enhancement