Issues opened by MMuzzammil1 (2 results)

## 🐛 Bug

When I compile Phi-2 (https://huggingface.co/microsoft/phi-2) with the `tvm.relax.transform.FuseOps()` and `tvm.relax.transform.FuseTIR()` transformations commented out (https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/compiler_pass/pipeline.py#L128), I get better prefill and decode speeds on CUDA.

## To Reproduce

- To...
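Below is a minimal sketch of the kind of change the report describes: commenting the two fusion passes out of a Relax pass sequence. The surrounding passes (`LegalizeOps`, `DeadCodeElimination`) are illustrative stand-ins, not the actual mlc-llm pass list from `pipeline.py`.

```python
# Illustrative sketch only; the real pass list lives in
# python/mlc_llm/compiler_pass/pipeline.py. Here the two fusion
# passes are simply commented out of a generic Relax pass sequence.
import tvm
from tvm import relax

seq = tvm.ir.transform.Sequential(
    [
        relax.transform.LegalizeOps(),        # lower high-level ops to TIR (placeholder pass)
        # relax.transform.FuseOps(),          # commented out, as in the report
        # relax.transform.FuseTIR(),          # commented out, as in the report
        relax.transform.DeadCodeElimination(),  # placeholder pass
    ]
)

# mod = seq(mod)  # apply to the Relax IRModule built for Phi-2
```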


I was curious why the authors kept the number of transformer layers at 1 for the EAGLE draft model. I also could not find any ablation studies...
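For reference, a hypothetical sketch of what a single-decoder-layer draft configuration might look like; the field names follow common HuggingFace-style configs and are assumptions, not the actual EAGLE draft config.

```python
# Hypothetical illustration only: these field names follow common
# HuggingFace-style decoder configs and are assumptions, not the
# actual EAGLE draft configuration.
eagle_draft_config = {
    "num_hidden_layers": 1,     # the single transformer layer the question asks about
    "hidden_size": 4096,        # assumed to match the target model
    "num_attention_heads": 32,  # assumed
    "vocab_size": 32000,        # assumed
}
```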