Issues opened by MMuzzammil1 (2 results)

## 🐛 Bug

When I compile Phi-2 (https://huggingface.co/microsoft/phi-2) with the `tvm.relax.transform.FuseOps()` and `tvm.relax.transform.FuseTIR()` transformations commented out (https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/compiler_pass/pipeline.py#L128), I get better prefill and decode speeds on CUDA.

## To Reproduce

- To...
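Below is a minimal sketch of the kind of change the report describes: commenting the two fusion passes out of a Relax pass sequence. The surrounding passes (`LegalizeOps`, `DeadCodeElimination`) are illustrative stand-ins, not the actual mlc-llm pass list from `pipeline.py`.

```python
# Illustrative sketch only; the real pass list lives in
# python/mlc_llm/compiler_pass/pipeline.py. Here the two fusion
# passes are simply commented out of a generic Relax pass sequence.
import tvm
from tvm import relax

seq = tvm.ir.transform.Sequential(
    [
        relax.transform.LegalizeOps(),        # lower high-level ops to TIR (placeholder pass)
        # relax.transform.FuseOps(),          # commented out, as in the report
        # relax.transform.FuseTIR(),          # commented out, as in the report
        relax.transform.DeadCodeElimination(),  # placeholder pass
    ]
)

# mod = seq(mod)  # apply to the Relax IRModule built for Phi-2
```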


I was curious why the authors kept the number of transformer layers at 1 for the EAGLE draft model. I also could not find any ablation studies...
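For reference, a hypothetical sketch of what a single-decoder-layer draft configuration might look like; the field names follow common HuggingFace-style configs and are assumptions, not the actual EAGLE draft config.

```python
# Hypothetical illustration only: these field names follow common
# HuggingFace-style decoder configs and are assumptions, not the
# actual EAGLE draft configuration.
eagle_draft_config = {
    "num_hidden_layers": 1,     # the single transformer layer the question asks about
    "hidden_size": 4096,        # assumed to match the target model
    "num_attention_heads": 32,  # assumed
    "vocab_size": 32000,        # assumed
}
```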