lorax
FlashInfer integration + cascade inference (prefix caching)
See https://flashinfer.ai/2024/01/08/cascade-inference.html
Another nice source on this: https://x.com/ye_combinator/status/1754537687422497220?s=20
I think this could be a high-priority feature?
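
For context, here is a minimal sketch of the idea behind cascade inference as I understand it from the FlashInfer blog post: attention over a shared prefix and over each request's unique suffix is computed separately, and the two partial results are merged using their log-sum-exp values. This is plain Python for illustration only, not FlashInfer's API. The names `attention_state` and `merge_states` are hypothetical, and the usual `1/sqrt(d)` scaling is left out to keep it short.

```python
import math

def attention_state(q, keys, values):
    """Single-query softmax attention over one KV chunk.

    Returns (output, lse), where lse is the log-sum-exp of the raw scores.
    Hypothetical helper for illustration; not a FlashInfer function.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)

def merge_states(state_a, state_b):
    """Combine two partial attention states into the state over both chunks."""
    out_a, lse_a = state_a
    out_b, lse_b = state_b
    m = max(lse_a, lse_b)
    wa = math.exp(lse_a - m)  # relative softmax mass of chunk A
    wb = math.exp(lse_b - m)  # relative softmax mass of chunk B
    total = wa + wb
    out = [(wa * a + wb * b) / total for a, b in zip(out_a, out_b)]
    return out, m + math.log(total)

# Toy data: one query, a shared prefix KV chunk and a per-request suffix chunk.
q = [1.0, 0.5]
prefix_k, prefix_v = [[0.2, 0.1], [0.9, -0.3]], [[1.0, 0.0], [0.0, 1.0]]
suffix_k, suffix_v = [[-0.5, 0.4]], [[2.0, 2.0]]

merged = merge_states(attention_state(q, prefix_k, prefix_v),
                      attention_state(q, suffix_k, suffix_v))
full = attention_state(q, prefix_k + suffix_k, prefix_v + suffix_v)
```

In this sketch `merged` and `full` agree up to floating-point error, which is why the prefix part can be computed once and reused across requests that share it.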