Tianqi Chen

Results 637 comments of Tianqi Chen

We use a TensorIR variant of FlashInfer, which normally runs at 80 to 90 percent of FlashInfer's efficiency. Note this is for decode; we still need to confirm prefill

The stats are still WIP, but indeed that is a great suggestion

@thvasilo Thanks for reporting this, can you try to submit a patch to this repo?

Hmm, we may need to update the URL to a new one. Anyone interested in pushing a PR for this?

This is related to https://github.com/gpuweb/gpuweb/issues/75, but is more on the application side than the implementation details. So please feel free to suggest closing this one and moving to the...

I see. I wonder whether that would create a write-after-write dependency problem if the same function is invoked multiple times consecutively. Of course, if the write is...

This is a good question. It might be possible; however, Phi is a small model, so the impact may not be too observable. As of now we didn't yet...

This seems to be a download error. Can you check whether you have git and git-lfs properly installed in your environment?
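For reference, a minimal way to check both tools from the command line (assuming a Unix-like shell; both commands simply print version info when the tool is installed):

```shell
# Verify git is installed and on PATH
git --version

# Verify the git-lfs extension is installed (prints its version if so)
git lfs version
```

If the second command fails, installing git-lfs and running `git lfs install` once usually resolves weight-download errors.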

The error message says your device does not support f16, so please try a q4f32 variant of the model, for example:

```
mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC
```