Tianqi Chen

Results 637 comments of Tianqi Chen

We use a TensorIR variant of FlashInfer, which normally runs at 80 to 90 percent of FlashInfer's efficiency. Note this is for decode; we still need to confirm prefill

The stats are still WIP, but indeed that is a great suggestion

@thvasilo Thanks for reporting this, can you try to submit a patch to this repo?

Hmm, we may need to update the URL to a new one. Anyone interested in pushing a PR for this?

This is related to https://github.com/gpuweb/gpuweb/issues/75, but is more on the application side than the implementation details. So please feel free to suggest closing this one and moving to the...

I see. I wonder whether that would create a write-after-write dependency problem if the same function is invoked multiple times consecutively. Of course, if the write is...

This is a good question. It might be possible; however, Phi is a small model, so the impact may not be too observable. As of now we didn't yet...

This seems to be a download error. Can you check whether you have git and git-lfs properly installed in your environment?
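For reference, a minimal way to check both tools from the command line (assuming a Unix-like shell; both commands simply print version info when the tool is installed):

```shell
# Verify git is installed and on PATH
git --version

# Verify the git-lfs extension is installed (prints its version if so)
git lfs version
```

If the second command fails, installing git-lfs and running `git lfs install` once usually resolves weight-download errors.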

The error message says your device does not support f16, so please try a q4f32 variant of the model, for example:

```
mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC
```