Tianqi Chen
this likely is an RCCL issue and not something the MLC project can do much about, so closing it for now. Would love to know if there are follow-up findings, and we love...
@sheepHavingPurpleLeaf do you mind trying to create a Python script that reproduces the error? Likely you can do that through

```python
from openai import OpenAI
from mlc_llm.serve import PopenServer

def...
```
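Something along these lines might work; this is only a sketch, assuming `PopenServer` accepts a model path plus host/port keyword arguments and exposes `start()`/`terminate()` (check the signature in your installed version), and the model path below is just a placeholder:

```python
# Minimal reproduction sketch: launch the server in a subprocess, then query it
# through the OpenAI client. Model path, host, and port are placeholders.
from openai import OpenAI

from mlc_llm.serve import PopenServer

MODEL = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"  # placeholder model


def main():
    server = PopenServer(model=MODEL, host="127.0.0.1", port=8000)
    server.start()
    try:
        client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="none")
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Hello, who are you?"}],
            max_tokens=64,
        )
        print(response.choices[0].message.content)
    finally:
        server.terminate()


if __name__ == "__main__":
    main()
```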
we are moving toward a fully OAI-compatible API, which hopefully allows some customization of system prompts. You can use the LM chat template, which is mostly raw
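For instance, assuming a server is already running locally (e.g. via `mlc_llm serve`) and the LM template has been selected through the `conv_template` field of the model's `mlc-chat-config.json`, a custom system prompt can be passed straight through the standard chat completions call; the endpoint URL and model name below are placeholders:

```python
# Sketch: pass a custom system prompt through the OpenAI-compatible endpoint.
# Assumes a local mlc_llm server; URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="none")

response = client.chat.completions.create(
    model="placeholder-model",
    messages=[
        {"role": "system", "content": "You are a terse assistant that answers in one sentence."},
        {"role": "user", "content": "What does MLC LLM do?"},
    ],
    max_tokens=64,
)
print(response.choices[0].message.content)
```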
we have seen several examples working on older cards; likely you just need to turn off FlashInfer and CUTLASS, and also follow the instructions to build TVM from source
This should be fixed
If you see `mlc_llm: command not found`, try running `python -m mlc_llm`; usually it is due to multiple Pythons in the environment
Thanks for pointing this out. I think we can certainly enhance this behavior
@bayley do you know how these multiple system prompts get interpreted into the prompt specifically? Most chat templates follow a system message then user/assistant alternation
Right now we will implement the support by concatenating all system messages
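Roughly, the behavior would be to collapse all system entries into one system message before the chat template is applied; `merge_system_messages` below is only a hypothetical illustration, not the actual implementation:

```python
# Hypothetical sketch of concatenating multiple system messages into one
# leading system message before the chat template is applied.
from typing import Dict, List


def merge_system_messages(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Collapse all system messages into a single leading system message."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return others
    merged = {"role": "system", "content": "\n".join(system_parts)}
    return [merged] + others


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "system", "content": "Always answer in French."},
    {"role": "user", "content": "Hello!"},
]
print(merge_system_messages(messages))
# [{'role': 'system', 'content': 'You are a helpful assistant.\nAlways answer in French.'},
#  {'role': 'user', 'content': 'Hello!'}]
```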
make sure you have installed mlc_llm. On Windows we recommend running through a conda env