Tianqi Chen
I think it makes sense to disable and deprecate nnpack, given it is more stale,
@ayush-out the latest SDK should already support phi3 and qwen
It should be the first device by default, as intended, since the number of arguments and the number of devices can differ
This is a good point. In theory we should even be able to remove this logic for inputs, and require the input to be already on the right device, perhaps...
After reading the comments so far on the host/device function info split and the compiler phases: - S0: In the beginning (before SplitHostDevice), we don't distinguish host/device functions; a function can contain...
This seems to be because the data was not completely downloaded. Please consider uninstalling, reinstalling, and redownloading the weights
This is because the code intrinsic dispatch is not registered for this op
Thanks for the suggestion. https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/interface/chat.py#L173 contains some examples of a chat session. The intention is for the API to be fully OAI compatible. We did have kv cache caching internally via...
Indeed you are right that it is important to be able to handle cached prompt prefix so we don't have to recompute these kv again. In our case, let us...
Indeed, having the ability to feed in kv is useful. Because right now we design for concurrent access (from multiple users), doing so in general is harder. We will also think...
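To make the cached-prompt-prefix idea from the comments above concrete, here is a toy sketch of looking up the longest already-computed prefix so that only the remaining tokens need a fresh prefill. This is not MLC LLM's implementation: the `PrefixKVCache` class, its methods, and the opaque `kv_state` value are all hypothetical names chosen for illustration, and the sketch ignores concurrency and eviction entirely.

```python
from typing import Dict, List, Optional, Tuple


class PrefixKVCache:
    """Toy cache mapping a token prefix to an opaque KV state (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], object] = {}

    def put(self, tokens: List[int], kv_state: object) -> None:
        # Remember the KV state that was computed for exactly these tokens.
        self._store[tuple(tokens)] = kv_state

    def longest_prefix(self, tokens: List[int]) -> Tuple[int, Optional[object]]:
        # Try the longest candidate prefix first, then shrink one token at a time.
        for n in range(len(tokens), 0, -1):
            state = self._store.get(tuple(tokens[:n]))
            if state is not None:
                return n, state
        return 0, None


cache = PrefixKVCache()
cache.put([1, 2, 3], "kv-for-shared-prompt")  # placeholder for real KV tensors

prompt = [1, 2, 3, 4, 5]
matched, state = cache.longest_prefix(prompt)
remaining = prompt[matched:]  # only these tokens would need a new prefill
```

A linear scan over prefix lengths keeps the sketch short; a serving system with many concurrent users would more likely use a trie or radix tree plus reference counting, which is part of why the general case is harder.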